Code. Never. Lies.

We had this fun exchange on email that I wanted to share with you.

Bbales wrote:

A docs question came up earlier this week on the forums,

Bgoodrich’s response was “The code is pretty short but the documentation manages to not correspond to it.”

which I [Bbales] thought was pretty funny.

To which Bob replied:

That is the main problem with documentation—it can lie.

17 thoughts on “Code. Never. Lies.”

  1. My own experience with this is that even though I try to be diligent about documenting my code, over the lifetime of a program as the code is modified to add features or fix prior errors, the documentation often fails to keep up with all the changes.

    Ultimately, I think that the only reliable documentation of code is the code itself. That’s why it’s important to code transparently. Maintaining correct documentation is a difficult task over the life of a program and I, at least, find it nearly impossible to do over the long run.

    • > over the life of a program

      Speaking of documents changing over time, I was reading about Python match statements recently. Python features get documented in these things called PEPs (“Python Enhancement Proposals”).

      Compare the early PEP for match statements: https://peps.python.org/pep-0622/

      To the one that supersedes it: https://peps.python.org/pep-0634/

      It is really interesting how the first is full of tons of examples to explain the goal but the second is very technical. For example a small snippet from the first is:

      > For example, matching the pattern Point2d(x, 0) to the subject Point2d(3, 0) successfully matches. The match also binds the pattern’s free variable x to the subject’s value 3.

      > As another example, if the subject is [3, 0], the match fails because the subject’s type list is not the pattern’s Point2d.

      The second document is much more terse — it only uses the word example 3 times (according to my web browser’s search). By comparison the first uses example 59 times! I don’t know if this is a common pattern in Python language development or not.

    • Well, I think this is a bit like Box’s now cliche observation about models. Except for small toy programs, all code is wrong, but some code is useful.

      • Box’s cliche is a cliche in the sciences but not understood in the
        humanities. I have to deal with criticisms from social scientists
        and some of them really do think that if a model does not reflect
        the whole truth it is therefore worthless. It makes dialog almost
        impossible.

        • I use a joke:

          Picasso paints a portrait of a woman and her husband complains when he meets Picasso.

          Your painting does not look anything like my wife!

          Really what does she look like?

          Here – I have a picture of her in my wallet – look!

          Picasso comments – my she is awfully tiny!

          Anyway, always gets a laugh – but it’s the same issue for any useful model (representation).

        • “I have to deal with criticisms from social scientists
          and some of them really do think that if a model does not reflect
          the whole truth it is therefore worthless. It makes dialog almost
          impossible.”

          Yes, this is an instance of a wider problem: an aversion to uncertainty. It often occurs in medicine as well as in the social sciences. So when I teach statistics, I often repeat the maxim, “If it involves statistical inference, then it involves uncertainty”, and elaborate with “If everything is certain, there is no need for statistical inference”.

      • > now cliche observation about models
        It would be an interesting question as to when it became cliche for most statisticians (2010?)

        As for code, I would argue it is an abstraction, that could be understood as just an abstraction purely deductively (at least at the machine instruction level). So it can’t be wrong. It is just what it is, but it can be misunderstood. So it’s understandings of code (documentation) that are always wrong.

        • Yes, code is an abstraction. But an abstraction of what? It is an abstraction of some specific function that maps a (possibly empty) set of inputs into outputs, thereby computing a function from a defined domain into a defined range. Code can be wrong in that it computes some function other than the intended function.

          There is also another way code can be wrong. Ideally, when given an input that is not in the domain, the code should refrain from producing an output in the range. It should instead terminate with no output in the range set,* or, less ideally, fail to terminate. Code that accepts a non-domain input and produces an output in the function’s range is also incorrect, though in a way that might be regarded as less serious in many settings.

          *In real world implementations, when confronted with an invalid input, programs might output a value that is recognizably not in the range of the intended function, or abort execution, or create an “exception” condition. In any event, it signals that it has been given an invalid input in some recognizable way. Failure to do this is another way code can be wrong.
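The two behaviours described above, signalling an invalid input versus quietly returning something in the range, might look like this in Python. Both functions are hypothetical illustrations, with the square root on non-negative numbers standing in for the "intended function".

```python
import math


def real_sqrt(x: float) -> float:
    """Square root on the domain x >= 0."""
    if x < 0:
        # Signal the invalid input recognizably rather than returning
        # a value that looks like a legitimate result.
        raise ValueError(f"input {x} is outside the domain x >= 0")
    return math.sqrt(x)


def careless_sqrt(x: float) -> float:
    """Accepts a non-domain input and still returns a value in the range."""
    return math.sqrt(abs(x))


print(real_sqrt(9.0))       # 3.0
print(careless_sqrt(-4.0))  # 2.0, with no hint that the input was invalid
```

Calling real_sqrt(-4.0) raises ValueError, which is the "exception condition" option from the footnote.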

        • > I would argue it is an abstraction

          Oh oh oh! And I would like to add that maybe it is a different sort of abstraction in different contexts!

          For instance, type hints in Python help IDEs quite a lot but aren’t enforced at runtime!

          In the other direction, the OCaml people talk about how OCaml can help you avoid bugs by requiring all possible code paths to be defined (Rust has aspects like this as well!)
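The Python half of that point is easy to see in a few lines. This is a small sketch with a made-up function name: the annotations are stored on the function for tools to read, but the interpreter does not check them when the function is called.

```python
def double(n: int) -> int:
    """Annotated as int -> int, but the annotation is only a hint."""
    return n * 2


print(double(21))              # 42
print(double("ab"))            # abab (a str slipped through at runtime)
print(double.__annotations__)  # the hints are recorded, just not enforced
```

A static checker or IDE would be expected to flag the second call; running the file does not.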

  2. Code. Never. Lies.??

    Well, I guess I better reread Reflections on Trusting Trust.

    I seem to recall many examples in Numerical Analysis in which what the code said was not what the computer calculated.

    How about “Code lies less frequently than comments lie.”

    Bob76
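One familiar instance of the numerical analysis remark above, sketched in Python under the assumption of standard IEEE 754 double-precision floats: the source text reads like ordinary arithmetic, but the machine computes something slightly different.

```python
import math

# Reads as an identity on paper; is not one in binary floating point.
total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True

# The order of operations changes the result once values differ
# enough in magnitude: adding 1.0 to 1e16 is lost to rounding.
lost = (1e16 + 1.0) - 1e16
kept = (1e16 - 1e16) + 1.0
print(lost, kept)                # 0.0 1.0
```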

  3. Question 1: What is the computer doing?
    Answer: Carrying out the instructions in this code.

    If that is the question, then the answer is correct, and not a lie.

    However–

    Question 2: What should the computer be doing?
    Answer: If I knew precisely the answer to that question, then the answer would be some bug-free code which the computer could carry out.

    When the question is question 1 then the code doesn’t lie… When the question is question 2 then there is no known correct answer for all but the simplest tasks.
