How to think about proofs of correctness of computer programs?

Mark Tuttle points to this post by Bill Gasarch about a panel discussion on the following topic:

Have the points made in Social Processes and Proofs of Theorems and Programs by DeMillo, Lipton, Perlis, survived the test of time? (Spoiler Alert: Yes.) . . .

1) The main topic was the 1979 DeMillo-Lipton-Perlis paper (see here) that gave arguments why Proofs of Program correctness could not work.

An all-to-brief summary of the DLP paper: Some researchers are trying to set up frameworks for doing proofs that programs are correct, analogous to the certainty we get with a proof of a Theorem in Mathematics. But proofs in Mathematics are, in reality, NOT that rigorous. Often details are left out or left to the reader. This is fine for mathematics (more on that later) but unacceptable for programs which need rather precise and rigorous proofs.

How do theorems in mathematics really get verified? By having enough people look at them and make sure they match intuitions, what DLP call A Social Process. (NOTE FROM BILL: Papers that are not important do not get looked at so there may well be errors.)

2) The notion of proving-programs-correct was very seductive; however, the people who were trying to do this had a blind spot about how the analogy of proving-programs-correct and proving-theorem-correct differ. In particular, a program is rather complicated and even stating carefully what you want to prove is difficult. By contrast, for most math statements, what you want to prove is clear. Note also that a program has lots of code (far more now than when DLP was written) and so much can happen that you cannot account for. . . .

6) So how can we be more certain that programs are correct?

a) Testing.
b) Modularize and test. Fix errors. Modularize and test. Fix errors…
c) Try to isolate side effects.
d) More testing.

Some point to Model Checking, which could be considered very sophisticated testing, but that’s used to verify circuits and perhaps low-level code, not programs. . . .

7) Can computers themselves help with proofs of correctness? That is the only hope; however, there are scaling problems.

8) When DLP was written a program with 100,000 lines of code was considered large. Now we have programs with millions of lines of code. And now we have more concurrency. So the lessons of the DLP paper are probably more relevant now then they were then.

9) Since Program Verification does not seem to be used, how come we don’t have a Software crisis?

a) We do! The Q+A mechanism at the meeting was terrible.
b) We do! FILL IN YOUR OWN FAVORITE STORY OF BAD SOFTWARE.
c) See the answer to question 6. . . .

The comments to that post are interesting too.

Tuttle then writes to me:

The point, from your several relevant blog discussions, is the criticality of “social process” – in this case social process in programming and software engineering, and, yes, even in mathematics. I’m not sure why there seem to be so many failures of social process in the kind of things you review – results that should be retracted, but, aren’t, for instance. Mathematicians seem to be able to fix things, given enough time, and, of course, the market fixes software engineering shortfalls. So, per your observations, where is the market for the kind of problematic conclusions you discuss?

Has there not been enough time? Or, is it that the results are not important enough for the market to care?

Disclosure: I’ve never proved a program correct. However, once I studied program verification it affected how I thought about programming and documentation. For example, if I’m implementing a function that is really important (typically short, but central), I may add a comment at the beginning describing a “pre-condition” – what is true about the inputs. Then, I try to name the function in a way that represents the post-condition – what is true about the output. The point, of course, is that the innards of the function represent the transformation of the inputs into the outputs, without the mathematics required for real program verification. In this way, the learned papers about program verification did effect how I did, and do, software engineering.

I’m reminded of Bob Carpenter’s remark that machine learning research has huge replicability problems, first because methods are tuned to the particular examples in the published papers, so advertised methods will typically work much better in those examples than in the wild, and second because machine learning methods will depend on lots of tuning parameters and settings that are not specified anywhere in the papers that describe them. The field of statistics is even worse, in that we (statisticians) often don’t supply any code at all. Talk about non-replicable!

And all of this is getting worse with the proliferation of short-form articles such as 6-page PNAS articles and 8-page conference papers where the focus is on importance of the method, not on correctness, and where there’s just no space to describe exactly what you’ve done.

Meanwhile the theoretical papers are all yammering on about “guarantees,” which to me is just a fancy way of saying “assumptions.”

13 thoughts on “How to think about proofs of correctness of computer programs?

  1. One problem with formal verification is that the code that is supposed to check the program for correctness will – you can bet on it! – have its own bugs.

    Another problem is to know with sufficient detail exactly what behavior you need to verify. Essentially, that means showing that a program meets its specifications. But software specifications are often, probably usually, imprecise, incomplete, obsolete, wrong, or all of the above.

    In any event, formal verification is essentially a computation, and as such is subject to the halting problem.

    So there can be some software that is amenable to formal verification, but I’m not holding my breath until it becomes generally feasible.

  2. While it is certainly true that Mathematics and Programming are social disciplines one shouldn’t ignore the mental discipline that some of the proof systems give you. A really good case is given here: https://xenaproject.wordpress.com/2021/06/05/half-a-year-of-the-liquid-tensor-experiment-amazing-developments/

    In the course of rewriting the proof so that it could be digested by machine they found a clearer and better way of dealing with a difficult section.

    • Victor:

      Interesting. This reminds me of the way in which being forced to program an algorithm can reveal gaps in a mathematical argument. And sometimes it goes the other way: I’ll have a math idea that seems incomplete, but once I program it up the math becomes clear.

  3. I do not know so much about formal program verification. But formal proof verification in mathematics seems to have made some progress. Two years ago a challenge was made by Peter Scholze (Fields medalist) to verify a theorem he described as “my most important yet”. I think the fact that a mathematicien of his standing asked for verification and that it could be delivered in quite a short time made a big impression in the mathematical community.

    Find more in Peter Scholze’s summary of the state of the challenge:
    https://xenaproject.wordpress.com/2021/06/05/half-a-year-of-the-liquid-tensor-experiment-amazing-developments/

    His original blogpost with the challenge. Unless you are REALLY into arithmetic geometry most of it will be complete gibberish. But under 6. he explains “Why do I want a formalization?”
    https://xenaproject.wordpress.com/2020/12/05/liquid-tensor-experiment/

  4. (This is Bill Gasarch) Thanks for noticing my post. A few points that are INSPIRED by your post on my post, as it gives me an excuse to (a) rethink some things, and (b) put off working on indexing my book
    https://hardness.mit.edu/
    which is a real drag.

    1) The DLP paper caused funding for program verification to be cut (I have heard this- it needs to be verified- perhaps by an NSF program verifier :-) ). So it may be that DLP’s criticisms of Prog Verif became a self-fulfilling prophesy.

    2) Does the success of having programs verify some math theorems (mentioned in comment on both your post and my post) encourage us to think that PROGRAM verification may become a reality?

    3) I made an too-short comment that verifying large programs has a problem of `scaling’. While true, there may be ways around it. One is tempted to say USE ML or USE AI which is plausible though AI/ML may also be overhyped. I’ve also heard USE QUANTUM COMPUTING which I first thought was satirical, but NO- some people actually said that. People who know nothing about Prog Verif or Quantum computing.

    4) How well does Social Processing work? When I wrote my post my thought was that it works well on important theorems that people check carefully but not on unimportant ones. Is that a problem? YES- (a) one never knows when an unimportant paper becomes a building block for an important one, (b) do people really check things that are published. This may be more of a problem in CS than in Math since in CS we have prestigious conferences (STOC, FOCS, others) and often important papers are in conferences (not refereed that carefully) and maybe on arxiv (not refereed at all) but not in journals.

    • That DLP paper was interesting. It brings to mind:

      * How handy the Python type system is even though the running code can ignore it
      * How handy modern IDEs are, especially with languages constructed with editors in mind
      * The utility of internet package + build systems (especially when they are tightly coupled to the language)

  5. Mark makes it look like formal program verification is a futile endeavor but that’s not completely true! Projects such as CompCert (a C compiler) or seL4 (an L4 kernel implementation) show that is possible to write formal correctness proofs even for rather complex software.

    On the other hand, it might be difficult to proof even very short snippets of code that could be found everywhere in the wild. Forget about 100klocs, even 5k lines might be a challenge as the length and complexity of the proof terms for typical code often grows exponentially.

    In general, it is absolutely impractical to proof code afterwards. Projects like CompCert go the other way around. They start from a specification in a total language like coq (or agda or…there’s a zoo full of them), then after some intermediate rewriting steps they extract it to a practical language like OCaml or even C. Still difficult, but possible.

    Personally, I would love to live in a world where software gets formally verified before shipping. Yes, I still have dreams. Even if this will never come true, Mark makes some good practical suggestions of what to do instead. It sure would be great if software was tested before it was shipped and it would be great if bugs were actually getting fixed. This is, however, very rare in a commercial setting. Big software projects often have tens of thousands of unresolved bugs (not kidding!) in their bug tracker (at least they have a bug tracker! This is already pretty good. Some projects even have public a bug tracker like Mozilla. This is even better).

    One last thing. Mark mentions automatic theorem proofing in 7. There’s a lot of ongoing research regarding sledgehammer tactics and they sure can make life a lot easier. However, they aren’t our “hope”. For those tactics to succed the problem has to be sufficiently restricted or dumb or you have to be lucky. And unfortunately even simple theorems will often be ridiculously difficult to proof if they have a proofs at all. Just think of number theory.

  6. Not sure how relevant this is, but we have intuition for many mathematical theorems, and often, even if there are errors in a proof, the theorem still holds and the proof can be repaired based on the intuition. Otherwise, sometimes the argument against a given proof makes it clear that the theorem can’t hold, or needs more assumptions. I think this is currently outside the reach of computer checking of proofs.

    I don’t see any credible intuition though that software of more than a very small size should be correct. The only issue where this may apply is the question whether a certain approach is valid for solving a certain problem, however I don’t see where any intuition could come from that it’s coded correctly.

  7. As usual, the discussion of this topic starts from the wrong premise. (Although Henning above is onto the problem, so there are two of us at least.)

    Trying to verify a program as correct is futile, because any mildly interesting program is not correct.

    Likewise, we do not have a social method to verify mathematical proofs. We have a social method to offer and invalidate proofs, one with the key virtue of repairing minor problems with some efficiency. Under what conditions might this method it work? Writing of a certain caliber about interesting results (the more interesting the result, the less important the writing!) based on well-known background material.

Leave a Reply

Your email address will not be published. Required fields are marked *