Proofs Cant Prove The Absence Of Bugs

Perennially, we have a little discussion to the effect that TestsCantProveTheAbsenceOfBugs. This is a hoary observation probably made by Lady Lovelace, or maybe Grace Hopper, who claimed to have invented the bug. It's true on the face of it: no amount of [blind ] testing can be certain to have exercised all combinations of inputs so as to ensure that the program actually works.

GraceHopper didn't invent the bug. She didn't even claim to. Check FOLDOC for the story.

The oft-implied idea is that some kind of mathematical proof could in fact "prove" the absence of bugs. The suggestion is that sufficient formal reasoning about the program can determine once and for all that the program is without flaw.

While this may be true in principle, it isn't even somewhat true in practice. The amount of formal reasoning required to prove even the smallest programs is beyond the capacity of most of us. The amount required to prove a payroll, air traffic control system, or telephone switch is literally beyond human comprehension.

Even for these simple programs, the correctness of the processor would also have to be proved for this to be any use. If an operating system or compiler were used the correctness of these would also have to be verified. -- ChrisBrooking?

Chris, I agree that "just because the source code has been proven correct doesn't guarantee that it will work when installed on real hardware using a real compiler". But the casual reader may draw the incorrect conclusion that proving the correctness of the source code is always a waste of time. Let's draw an analogy between "source code" and "bridge blueprints". Even if a (civil) engineer took the blueprints and calculated that the bridge ought to be able to hold up a fully-loaded train, the bridge will fail if the materials used to build the bridge are defective, or if they were incorrectly assembled, or if the engineer made enough mistakes in the calculation. Is it a waste of time for the engineer to make those calculations? -- DavidCary

Further, human reasoning is quite flawed. It should be obvious to the casual observer that our specialty is not logic. Even the most carefully-crafted mathematical proofs are often erroneous. The typical graduate mathematics homework assignment results in a wide variance of proof quality and correctness. (Which is why we have machines to do the logical heavy lifting for us....)

The issue with regard to bugs is confidence. Defect-finding mechanisms must be applied until human confidence reaches the level appropriate to the impact of an unfound bug. If it's a word processor, you might be less concerned about bugs than if it's a heart-lung machine.

Since confidence is a human feeling, the proof mechanism must be accessible to the person who needs confidence. It may or may not engender confidence in the Vice President to have a wild-eyed mathematician (think Christopher Lloyd as Doc in Back to the Future) assure him "I've proved conclusively that the program will work."

Even if it were true that a formal proof can engender more confidence than conscientious testing, which this author does not believe, the proof approach founders when the program undergoes maintenance. The proof will generally not be revisited under maintenance. When it is revisited, it will generally not be regenerated.

A suite of UnitTests and AcceptanceTests, on the other hand, survives and can be run, daily even, checking and rechecking each change.

Do the math: Nothing can truly prove the absence of bugs. The usual approach of big-bang testing under a time limit and pressure to release is clearly bad. A comprehensive repeatable test suite is much much better. If you're not doing PairProgramming (and perhaps even if you are), consider doing intensive code reviews (and translating the results to tests). If you're dealing with life and death, consider proving some key algorithms. But in the real world, YouArentGonnaNeedIt. -- RonJeffries

Nothing can prove the absence of all bugs, true; issues of computability (decidability in particular) make that quite impossible even before you deal with regression (e.g. possible logic errors in the specification of what constitutes a bug). However, it's certainly possible to prove the absence many common classes of bugs. If your programming environment provides the tools to readily do so (i.e. automatically), YouMightAsWellDoIt?. Consider passing the proof one more AcceptanceTest.


Are there any wikizens actually using program proof in their day-to-day work? Please tell us about it.

You mean other than a typechecker? or static assertions? or design-by-contract? or the more inductive unit-test based proofs? or Lint? or the less formal careful examination of code, comparing it against the comments to ensure it's doing what it says it does? I think a lot of people use proof of one form or another, if only the automatic ones.

Are there other techniques that you actually use? Please report on those as well.
In answer to the above questions, I and my colleagues routinely use program proof. We don't write directly in C++ or Java, we write in a notation that allows specifications to be expressed and (where necessary) refined into something close to code; then the tool we use generates the C++ or Java program. The proofs are generated using AutomatedReasoning?. The tool and approach is described at http://www.eschertech.com. The biggest project we have done so far generates 130,000 proofs and 300,000 lines of C++. The biggest project for which we obtained 100% successful automated proof generates 40,000 lines of Java. The proofs show that the requirements design meets the requirements and that the implementation behaves in accordance with the design; which in turn means that the implementation meets the requirements and the requirements are consistent. To the explicit requirements are added all the usual implied requirements (no run-time exceptions or buffer overlows, etc.).

As to whether you can prove a program is bug free, that depends on your viewpoint. If a civil engineer can use maths to prove that bridge design is fit for purpose, then I am using maths to prove that programs are bug-free. What I really mean by "prove" is that there is a very high probability that the program will meet the specified requirements. I say a "very high probability" because the hardware could be faulty, or the compiler could be faulty, or the prover could be faulty etc. In practice, any bugs remaining in the software are almost always due to faulty (or incompletely specified) requirements. You can prove the requirements are consistent, but you cannot prove that they correspond to the user's needs.

Maintenance of a program produced in this way is no problem as long as you edit the original specification (and any manual refinement), then generate code again. If you edit the code by hand, all bets are off. If the program uses a legacy (i.e. handwritten) component, you can only prove the program is correct on the assumption that the imported component behaves according to the specifications you gave for it.

Proving handwritten C++ or Java correct is much, much harder and is only possible for well-structured programs. TonyHoare has proposed this topic as a GrandChallenge?.
A static type-checker is a poor man's program prover. I recently did a big refactoring in a C++ program, and the type system gave me a lot of confidence that the change was correct. The type-checker is run automatically; the type system necessarily keep in sync with the program.

I'm not arguing against testing, just saying that proofs can have a role. -- DaveHarris
See ProofOfCorrectness for a couple of additional comments.
Further, human reasoning is quite flawed. It should be obvious to the casual observer that our specialty is not logic. Even the most carefully-crafted mathematical proofs are often erroneous. The typical graduate mathematics homework assignment results in a wide variance of proof quality and correctness.

Actually, there is a way around the unreliability. If you define a formal language for proofs you can check mechanically whether something is really a proof. You still have to come up with the proof of course. In a way, this is similar to automated UnitTests. You do have to write them, but once you have them you can check them automatically.

(Actually, it would take an enormous amount of work to write something like ZermeloFraenkelUnit? even for Smalltalk, but it can be done. And I guess that someday it will be done.)

The drawback of UnitTests is that they cannot be absolutely complete. Formal specifications can be absolutely complete. Of course the drawback of writing proofs is that it takes a lot more time than writing code and that it interferes with the rapid feedback loops of XP. But if we ever develop AI that is smart enough to do most of the work, it could become much more practical.

I said that formal specifications can be absolutely complete. That's not really true, because in the end you cannot definitively prove anything about the real world. But if you give a precise syntax and semantics of your programming language and your proof language, you can prove that a program conforms to a specification.

But, as Ron says, for most applications YouArentGonnaNeedIt. And I think that even for applications that need to be ultra-reliable and might benefit from proofs, you wouldn't start with a formal specification. As long as our AI isn't smart enough to help us come up with a proof-code combination from a specification more quickly then we can find them with programming and UnitTests, UnitTests would still be the tool that we use to come up with a nearly correct program. Once you have a nearly correct program you can use the formal methods to fix the few remaining bugs.

Actually, this is a bit of a hobby horse of mine. Synthesis isn't analysis. Writing programs and writing proofs requires creativity, trial and error, hand waving and sometimes thinking the illogical. Analysis proceeds deductively, synthesis proceeds inductively. You don't start with a neat Z-specification and then arrive at a hundred thousand lines of C++ code by stepwise refinement. You arrive at those one hundred thousand lines of C++ (or ten thousand lines of Smalltalk ;-)) by ExtremeProgramming. Afterwards you can use formal methods to come up with a neat stepwise refinement. Stepwise refinement is great for checking stuff. It's useless for finding stuff.

I'll get off my soap box now ;-)

-- MartijnMeijering
Please correct me if I'm wrong, but as I understand it the end of a formal proof of correctness is really a statement that two expressions, the specification and the program, are equivalent. But there is no guarantee that the specification is what it SHOULD be. There is still an element of human judgment involved, even with automated theorem proving. -- KentBeck

You're absolutely right. We can't prove more than conformance to specification. And even that is an idealization, because ultimately a system runs on real hardware in the real world and we mere mortals can't definitively prove things about the real world. And of course, as you say, we have no guarantee that the specification is what it should be. We can ask bright minds to review it, we can test it, but we cannot prove it's right. So we must not delude ourselves about the possibilities of formal proofs.

We also mustn't delude ourselves about the need for proofs. A structural engineer can't prove mathematically that his bridge or skyscraper or whatever won't fall down. But they don't fall down terribly often, unless people start dropping bombs on them. Boeing can't prove the "correctness" of their planes. And yet we consider them safe enough to risk our lives flying in them.

They do do a lot of formal checking (proofs in an axiomatic system). It's just that their axioms are pretty high level statements about the properties of the materials. You don't need (and can't have) proofs all the way down. But that's a long way from saying we don't need proofs.

Also, right now formal methods aren't practical. We just don't have the technical and mathematical framework for doing it. You have to give a mathematically precise definition of both the syntax and semantics of your programming language and your proof language. This definition is going to be rather long, so you'll probably have to define at least one level of metalanguage first. You also have to come up with a sound set of inference rules. (And what do you take? Classical logic, intuitionistic logic?, some other "non-standard" logic?).

There *are* now a number of practical formal methods. Several companies are using formal methods for software development - DavidCrocker.

Several languages have such a mathematical definition, including Occam, Pascal, and ML. ML's formal definition is short. ML can be used for the program, and the proof language, and there are proof verifiers for ML written in ML. There are formally proven programs in ML. So I claim it's practical. Just not always practical, which is a very different thing.

Then, in order for this to be practical, you need a program that verifies proofs. Of course, that program needs to be correct. You could try to prove it correct with itself. But that doesn't prove anything. That's more like a UnitTest. So, you'd check it by hand as well. Or you'd build more than one implementation and let them check each other. You could ask the brightest mathematicians in the world to try and find mistakes in the proofs. You could offer a million dollars for someone who finds a mistake.

Also, you need to prove that the compiler that compiles your proof-program conforms to its specification. You need to prove that the processor you run the program on conforms to its specification. And that is about as far as you can go.

Obviously, this is an enormous amount of work, but I think it is doable. And I think that someday someone will do it.

Independent verification proofs is nice to have, just like verification that the compiler has correctly translated the program to object code. Neither is required unless the software being developed is extremely critical. Just as we trust compilers that we have found to be reliable, so we trust proof tools that we have found to be reliable. This may fall short of absolute proof of correctness, but it sure beats the flawed concept of "correctness by testing". -- DavidCrocker.

But despite these limitations, I still find the idea appealing. If someone ever builds ZermeloFraenkelUnit?, I think it could be very useful. Suppose we could cost-effectively (for a small number of critical applications) prove that a program conforms to its specification. Wouldn't that be a tremendous achievement? Wouldn't that give you a lot of confidence in the correctness of a program? Once it becomes doable, wouldn't we want the control software for a nuclear plant to be "proved" correct? Software has the potential of being the most reliable technology in the history of the world. -- Martijn

Actually, there are a number of systems out there that have been proved correct, e.g. the control software for driverless trains on the Paris Metro -- DavidCrocker.

Interesting that you point out the need to prove the automated prover and the compiler and the processor. UnitTests provide verification for all aspects of the system they run on.

It should also be noted that there is less than complete agreement on what constitutes a mathematical proof. Will ZermeloFraenkelUnit? satisfy Constructivists? -- KielHodges

How about Finitists? ZF has the AxiomOfInfinity?. Any proof about the behaviour of a program running on a finite computer ought to be finitistic surely? (This may not be the place for this, but the HaltingProblem for a deterministic program running on a finite computer *can* be solved on a computer with just slightly more than twice the storage – though it may take exponential time.)

{That depends, is there an upper bound on size of external input? If there is, then there are algorithms that solve it. If not, then there aren't.}
To quote DonaldKnuth, "Beware of bugs in the above code; I have only proved it correct, not tried it." (http://www-cs-faculty.stanford.edu/~knuth/faq.html)


Are there any wikizens actually using program proof in their day-to-day work? Please tell us about it.

Of course the drawback of writing proofs is that it takes a lot more time than writing code and that it interferes with the rapid feedback loops of XP

Well, I'm not doing every-day proofs right now, but we are definitely looking at doing so in the very near future, because its quicker, easier and more iterative than writing tedious UnitTests. We already use formal proofs as a part of our final release procedure.

OK, I'll come clean. I'm not talking about software. My field is microprocessor core design (TriCore). The formal methods we use for releases ("tape out") are of 2 types: formal equivalence and static timing analysis. (You could argue that STA is not a formal proof: simply a static verification technique).

We do a formal equivalence check between a highish level (RTL) description and its gate level equivalent. This is possible because the RTL is written using a "synthesizable subset" of hardware description languages. It's simply a matter of comparing the combinatorial logic between sequential elements, which is well within the scope of today's tools.

''In software, that would be like proving that the high-level language and its assembly or machine code translations are equivalent. Clearly doable, but not so clearly useful. If you can indeed prove that your RTL and its gate level equivalent are really equivalent, why don't you have a compiler to do the task?''

Modern tools are now going beyond equivalence checking into the realm of "property checking". Their notations are migrating from obscure mathematics towards user friendly languages that are readable to average programmers (This means a Verilog-like syntax, which is similar to C).

One of the problems with verification of complex hardware designs is the cost of UnitTests for entities whose interfaces are not fixed in stone. Verification requires testbenches to attach stimuli to a design, test patterns to stimulate it, and oracles to determine the expected outputs. Where XP assumes that the cost of migrating tests during refactoring is small, the cost of doing so in hardware is prohibitive.

Enter property checking. "Properties" are simple expressions that define a true statement about a design. For example, "the output will be asserted if, on previous clock cycles, the input sequence was 1,2,3,4" (this assertion would, of course, be expressed in an appropriate language). The hardware designer (or pair of designers) can write this assertion in seconds, and it can be automatically checked repeatedly by tools. Even more important, the assertion can easily be strengthened to "the output will *only* be asserted if ..." - this is not more expensive to check using formal proofs, but the cost of checking it through dynamic simulation is very high (could take hours, days, centuries of CPU time).

In summary, where XP uses UnitTests to enable iterative development, hardware design is moving towards formal proofs. They are much cheaper and more flexible than UnitTests.

-- DaveWhipp


What about BugsCantTestTheAbsenceOfProofsOrTests?? I would say the latter is a much much more useful idea than all of the above.

-- BrutusHalliwell?

P.S. -- Is there a page on CubicleParanoia?

I think any useful business software satisfies some customer's need. And with some of her needs satisfied, she is likely to come up with other. At this point the software does no longer meet the requirements. You could call that a bug. And no process I can think of will get rid of that one. There is no such thing as absence of bugs. -- Wolfgang Berlow


Absolute proof may be out of reach. But in some domains, proofs are tractable, and can increase the reliability of the resultant software. All the stuff on the page about how it can't be done in practice ignore some examples of it being done successfully. One (well documented in papers I can't get at right now :-/) example is the floating point implementation for the Transputer microprocessor, formally derived from the IEEE floating point specifications. It completed in less time than the parallel effort using more traditional techniques, with no known bugs (a bug occurred later, when a change was made but the formal derivation was skipped "to save time").

This is how to make formal proofs (sometimes) tractable: the effort of proving an arbitrary program meets a specification is often completely intractable. Deriving from the specification a program in a formally correct manner is a lot easier (if not often easy enough to be practicable).

And there are formally verified compilers for formally specified (and practicably useful) languages (one of my previous coworkers worked on one for the CHILL language).

-- PaulHudson


We don't produce formally verified proofs, but we do develop informal proofs along with code. This does two things. Firstly, people who then read the code and follow the proof are convinced it works, but secondly, and more importantly, it stops us from writing code that's too complicated. If the code is simple the proof is easy, if the code is complicated then the proof is really, really useful.

People here seem to have a strange idea about what mathematicians call proofs. The simplest proof that the square root of 2 is irrational is to say:
Suppose 2=(a/b)^2, so that 2*b^2 = a^2. Now write out the prime factorizations of a and b and note that 2 appears an odd number of times on the left, but an even number of times on the right. Hence we have a contradiction.

People here seem not to think that's too rigorous, but it's complete, correct and convincing. Writing the same kind of reasoning along with code can be immensely useful.

-- AnonymousOnPurpose

Actually, that proof relies on the uniqueness of prime factorization of integers, which is a far from trivial fact. A better course is to suppose that the fraction is already simplified (i.e., a and b have no common factor) and note that a=2k since a^2=2*b^2 is even. One can then see that b=2*k^2 must also be even, which implies a/b can be simplified, a contradiction.

How about proofs can prove the absence of bugs within the domain covered by the proof. Even in mathematical proofs, you need to assume the domain you are considering. Proof of a theory in Euclidian geometry says nothing about the theory in non-Euclidian geometry.


The oft-implied idea is that some kind of mathematical proof could in fact "prove" the absence of bugs. The suggestion is that sufficient formal reasoning about the program can determine once and for all that the program is without flaw.

I don't know anybody who still talks about proving programs "correct" except in the most informal sense, rather one proves that a program has certain properties w.r.t. a specific model and logic. Correctness is in the eye of the beholder.

While this may be true in principle, it isn't even somewhat true in practice. The amount of formal reasoning required to prove even the smallest programs is beyond the capacity of most of us. The amount required to prove a payroll, air traffic control system, or telephone switch is literally beyond human comprehension.

Even for these simple programs, the correctness of the processor would also have to be proved for this to be any use.

I don't have much experience with payroll, air traffic control, or telephone systems, but my impression is that they are shallow and broad. Proving things about such systems is hard mainly because formally specifying them is hard and takes a lot of time. Whatever properties they may possess are likely to require quite complicated expressions to capture.

You don't have to prove the OS or compiler to be correct to gain value from a formal proof. It would be nice of course, but my opinion is that most of the value is in the careful analysis and reasoning about the program itself. In most cases it isn't the final property proven.

Also hardware verification is one area where formal methods have been used quite extensively and successfully. I'm pretty sure Intel uses ModelChecking?, I know AMD has modeled the Athlon fpu using ACL2, and a sequentially consistent version of the JVM has been modeled in ACL2.

Further, human reasoning is quite flawed. It should be obvious to the casual observer that our specialty is not logic. Even the most carefully-crafted mathematical proofs are often erroneous. The typical graduate mathematics homework assignment results in a wide variance of proof quality and correctness.

This is an issue with hand written proofs, but less if one is using an automated theorem proving system. Then the issues become checking that the definitions and proof obligations are stated properly, and the logic can be mechanically checked (by a system admittedly succeptable to bugs).

Since confidence is a human feeling, the proof mechanism must be accessible to the person who needs confidence. It may or may not engender confidence in the Vice President to have a wild-eyed mathematician (think Christopher Lloyd as Doc in Back to the Future) assure him "I've proved conclusively that the program will work."

Somebody who states they've proven a program "correct" is being careless and obviously their claims should be taken in that context, but I think that properly written theorems are more accessible than scanning through a listing of unit tests.

A suite of UnitTests and AcceptanceTests, on the other hand, survives and can be run, daily even, checking and rechecking each change.

The same thing can be true of well written automated proof scripts. They can be reexecuted in the wake of additional changes (possibly fixing whatever proofs may fail. ACL2 supports this. I assume PVS does as well. (Of course, proofs may need to be redone just as tests may change as the system evolves).

Do the math: Nothing can truly prove the absence of bugs. The usual approach of big-bang testing under a time limit and pressure to release is clearly bad. A comprehensive repeatable test suite is much much better. If you're not doing PairProgramming (and perhaps even if you are), consider doing intensive code reviews (and translating the results to tests). If you're dealing with life and death, consider proving some key algorithms. But in the real world, YouArentGonnaNeedIt. -- RonJeffries

For 99.9% of all projects aren't going to need proofs and even if you do, you want a rigourously tested system even before starting the endeavor. That said though, formal proofs can tell you things that no amount of testing can ever do. -- JoeHendrix


Proofs Can't Prove What They Don't Cover

Of course a proof does not show the absence of bugs, except for the area it covers. A proof shows correctness in what it covers, it does not say anything about the areas it does not cover. What is unknown is unknown.


Those skeptical of formal proofs may enjoy a new perspective that EwDijkstra's book A Discipline of Programming gives. I find his ideas very different from what I expected, and you can usually find cheap second-hand copies of this book via http://www.bookfinder.com/. My understanding is that he approaches program correctness from the same perspective that computer scientists approach algorithm correctness, but expresses his proofs in a formal programming language so that they are, in fact, programs. Remember that correctness proofs of algorithms are actually very practical, routinely used by computer scientists, and we all use them indirectly any time we call 'sort()' or run a compiler or ...! I doubt I could go into more detail and still do justice to the ideas - try the book if you're interested!

I also think it's unfortunate that formal proofs vs testing is becoming a rivalry, when, in fact, they seem complementary in many ways: I'd much rather test a program that implements an algorithm that I know is correct (by reading a clear and understandable proof), than to write a program that "seems like it should work" and rely solely on tests. I don't think this opinion should be controversial, even if the importance of proofs varies a lot between applications, such as databases and compilers versus websites and text editors (all of which are important.) -- AnonymousDonor

I agree that there's no conflict between formal proofs and testing. Say I have a proof that - given certain assumptions - a particular multithreading algorithm will never deadlock. Then I UseAssertions to check if those assumptions are actually true, and I exercise the code using UnitTests to convince myself that those assumptions are probably true. Then I am convinced that the program will never deadlock - a result that I could not have gotten from the proof alone, or from everything *but* the formal proof alone. -- DavidCary

Maybe it's just the kind of programming I do, but I have never been given an algorithm to implement. I usually get a set of examples; if the program receives A then the result should be A1, if the program received B then the result should be B2, etc. This information also usually comes in serially, often after a release or 3 has been fielded. I can write tests that duplicate the examples and then find an "algorithm" that passes the tests known to date. No matter how correct my algorithm today, I will not be surprised to get a telephone call tomorrow from a user telling me that under case Z we should be producing the result Z1.


The amount required to prove a payroll, air traffic control system, or telephone switch is literally beyond human comprehension.

There's probably really no point in proving the whole of such a system, even if it could be specified completely. However, important parts of the system can be specified in an obviously correct way, in the case of the air traffic controller this would for example be Under no circumstances will this system assign the same flight corridor to two planes simultaneously. There's no discussion about this requirement, for violating it would be catastrophic. You also can't ensure it by testing, there are simply too many cases to cover, but you can certainly prove this property. Therefore, a proof can ensure things that testing will not cover. There's simply no need to prove a whole program correct, you get a benefit out of proofs with far less effort.


see: BugFreeSoftware, HaltingProblem

CategoryAnalysis CategoryDebugging CategoryBug CategoryQuality CategoryProof

EditText of this page (last edited November 23, 2014) or FindPage with title or text search