Developing Provably-Correct Software Using Formal Methods

gizi · on June 1, 2015

some tools such as Verum Dezyne and Quantum Leaps can generate 100% correct source code in C, C++, Java ...

The input into these tools, from which they generate source code, is therefore, the real program and the real source code. How do they prove that this real source code, from which they generate C,C++, or Java, is "correct"?

In fact, this input just constitutes a new programming language -- a DSL -- and the tool is just another compiler or scripting engine. If existing compilers are unprovable, this new compiler is for the same reasons unprovable too. In other words, they did not solve the problem.

tunesmith · on June 1, 2015

Because the code will literally not compile if it isn't correct. As opposed to weaker languages where bugs (run-time problems) can exist.

I'm still something of a layman on this, and I know less about formal specification languages like TLA+, but languages like Coq work by using things like dependent types.

Here's an example: if you have a method that takes (through specifying type parameters) an Int and returns an Array of Integers, and it compiles, then you have effectively proven that it is possible to take an Int and return an Array of Integers.

This is not particularly useful to the method author that is only using that method signature to write some more specific business logic. Dependent types allow you to spruce up that method signature until it actually reflects your business logic. So then you know that if it compiles, the actual intent of your method/function is bug free.

Yes, it is true that there is a "turtles all the way down" element to this kind of thing - "how do they prove that this real source code is correct?" The answer is that languages like Coq rely on a small kernel of code that cannot be proven. It's like their axiom from which everything flows. But the kernel is small enough that people verify it with eyeballs. Bugs can still happen there, and I think they even found one a few years ago, but it didn't have any impact on all the mathematical proofs that have been written using Coq.

I'm sure some experts can jump all over what I just wrote but that's the gist as I understand it so far.

munin · on June 1, 2015

> proven that it is possible to take an Int and return an Array of Integers.

proving that something is POSSIBLE is very easy: just write this function in an untyped language and call it with an int, and show that you get back an array. proven: it's possible.

static type systems give you something much stronger: the guarantee that, always, Int -> [Int]. sometimes languages weaken this to exceptions, sometimes, certain kinds of exceptions.

these systems are pretty straightforward, we've had them for a while, and they're even in environments like c++ and java. the "dependent types" area of the world that you talk about is more exciting: a desire to push more information about the program, as a specification, into the type system. so for example session types are a technology by which you can encode the state machine of a network protocol into the types of data values and functions. then, if your state machine is correct, and the process of translating the state machine into a session types implementation is correct, you can prove that your implementation is a correct implementation of a given protocol.

the dynamic typing half of this, as I understand it, is contracts. you write executable guards around the beginnings of functions that check, at runtime (usually), that the environment the function is in is something the function expects.

both of these technologies come from, I think, the same place: the desire to control the steps that a program makes during runtime. static type systems don't permit programs to be compiled that could take invalid steps, dynamic type systems don't permit programs that run to advance past an invalid step.

schoen · on June 1, 2015

Is the Coq bug you're thinking of this one, from earlier this year?

https://github.com/clarus/falso

tel · on June 1, 2015

That wasn't really in the core verifier exactly. It was more like an optimization.

l_dopa · on June 1, 2015

This is exactly why a lot of formal methods work isn't, imho, terribly compelling.

Something like http://vst.cs.princeton.edu/ has a much, much stronger case for enabling "100% correct software".

djcapelis · on June 1, 2015

Spot on. :)

This has often been the problems with formal methods work. And the spec languages are often worse to find bugs in than the original source!

So not only does this essentially become another DSL, it is usually a pretty bad one.

munin · on June 1, 2015

to be fair it's not just the problem with formal methods work - if you start off with a spec with a bug and implement it faithfully in a non-verified context, you'll still have the bug due to the buggy spec, along with a whole bunch of implementation bugs that you make due to not verifying things...

taeric · on June 1, 2015

I think you are missing the point. In the same way that you can prove there are no out of bounds accesses in java, you can prove that this code represents the correct source code from these languages.

How do they accomplish this? The same way Java does. By flat out disallowing many of the things that the underlying language allows. Consider, it all ends up as machine code in the end.

So, how do they prove this "real" source code is correct? By greatly increasing the burden on producing it. In many contexts, this increase is necessary. In others... not so much.

dllthomas · on June 1, 2015

That doesn't follow at all. It could easily be the case if the proving requires restrictions that other languages don't provide. I'm a little skeptical that they did "solve the problem", bit we can't know much about that without actually looking.

nickpsecurity · on June 1, 2015

It's all definitely worth looking into. Anyone trying to start will find Wheeler's page [1] much more helpful. I also included a book [2] on using dependent types for certified programming. There's also a few examples after of verifications with [3] being pretty cutting-edge in automation & thoroughness. I think we need people to put together a single resource where newcomers can see the various approaches, tools, and practical examples of how to use them. Further, we need to research more on which tools give you the most bug hunting benefit with the least time and expertise. There's some research done already but everything in this field is scattered.

Another point came up in a similar discussion: we need to focus most of our energy on raising the baseline of correctness for the average programmer. This includes things such as automated test generation, design by contract, and better type systems. The developer should be able to specify his or her intention with the code while the type system or runtime catches most of the errors. Maybe we just need a version of BASIC/Pascal with Ada-like safety built-in or similar restrictions on certain constructs. Not sure of the specifics but this is an important problem to solve.

[1] http://www.dwheeler.com/essays/high-assurance-floss.html

[2] http://adam.chlipala.net/cpdt/

[3] https://www.umsec.umn.edu/sites/www.umsec.umn.edu/files/hard...

[4] http://se.inf.ethz.ch/people/morandi/publications/prototypin...

[5] http://compcert.inria.fr/compcert-C.html

[6] http://repository.readscheme.org/ftp/papers/vlisp/guide.pdf

[7] http://research.microsoft.com/pubs/122884/pldi117-yang.pdf

Wayne17 · on June 7, 2015

(I'm the author of the original post.) Model-checkers don’t prove that a program is correct, only that the logic in the program’s abstracted and simplified communications/state model has no inherent race conditions, deadlocks and so forth, i.e. no typical concurrency flaws. Model-checkers don’t “prove” correctness in the mathematical sense (and maybe I should have written “verifiably-correct” instead of “provably-correct”). Rather, model-checkers search for counterexamples to the hypothesis “there are no concurrency flaws in this model”, by exhaustively tracing through all possible distinct execution paths through the model. Amazon calls their models “exhaustively testable pseudo-code”. If no counterexample is found, then we can be sure the model has none of the standard concurrency flaws, and we can be sure that the communications “housekeeping” code auto-generated from the model will have none either. This is a limited yet extremely powerful and valuable aspect of correctness, the hardest kind of correctness for humans to achieve in concurrent software.

I believe software engineers can and should start routinely using model-checking on concurrent logic, taking advantage of the machine’s ability to race relentlessly through enormous numbers of execution paths to uncover subtle flaws in logic. Even a small change to concurrent logic can have far-reaching consequences that are very hard for humans to envision. I believe competitiveness and sound engineering practice demand use of model-checkers on concurrent logic.

tracker1 · on June 1, 2015

Okay, so the software is now provably correct.. now what were the costs in doing so, both in additional time pre-writing definitions and other abstractions for the software itself? Also, how does one prove that said software actually meets the business needs behind said software?

In an academic environment, or something tied to physical devices, this approach isn't a bad one at all... These are settings where software is much more akin to an engineering discipline than it is a craft.

In most software development, which are in any number of business environments building/adapting or bridging custom software that will run in any number of environments... the additional time that this will take over current approaches will be simply unacceptable. Most people don't care if a software is provably correct.. and for that matter, most bugs aren't critical in nature, and don't lead to end of the world scenarios... since most systems don't actually touch personal medical information or financial data.

I think on an intellectual level that approaches like this for some systems make a lot of sense... The problem is rarely do you have well defined interfaces/algorithms for well defined problems... so, you build your system with provably correct software, then the proverbial moose's neck breaks, because you didn't know about the moose.

Wayne17 · on June 7, 2015

In the context of physical systems, companies in the Netherlands including ASML, FEI, Philips Healthcare and Ericsson, have found in the past 10 years that applying model-checking to concurrency logic in control systems reduces both costs and time-to-market while increasing reliability of the built software. Why reductions? Because concurrency bugs are removed at the earliest possible moment, hence they cost orders of magnitude less than if they were found during testing or (worse) at customer sites.