2007-11-13

Gödelizing Turing

Down at O'Reilly, Nitesh Dhanjani makes an extremely well-reasoned point that you cannot outright dismiss static verification tools by an unthinking appeal to Turing completeness, even if in principle (or shall I say, in the limit) you are right. He argues that every static analysis is an approximation, and that its analytic power should be taken for what it is, even though it never stops being an approximation, and indeed it never pretends to be anything else.

The halting problem is a direct consequence of Gödel's Incompleteness Theorem. Turing titled his paper "On Computable Numbers, with an Application to the Entscheidungsproblem"; that is, he started from Hilbert's Decision Problem much as Gödel did. The upshot of this relationship is that decidable corresponds to halting, and incomplete to not provably halting (note: not provably non-halting). Thus, one way to make a meaningful halting analysis (or other kinds of static verification) possible is to use a computing semantics restricted enough that the underlying formal system is weakened to the point of being decidable.
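To make that concrete (my own toy example, not one from Dhanjani's post): in a language whose only repetition construct iterates a count computed before the loop starts, every program halts, so its halting problem is trivially decidable. A minimal Haskell sketch, with invented names:

    -- A toy total language: Rep n body sums the value of body, n times.
    -- There is no general recursion or while loop, so every program halts.
    data Expr
      = Lit Int
      | Add Expr Expr
      | Rep Expr Expr
      deriving Show

    eval :: Expr -> Int
    eval (Lit n)      = n
    eval (Add a b)    = eval a + eval b
    eval (Rep n body) = sum (replicate (max 0 (eval n)) (eval body))

    -- eval terminates by structural induction: each recursive call is on a
    -- strict subexpression, and Rep only replicates an already computed
    -- value a finite number of times.

The price, of course, is expressive power: such a language is not Turing complete, which is exactly the trade-off described above.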

This is precisely what static typing systems do. A typing system is a logic (witness the Curry-Howard correspondence) that is powerful enough to be useful, but not so powerful that it becomes undecidable. The typechecker is then a theorem verifier in the logic induced by the typing rules. A typechecker for an undecidable type system would hang, that is, not halt, for some programs and their types, which is hardly desirable. Note that a typing system can be decidable for checking but not for inference; that is, even though every program can be verified type-correct or not, for some (or most) programs the types cannot be reconstructed automatically without manual intervention from the programmer in the form of type annotations. This is the case with Haskell's extended typing rules (overlapping instances of types), I believe; that's why Haskell programs tend to be more heavily annotated with types than ML programs, even though both nominally use the Hindley-Milner type system, which is decidably inferable.
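As a concrete illustration of the gap between checking and inference (my example, using GHC's rank-2 polymorphism rather than the overlapping instances mentioned above): the following program type-checks only because of the explicit signature; remove the annotation and Hindley-Milner inference cannot reconstruct the rank-2 type, so the program is rejected.

    {-# LANGUAGE RankNTypes #-}

    -- applyBoth needs its argument to remain polymorphic inside the body,
    -- a rank-2 type.  GHC can *check* this signature, but it cannot
    -- *infer* it: without the annotation type checking fails.
    applyBoth :: (forall a. a -> a) -> (Int, Bool)
    applyBoth f = (f 1, f True)

    main :: IO ()
    main = print (applyBoth id)   -- prints (1,True)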

There are many kinds of approximate static verifications that are possible, useful and even in current use: aliasing (and its special case, linearity, which lets the compiler elide allocation and garbage generation whenever possible), bounds preservation in indexed accesses, side effects and purity, security, and of course, type soundness. Inasmuch as these analyses treat data and control flow statically, in exactly the same way that a theorem verifier checks every step of a derivation tree, the end result is a proof that the verified properties are upheld, or, in the case of the approximations, an "I don't know" result that can be refined either with more information supplied by the programmer to the tool, or with further checks automatically introduced to corroborate at runtime that the stated properties are met.
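Here is a minimal Haskell sketch of that three-way outcome for bounds checking, with invented names (Verdict, analyse, access), assuming the array length is known statically while the index may not be:

    -- The analysis either proves the access safe, refutes it, or honestly
    -- answers "I don't know".
    data Verdict = Proven | Refuted | Unknown deriving Show

    -- What the analysis knows about the index at compile time.
    data StaticIndex = Known Int | Unresolved deriving Show

    analyse :: Int -> StaticIndex -> Verdict
    analyse len (Known i) | 0 <= i && i < len = Proven
                          | otherwise         = Refuted
    analyse _   Unresolved                    = Unknown

    -- The "compiled" access: a runtime guard is emitted only when the
    -- static analysis answered Unknown.
    access :: Int -> StaticIndex -> [a] -> Int -> Maybe a
    access len sidx xs i =
      case analyse len sidx of
        Proven  -> Just (xs !! i)                -- guard elided
        Refuted -> Nothing                       -- rejected statically
        Unknown -> if 0 <= i && i < len          -- guard inserted
                     then Just (xs !! i)
                     else Nothing

Supplying more information to the tool (a Known index, say, or an annotation) moves accesses from the Unknown bucket into the Proven one, which is exactly the refinement loop described above.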

This, in other words, is the gist of the raging debate between static- and dynamic-typing proponents. The point driven home by Dhanjani's post, and the one I think is most often missed, is that this is not an "either-or" proposition: static and dynamic checking are only the extreme points of a continuum, and language designers have a conscious choice about where along it their language should lie.
