NULL ptr dereferences found with Calysto static checker

Mon Jun 11 10:30:50 EDT 2007

Ken,

On 6/9/07, Ken Raeburn <raeburn at mit.edu> wrote:
> Sure, makes sense.  I hope it's easy to expand Calysto's database of
> such functions.

It is. Currently, it requires recompilation of Calysto, but in the
future it will be done through simple modifiable specifications.
However, there are quite a few things on my research agenda that are
much more urgent.

> Can you express such things as "F returns non-zero
> or sets *x to a valid pointer"?

Yes, of course.

> I played a bit with Splint some time
> back, and one of its most glaring problems was the inability to
> understand stuff like that -- in particular, realloc() success and
> failure paths.

Note that Calysto is a completely different beast. It's fully
path-sensitive. For instance, Calysto easily proves (in 0.02 sec) that
the following assertion is always satisfied:

int f(int a, int b) {
    int a1;
    if (a%2) { a1 = a + 1; }
    else { a1 = a; }
    int c = a1 * a1;
    assert(c % 4 == 0);
}

Of course, to achieve path-sensitivity you need some form of a theorem
prover. Calysto owns a lot of its scalability to the Spear theorem
prover ( http://www.domagoj.info/index_spear.htm ), which has been
developed (and heavily optimized) for software verification.

Currently, assertion-checking is not yet ready for mass-consumption, but
NULL-ptr dereference checking is very close. As Calysto evolves,
projects that join Calysto Community will also get assertion checking.

> > 2) You are willing to help me out analyzing the results.
> Well, I'm willing to try. :-)

Excellent! :-)

> Is it possible to save the analyzed behavior of some functions or
> their uses, maybe programmatically transform some subset of it into
> some input format acceptable to Calysto (e.g., do you have an input
> file that describes strncmp and gethostbyname to it?), and import the
> result into an analysis of another module?

It is doable, but IMHO, static checker should do checking not linking
:-). So, I prefer to get a complete input, and specify only a few most
important external functions through additional constraints.

> > It is just a simple heuristic to decrease the number of false
> > positives
> > - if the context is available, Calysto always takes it into
> > account. As
> > said before, really not a big deal to change this behaviour.
>
> Okay, that sounds good.  Especially if a report on the assumptions
> made can be produced, we could presumably correct certain wrong
> conclusions and re-run the analysis?

Of course.

> > How many should there be?
>
> Looking at a UNIX install tree, I've got 11 shared libraries, 39
> executables (of which a few, like krb5kdc and kadmind, are the most
> critical, but almost all involve network traffic and thus subject in
> one way or another to malicious input, and two are normally installed
> setuid root), and depending on configuration options, one or two .so
> files that may get dynamically loaded by krb5kdc at run time with
> dlopen.  We only build shared libraries for most things currently;
> it's not possible to do a fully static build of any of the
> executables, without some pounding on our somewhat baroque build system.

After spending a couple of hours analyzing your build process, I found
the culprit: export-check.pl fails. After commenting lines 74 & 76,
everything went through, producing 14 shared libraries and 65 binaries.

I've run Calysto, and after 13.4 hours of computation got 784 warnings,
which is quite a lot considering the size of the project. Roughly one
third is in libkrb5.so.

I've attached the log. Please, check it out and sort binaries that have
unique_locations!=0 by priority, at least give me the top-ten most
important ones. Note that since binaries share some code, some
warnings are reported multiple times. After a quick analysis of the
reports, it seems that there are 784 unique ones. Binaries are named
source_path_binary_name.bc.

As soon as I get the 'top-list', I'll start analyzing and postprocessing
the reports, and send them to krbcore-security at mit when done.

> > Another possible approach (that I've done with sendmail) is to compile
> > all sources individually, and link them in a knapsack manner -
> > maximising the number of object files linked together without linkage
> > conflicts.
>
> Ah, I see.  So you generate bitcode files from LLVM for individual
> source/object files manually somehow, and feed a collection of them
> to Calysto?
>
> If you do it by hand, I think you wind up needing to process all of
> the library directories before you can build the KDC -- util/support,
> util/et, util/profile, lib/crypto, lib/krb5, lib/gssapi, lib/rpc, lib/
> des425, lib/krb4, lib/kadm5/, lib/kadm5/srv, lib/kdb, lib/apputils,
> and finally kdc/.  (Normally the lib/krb5 directory builds a library
> that includes the util/profile objects, and lib/kadm5/srv builds a
> library using the objects in lib/kadm5.)  That sounds like a lot of
> manual work.
>
> I don't know what your process is for generating the intermediate BC
> files; is it something that could be tied into a normal build somehow
> to generate and process the additional files?

Essentially, one would need to compile an application to bitcode files.
LLVM is pretty close to it, but does not have binutils. In order to
circumvent this problem, I wrote a linkage script
http://lists.cs.uiuc.edu/pipermail/llvmdev/2006-December/007577.html
that solves the problem. Using it, I've managed to build a solid number
of apps. Some apps (like krb5) require more hacking.

Note that that script is quite a bit out of date, and I've changed
it in the meantime. After I move to the new LLVM 2.0, I'll resend the
new scripts so that people can compile their own code into bc files,
saving my time. In the meantime, I'll have to explore the beauty of
inventive build systems on my own :-)

Regards,

-- 
        Domagoj Babic

        http://www.domagoj.info/