NULL ptr dereferences found with Calysto static checker

Ken Raeburn raeburn at MIT.EDU
Sat Jun 9 05:30:46 EDT 2007


On Jun 7, 2007, at 22:06, Domagoj Babic wrote:
> Thx for clarification - Calysto can't analyze the code that's not
> compiled into the same module. So, to be on the safe side it has to
> interpret all pointers passed to external functions as undefined. And
> since getaddrinfo is external - it has no clue what it returns or how
> result is initialized. Of course, this can be added as an additional
> constraint, but I have done that only for the functions that most
> frequently cause false positives.

Sure, makes sense.  I hope it's easy to expand Calysto's database of  
such functions.  Can you express such things as "F returns non-zero  
or sets *x to a valid pointer"?  I played a bit with Splint some time  
back, and one of its most glaring problems was the inability to  
understand stuff like that -- in particular, realloc() success and  
failure paths.

>> There are functions in the public API where it'd be nice to check for
>> null and return some nice error code, where we don't do that already,
>> I expect.  I'm curious how Calysto works, that it reported these in
>> the support library, but not some of the libkrb5 functions, like
>> krb5_mk_req for example, which doesn't check that the context
>> argument is non-null before using it.  (Or did you remove some of
>> these already?)
>
> Here's what Calysto does: If a function is called (no matter  
> external or
> internal), it's analyzed in all the contexts in which it is called. If
> the function is external and not called anywhere, it's analyzed as one
> of the roots of the callgraph.
>
> This behaviour is trivial to change. I could easily modify Calysto to
> consider any function with external linkage to be a potential root of
> the call graph. However, this would produce a ton of warnings. Such a
> change would have sense if:
>
> 1) You make sure that all the functions that should have internal
> linkage really have internal linkage (this will reduce the number of
> warnings significantly).

We haven't finished doing that yet.  On Mac OS X (KfM, with an Xcode  
build process) and Windows, the export lists for shared libraries  
have been kept minimal, but aren't enough to build a KDC.  On UNIX  
(including Darwin and configure+make builds on Mac without the KfM  
bits), several of our export lists include every or nearly every non- 
static symbol, because we had implemented the export list support but  
hadn't come up with the precise symbol list yet.  So neither version  
of the library has quite the right set of symbols for this analysis.

Perhaps, as a first cut, it would be best to analyze specific  
programs, and not the full library interface.

> 2) You are willing to help me out analyzing the results.

Well, I'm willing to try. :-)

> Another option is that you provide a list of functions which need  
> to be
> checked as having 'external linkage' and must be fault-tolerant.

Probably also a good thing to try at some point.

>> Ah... but krb5_mk_req isn't called by anything else in libkrb5.  And
>> if you're looking across library boundaries, functions like
>> krb5int_getaddrinfo are called from code in libkrb5.
>
> Sorry, can't look across the module boundaries at the current point.
> There's one way to hack around this - use llvm-link to link the  
> produced
> binaries, but for krb, I get only one binary produced anyways  
> (libkrb).

Is it possible to save the analyzed behavior of some functions or  
their uses, maybe programmatically transform some subset of it into  
some input format acceptable to Calysto (e.g., do you have an input  
file that describes strncmp and gethostbyname to it?), and import the  
result into an analysis of another module?

>> I'm not sure that last constraint is entirely logical.  If F is
>> called from another function in the same library with a non-null
>> argument, but is exported from the library and is intended to support
>> a null argument as well, it'd be useful to know if it's dereferencing
>> without checking.  Of course, whether the function is supposed to get
>> a null pointer is less a matter of the code than of the
>> specification, so heuristics are the best you're going to be able  
>> to do.
>
> It is just a simple heuristic to decrease the number of false  
> positives
> - if the context is available, Calysto always takes it into  
> account. As
> said before, really not a big deal to change this behaviour.

Okay, that sounds good.  Especially if a report on the assumptions  
made can be produced, we could presumably correct certain wrong  
conclusions and re-run the analysis?

>> Which binary?  We build a number of them.  If the results are only
>> from one of them, that might explain the missing functions.
>
> llvm-gcc (with my linker) produces only one binary -
> libkrb5.support.so.0.1.bc

Oohh... okay.  There's the problem.  The krb5support library is our  
lowest-level support library, which has things like wrappers around  
system threading and hostname resolution functions -- none of the  
real Kerberos functionality.

> How many should there be?

Looking at a UNIX install tree, I've got 11 shared libraries, 39  
executables (of which a few, like krb5kdc and kadmind, are the most  
critical, but almost all involve network traffic and thus subject in  
one way or another to malicious input, and two are normally installed  
setuid root), and depending on configuration options, one or two .so  
files that may get dynamically loaded by krb5kdc at run time with  
dlopen.  We only build shared libraries for most things currently;  
it's not possible to do a fully static build of any of the  
executables, without some pounding on our somewhat baroque build system.

> The most likely culprit is the front-end - currently LLVM doesn't  
> offer
> a good support for compiling apps to bc files, which Calysto takes as
> input, so I had to hack my own linker, which again does only what
> llvm-gcc asks it to do. If you could list the core binaries and their
> respective source directories, I could try to hack the compilation to
> produce the required binaries.
>
> Another possible approach (that I've done with sendmail) is to compile
> all sources individually, and link them in a knapsack manner -
> maximising the number of object files linked together without linkage
> conflicts.

Ah, I see.  So you generate bitcode files from LLVM for individual  
source/object files manually somehow, and feed a collection of them  
to Calysto?

If you do it by hand, I think you wind up needing to process all of  
the library directories before you can build the KDC -- util/support,  
util/et, util/profile, lib/crypto, lib/krb5, lib/gssapi, lib/rpc, lib/ 
des425, lib/krb4, lib/kadm5/, lib/kadm5/srv, lib/kdb, lib/apputils,  
and finally kdc/.  (Normally the lib/krb5 directory builds a library  
that includes the util/profile objects, and lib/kadm5/srv builds a  
library using the objects in lib/kadm5.)  That sounds like a lot of  
manual work.

I don't know what your process is for generating the intermediate BC  
files; is it something that could be tied into a normal build somehow  
to generate and process the additional files?

Ken



More information about the Kerberos mailing list