"Secure coding" audit checkers and Kerberos

Wed Oct 15 15:57:32 EDT 2008

On Wed, Oct 15, 2008 at 02:54:09PM -0400, Greg Hudson wrote:
> On Tue, 2008-10-14 at 23:08 -0500, Nicolas Williams wrote:
> > +1.  I don't see the benefit of memcpy() vs. strcpy() unless you're
> > effectively building a set of strl*()-like utility functions.
> 
> I'm hearing that the external-to-MIT Kerberos developer community likes
> strlcpy.  I will probably accede to this preference since it's not
> really important, but I want to go around on it once more because
> there's also significant anti-strlcpy sentiment in parts of the C
> programming community.
> 
> Ulrich Drepper wrote, in one of his more tactful moments:
> 
>         Correct string handling means that you always know how long your
>         strings are and therefore you can you memcpy (instead of
>         strcpy).

I don't agree.

With strl*() you need only know the destination buffer's size, and then
check the strl*() result to see if truncation/overflow occurred.  (As
the manpage says:

"
     Buffer overflow can be checked as  follows:

       if (strlcat(dst, src, dstsize) >= dstsize)
              return -1;
"

and

"
                                                       ... Buffer
     overflow can be checked as  follows:

       if (strlcpy(dst, src, dstsize) >= dstsize)
              return -1;

"

> To expand on his point: you cannot judge the security of string
> manipulation code by the individual statements involved.  Either the
> code is correctly considering the lengths of the strings involved or
> it's not.  If you replace a bad strcpy() or memcpy() call with an
> unchecked strlcpy() call, you will simply replace a potential buffer
> overrun with a potential truncation, which is still a bug and possibly a
> security bug.

It is clearly not sufficient to replace instances of strcpy() with
strlcpy().  Neither is it to replace them with memcpy().  You have to
add some code as well.

I.e., this:

	(void) strcpy(dst, src);

should be replaced with either:

	size_t dstsize = <find size of dst>;
	size_t srcsize = <find length of src>;

	if (srcsize >= dstsize)
		<return an error, realloc dst, whatever>;
	else
		<use strcpy, memcpy, strlcpy, whatever>;

or:

	size_t dstsize = <find size of dst>;

	if (strlcpy(dst, src, dstsize) >= dstsize)
		<return an error, realloc dst, whatever>;

The second idiom is simpler than the first.  Both are morally
equivalent.  Simplicity should therefore win.

> Using memcpy() is not a guarantee of safety, but it does indicate that
> [...]

Of course not.  The same is true of strncpy().  And strlcpy().  The
surrounding code matters.

You can simplify the above idioms even more if you can establish some
common conventions, like: always malloc() buffers, in which case you
could use strdup() and build a astrcat().  You'd then have to be careful
not to leak memory, of course.  (There's *never* a free lunch in C.)

> Part of the disconnect within the C community is that the pro-strlcpy
> crowd typically envisions checked usages while the anti-strlcpy crowd
> typically envisions unchecked usages.  One glibc maintainer went so far
> as to call strlcat() inefficient because it has to measure the length of
> the source string for a "rarely-used return value" in the overflow case.

You can't have unchecked calls to anything here.  Either have pre-call
checks that guarantee success, or post call error checks, or both.

Choose a philosophy and stick to it.  But don't claim one allows you to
avoid checks.  Both require checks.

Or re-write in a higher-level labguage that avoids the whole issue  ;)

> If we were to use strlcpy and strlcat in Kerberos, I would expect to
> have unchecked usages when it is easy to tell that an overflow can't
> occur.  Most of our string manipulation code uses allocated buffers, so
> I would expect to see code like:

Sure.

>   buflen = strlen(s1) + strlen(s2) + strlen(s3) + 1;
>   buf = malloc(buflen);
>   if (!buf) return ENOMEM;
>   (void) strlcpy(buf, s1, buflen);
>   (void) strlcat(buf, s2, buflen);
>   (void) strlcat(buf, s3, buflen);
> 
> Adding an unreachable, untestable error path after each strl*() call in
> this situation would expand the code size, could force the use of
> goto-style exception handling in functions which don't otherwise need
> it, and would inevitably introduce phantom resource leaks which take
> time to fix.

That's what assert() is for:

    buflen = strlen(s1) + strlen(s2) + strlen(s3) + 1;
    buf = malloc(buflen);
    if (!buf) return ENOMEM;
    newlen = strlcpy(buf, s1, buflen);
    assert(newlen < buflen);
    newlen = strlcat(buf, s2, buflen);
    assert(newlen < buflen);
    newlen = strlcat(buf, s3, buflen);
    assert(newlen < buflen);

If this were a very common idiom you could then build a utility function
or macro.  If you #define NCHECK then the compiler might optimize away
the assignments to newlen.

> On the other hand, some code which uses fixed-length buffers might
> actually get prettier if we use strl*() length checks instead of an
> explicit length check prior to the string operations.

Six of one...

> (In case anyone is concerned, the vast majority, possibly all, of the
> Kerberos uses of strcpy() and strcat() are safe for one reason or
> another.  But lint and similar tools have no idea whether this is true,
> so they just flag all uses of it.)

Such tools should probably also flag memcpy(), strlcpy(), ... as well.
Without analyzing the code you can't conclude that the code is correct
just because it doesn't use a function that's on someone's blaklist.

strcpy() can be used safely if you check before calling it.  memcpy()
cna be used unsafely if the preceding checks are incorrect.

If you *really* want to solve the static analysis problem then you need
a better model of strings and buffers than the C string model.  You
might say that you need a better language, but that wouldn't be
practical right now.

One of the nice things about strl*() is that it's easy to check whether
their return values are being checked.  I doubt any lint tools exist
which statically analyze that: a) the correct dstsize is passed in to
strl*(), and b) that the result of strl*() is checked against dstsize,
much less c) that the action on overflow is correct.  The same, no
doubt, is true w.r.t. memcpy() and pre-call checking.  But it should be
easier to do the static analysis for strl*() than for memcpy() since
more useful information is conveyed to the analyzer by the very name and
semantics of strl*() than by memcpy()'s.

...half a dozen of the other.

One thin-gruel argument against using memcpy() is that it is an
abstraction violation.  These are *strings* we're talking about, in the
codeset of the current locale (see my argument about lint, above).  This
argument works the other way too: if you're dealing with strings in one
specific codeset then you don't want the caller's current locale to have
any impact on what actually happens.  Of course, I doubt any strl*()
implementations do anything odd like, say, composing Unicode codepoints
where two strings are concatenated and the first one ends in a combining
codepoint, or that they ever will.

Nico
--