thread safety requirements in MIT krb5 libraries

Fri Dec 19 18:07:48 EST 2003

Jeffrey Hutzelman <jhutz at cmu.edu> writes:
>> File locking:
>>
>> We could maintain a global (per-process) list of files held open with
>> locking capability, and check the 'fstat' data for files in this cache
>> before locking another file.
>
> Actually, you can do one better.  Before opening a file, stat it by
> name, and see if it matches one of the things in the cache.  If so,
> you can block on actually opening it until the matching cache entry is
> free.  It's not perfect -- if you decide there is _not_ a matching
> cache entry, then there's a race which could result in opening
> something that turns out to be in the cache after all (if someone
> substitutes the file out from under you).  But this isn't really any
> worse than the situation without the pre-open check.

It's no worse than skipping the pre-open check, but if we want to do
it *instead* of the post-open check, the race condition could be a
problem.  Question is, how much do we care?  Is it ever going to be a
security issue, or just a mild inconvenience in a few oddball cases,
few enough that we can ignore them?

>> Using
>> the pathname would, however, be a good first cut -- i.e., we shouldn't
>> need to use fstat to know that two replay caches opened with the same
>> absolute pathname will be the same, given that the replay cache is
>> supposed to refresh from the file if it changes.
>
> Probably true.  There's an interesting corner case, though -- if the
> files do turn out to be different, which data do you store in the stat
> cache?  If you keep the old data, then you may be able to deal with
> someone moving the replay cache out from under you and replacing it
> with a new one (is that expected?), but you'll never notice if some
> other pathname is the same as the new file -- unless it happened to be
> the same as the old file (e.g. a file in /var/tmp with /tmp a symlink
> there).

That would be something for the "stat-close-open-reload" code to deal
with as it goes about its checks for an updated file.  If you never
again touch the old object with the cache of data from the old version
of the file, yes, you'll be wasting storage, but otherwise it
shouldn't matter.

>> The filename checks would be a good idea in any case, though.  If two or
>> ten threads open the same replay cache, it's silly to have multiple
>> copies of all of the data, and have each thread reload it every time
>> another thread changes it.
>
> Absolutely -- if multiple threads open the same replay cache, they
> should get references to the same data structure, with locking inside
> the library. The same is probably true of keytabs and ccaches, though
> with those the update rate is low enough that it might not be worth
> the effort.

Yes, the replay cache is critical from the performance perspective.
While combining the references isn't actually a thread safety matter
(it'd be perfectly thread-safe if we had multiple copies), it may be
more efficient to tackle it while dealing with the file locking
issues.

>> Dynamic loading:
>>
>> I believe it's going to be a requirement that we be able to load the
>> Kerberos or GSSAPI library dynamically, do some stuff with it, and unload
>> it, and repeat the cycle, without resource leaks, at least for a properly
>> written program.  So any thread-specific storage or globally-used heap
>> storage we keep but hide from the application needs to be freed up when
>> the library is unloaded.  That shouldn't be hard, but the internal APIs
>> we use for per-thread storage might need a little adjusting from the
>> POSIX versions to support this better.
>
> Hrm.  POSIX allows an implementation-defined limit on the number of
> thread-specific storage keys that can be created.  I don't know what
> the lower bound is on this limit, but it's probably best to strictly
> limit the number of these that might be created by Kerberos.

Yes, something like one per library would probably be okay.  Or even
just one for all of the MIT Kerberos libraries together, managing
per-library data with a layer of indirection.  (That's assuming we
actually come up with anything besides the error_message buffer.)

> You put a flag in the krb5_context, protect it with a mutex, and
> provide an API call to set it.  This API call has the unique property
> that it can safely be called on a krb%_context that is in use by
> another thread.  And, of course, you check the flag at appropriate
> times.

That's one possibility that lxs and I were kicking around briefly,
yeah.

>
>> Feature testing:
>>
>> How does an application know if the Kerberos library it's using is
>> thread-safe?  We can do all we want in the Kerberos library, but if we've
>> only got gethostbyname available for address lookups, for example, the
>> resulting program can't be thread-safe.  And I know at least one
>> getaddrinfo implementation that's not thread-safe.
>
> You lock around it, if necessary.  And possibly, you document which
> API routines use possibly unsafe library routines that the app might
> have to lock around.

Sounds nice and easy.  I'm not sure it is, though.  It's going to be
very system-specific.  For example, on which systems do we potentially
call gethostbyname, or a non-thread-safe system implementation of
getaddrinfo, from all of the routines that can acquire tickets?  Well,
NetBSD's getaddrinfo is (or used to be) unsafe.  On Linux and AIX, we
use gethostbyname or gethostbyname_r to get the correct canonical name
to fix up the getaddrinfo results, but the configure script decides
which one of the two we'll call.

Perhaps, during this work, we should consider dropping support for any
platform that provides neither a working getaddrinfo, nor a
gethostbyname_r, nor a thread-safe gethostbyname?

>> Should we provide a way to describe which objects can be used
>> simultaneously from multiple threads and which cannot, in case we add
>> mutex protection to additional objects in the future?
>
> No.  Document the API, and leave it at that.  Runtime checks add a
> level of complexity both for Kerberos and the application, and I don't
> think they'll be heavily used.

Actually, I was thinking of compile-time checks:

  #ifndef KRB5_THREADSAFE_CCACHE
      if (pthread_mutex_lock(&global_ccache_lock) != 0) { ... }
      err = krb5_do_something(my_ctx, global_ccache);
      pthread_mutex_unlock(&global_ccache_lock);
  #else
      err = krb5_do_something(my_ctx, global_ccache);
  #endif

We can document what a given version does, but if the next version
changes things, then what?

Ken