thread safety requirements in MIT krb5 libraries
Jeffrey Hutzelman
jhutz at cmu.edu
Thu Dec 18 15:30:25 EST 2003
On Thursday, December 18, 2003 00:41:27 -0500 Ken Raeburn <raeburn at mit.edu>
wrote:
> Callbacks:
>
> I don't think the ability to register callback functions for thread
> system operations will be needed. It appears that for all the platforms
> we care about, there's a standard thread system (or more than one,
> mapping to the same basic OS interface so as to be interoperable). We
> were concerned that people might want to be able to use, for example, the
> Gnu PTH library instead of a native, preemptive pthreads, but I haven't
> heard many people expressing interest.
At some point, AFS is going to gain a Kerberos dependency. When that
happens, being tied to a single threading model will be mildly annoying,
since it won't be ours. But it's still the right approach, IMHO, and I
imagine we'll make do.
> Thread-safe objects:
>
> We've been assuming for a long time that a krb5_context would be used in
> only one thread at a time, for performance reasons and to reduce the
> implementation work on our part. I don't think we've talked about the
> krb5_auth_context objects; I've kind of assumed they'd always be used in
> conjunction with the krb5_context under which they were created.
> There's been discussion about being able to use certain other objects in
> multiple threads, and to take objects created in one thread and context
> and use them in another thread and context serially. Specifically,
> replay caches, credentials caches, and key tables would be good to have
> locked so they can be shared across threads (especially the replay cache,
> for a multithreaded server).
For some of these objects, I'd think it would be reasonable to declare them
non-reentrant, and make locking the application's problem -- as long as
they're not tied to a particular krb5_context, and don't magically get
destroyed along with the context used to create them. I suspect
krb5_auth_context might also fall into this category.
> File locking:
>
> We could maintain a global (per-process) list of files held open with
> locking capability, and check the 'fstat' data for files in this cache
> before locking another file.
Actually, you can do one better. Before opening a file, stat it by name,
and see if it matches one of the things in the cache. If so, you can block
on actually opening it until the matching cache entry is free. It's not
perfect -- if you decide there is _not_ a matching cache entry, then
there's a race which could result in opening something that turns out to be
in the cache after all (if someone substitutes the file out from under
you). But this isn't really any worse than the situation without the
pre-open check.
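A rough sketch of that pre-open check, assuming the cache records each open file by its (st_dev, st_ino) pair. The names open_cache_add and open_cache_find_by_name and the fixed-size table are hypothetical, not anything in the krb5 tree, and the caller is assumed to hold whatever mutex protects the table:

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical per-process table of files the library holds open. */
#define OPEN_CACHE_MAX 16

struct open_entry {
    dev_t dev;
    ino_t ino;
    bool in_use;
};

static struct open_entry open_cache[OPEN_CACHE_MAX];

/* Record an open file by identity.  Returns the slot, or -1 if full. */
int open_cache_add(const struct stat *st)
{
    int i;

    for (i = 0; i < OPEN_CACHE_MAX; i++) {
        if (!open_cache[i].in_use) {
            open_cache[i].dev = st->st_dev;
            open_cache[i].ino = st->st_ino;
            open_cache[i].in_use = true;
            return i;
        }
    }
    return -1;
}

/* Stat the file by name before opening it.  Returns the slot of a
 * matching cache entry, or -1 if there is none (or stat failed). */
int open_cache_find_by_name(const char *path)
{
    struct stat st;
    int i;

    if (stat(path, &st) != 0)
        return -1;
    for (i = 0; i < OPEN_CACHE_MAX; i++) {
        if (open_cache[i].in_use &&
            open_cache[i].dev == st.st_dev &&
            open_cache[i].ino == st.st_ino)
            return i;
    }
    return -1;
}
```

A caller that gets a slot back would wait for that entry to become free instead of opening the file again. A caller that gets -1 opens the file and still has to fstat the descriptor afterwards, because of the race described above.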
> Sam suggested we could consider an application restriction that all
> references to a file must use the same name, i.e., "/var/tmp/foobar" and
> "/tmp/foobar" would be handled as separate files, even if "/tmp" is a
> symlink to "/var/tmp". It would simplify things greatly, and in most
> cases wouldn't be a problem, but it still makes me uncomfortable.
Me too. Pathnames often come from users, not application developers.
Users will expect the correct behaviour if they use two names that refer to
the same file. If a filename check is not used, what they'll see instead
is a hard-to-reproduce, non-deterministic case where sometimes a file gets
corrupted. It will generate support load for application developers and
for krbdev.
> Using
> the pathname would, however, be a good first cut -- i.e., we shouldn't
> need to use fstat to know that two replay caches opened with the same
> absolute pathname will be the same, given that the replay cache is
> supposed to refresh from the file if it changes.
Probably true. There's an interesting corner case, though -- if the files
do turn out to be different, which data do you store in the stat cache? If
you keep the old data, then you may be able to deal with someone moving the
replay cache out from under you and replacing it with a new one (is that
expected?), but you'll never notice if some other pathname is the same as
the new file -- unless it happened to be the same as the old file (e.g. a
file in /var/tmp with /tmp a symlink there).
> The filename checks would be a good idea in any case, though. If two or
> ten threads open the same replay cache, it's silly to have multiple
> copies of all of the data, and have each thread reload it every time
> another thread changes it.
Absolutely -- if multiple threads open the same replay cache, they should
get references to the same data structure, with locking inside the library.
The same is probably true of keytabs and ccaches, though with those the
update rate is low enough that it might not be worth the effort.
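As an illustration only, here is one way the shared-reference idea could look, using the pathname as the key (the "first cut" mentioned above). The struct and function names are hypothetical, and the cached replay data itself is left out:

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared replay cache handle. */
struct shared_rcache {
    char *name;
    int refcount;                 /* protected by registry_lock */
    pthread_mutex_t lock;         /* would protect the cached data */
    struct shared_rcache *next;
};

static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static struct shared_rcache *registry;

/* Return the existing handle for name, or create one.  NULL on failure. */
struct shared_rcache *shared_rcache_open(const char *name)
{
    struct shared_rcache *rc;
    size_t len = strlen(name) + 1;

    pthread_mutex_lock(&registry_lock);
    for (rc = registry; rc != NULL; rc = rc->next) {
        if (strcmp(rc->name, name) == 0) {
            rc->refcount++;
            break;
        }
    }
    if (rc == NULL && (rc = calloc(1, sizeof(*rc))) != NULL) {
        rc->name = malloc(len);
        if (rc->name == NULL) {
            free(rc);
            rc = NULL;
        } else {
            memcpy(rc->name, name, len);
            pthread_mutex_init(&rc->lock, NULL);
            rc->refcount = 1;
            rc->next = registry;
            registry = rc;
        }
    }
    pthread_mutex_unlock(&registry_lock);
    return rc;
}

/* Drop one reference; the last close unlinks and frees the handle. */
void shared_rcache_close(struct shared_rcache *rc)
{
    struct shared_rcache **pp;

    pthread_mutex_lock(&registry_lock);
    if (--rc->refcount == 0) {
        for (pp = &registry; *pp != NULL; pp = &(*pp)->next) {
            if (*pp == rc) {
                *pp = rc->next;
                break;
            }
        }
        pthread_mutex_destroy(&rc->lock);
        free(rc->name);
        free(rc);
    }
    pthread_mutex_unlock(&registry_lock);
}
```

Two threads opening the same name get the same pointer and serialize on rc->lock. Two different names for the same file still get separate handles here, which is exactly the symlink problem discussed earlier.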
> Dynamic loading:
>
> I believe it's going to be a requirement that we be able to load the
> Kerberos or GSSAPI library dynamically, do some stuff with it, and unload
> it, and repeat the cycle, without resource leaks, at least for a properly
> written program. So any thread-specific storage or globally-used heap
> storage we keep but hide from the application needs to be freed up when
> the library is unloaded. That shouldn't be hard, but the internal APIs
> we use for per-thread storage might need a little adjusting from the
> POSIX versions to support this better.
Hrm. POSIX allows an implementation-defined limit on the number of
thread-specific storage keys that can be created. I don't know what the
lower bound is on this limit, but it's probably best to strictly limit the
number of these that might be created by Kerberos.
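One way to stay within any such limit is to have the library create exactly one key and hang a single struct off it. A minimal sketch, assuming POSIX threads; struct k5_thread_data and k5_get_thread_data are hypothetical names, not existing krb5 internals:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread state; every field lives behind one key. */
struct k5_thread_data {
    int last_error;
    char scratch[64];
};

static pthread_key_t k5_key;
static pthread_once_t k5_key_once = PTHREAD_ONCE_INIT;
static int k5_key_status;

static void k5_key_create(void)
{
    /* free() is run as the destructor on the block when a thread exits. */
    k5_key_status = pthread_key_create(&k5_key, free);
}

/* Return this thread's block, allocating it on first use.  NULL on failure. */
struct k5_thread_data *k5_get_thread_data(void)
{
    struct k5_thread_data *td;

    if (pthread_once(&k5_key_once, k5_key_create) != 0 || k5_key_status != 0)
        return NULL;
    td = pthread_getspecific(k5_key);
    if (td == NULL) {
        td = calloc(1, sizeof(*td));
        if (td != NULL && pthread_setspecific(k5_key, td) != 0) {
            free(td);
            td = NULL;
        }
    }
    return td;
}
```

Note that pthread_key_delete does not run destructors, so for the unload case the library would have to track and free any blocks still outstanding itself before deleting the key.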
> Cancellation:
>
> It would probably be difficult, if not impossible, to make the library
> code be async cancel safe. However, making it safe for synchronous
> cancellation may be doable. How much do threaded programs actually use
> pthread_cancel or the Windows equivalent?
I'm not sure how hard it might be in practice -- mostly you need to make
sure you release any locks you're holding, and don't leave the data
structures they're protecting in a broken state.
However... Solaris documentation indicates that the default cancellation
type is deferred, and that it is appropriate to expect applications and
libraries not to be async cancel safe.
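For the deferred case, the usual POSIX tool is a cleanup handler that releases the lock if the thread is cancelled while holding it. A small sketch; table_lock, table_entries and table_add_entry are made-up names for illustration:

```c
#include <pthread.h>

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static int table_entries;

/* Cleanup handler: release the mutex passed as the argument. */
static void unlock_table(void *arg)
{
    pthread_mutex_unlock(arg);
}

/* Add one entry and return the new count.  If the thread is cancelled
 * at a cancellation point between push and pop, unlock_table runs
 * before the thread exits, so the lock is not left held. */
int table_add_entry(void)
{
    int n;

    pthread_mutex_lock(&table_lock);
    pthread_cleanup_push(unlock_table, &table_lock);

    /* Anything here that might block (I/O, waiting on a condition
     * variable) is a potential cancellation point. */
    table_entries++;
    n = table_entries;

    pthread_cleanup_pop(1);     /* non-zero: run the handler, i.e. unlock */
    return n;
}
```

The handler only takes care of the lock. Keeping the protected data consistent still means ordering the updates so that no cancellation point falls in the middle of one.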
> There's another kind of cancellation that might be desirable too. If
> thread 1 manages a GUI with a cancel button, and thread 2 is waiting for
> packets from the nameserver or KDC, or is running a long calculation to
> generate a key, thread 1 may want to tell thread 2 to stop what it's
> doing. Cancelling the entire thread is one way of doing that, but might
> we want to be able to cancel just the current operation, and propagate
> that fact up to the caller? This is probably a question for people
> working on long-running Mac and Windows GUI programs; UNIX guys like me
> just hit control-C in our terminal windows. :-)
>
> This latter form may just involve having a flag someplace (krb5 context?)
> that will tell the Kerberos library to simply return a special error code
> from whatever it's currently doing, as quickly as possible. But how do
> we get the flag set? Do multiple threads come into it again?
You put a flag in the krb5_context, protect it with a mutex, and provide an
API call to set it. This API call has the unique property that it can
safely be called on a krb5_context that is in use by another thread. And,
of course, you check the flag at appropriate times.
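Roughly like this, with a stand-in struct in place of the real krb5_context. Every name here is hypothetical, including the error code, and clearing the flag once it has been reported is just one possible design choice:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the cancellation state in a krb5_context. */
struct ctx_cancel {
    pthread_mutex_t lock;
    bool requested;
};

#define CTX_CANCEL_INIT { PTHREAD_MUTEX_INITIALIZER, false }
#define ERR_OP_CANCELLED 1      /* hypothetical error code */

/* The one call that may be made while another thread uses the context. */
void ctx_request_cancel(struct ctx_cancel *c)
{
    pthread_mutex_lock(&c->lock);
    c->requested = true;
    pthread_mutex_unlock(&c->lock);
}

/* Test and clear the flag, so the context can be reused afterwards. */
bool ctx_cancel_pending(struct ctx_cancel *c)
{
    bool pending;

    pthread_mutex_lock(&c->lock);
    pending = c->requested;
    c->requested = false;
    pthread_mutex_unlock(&c->lock);
    return pending;
}

/* A long-running operation checks the flag between steps. */
int ctx_long_operation(struct ctx_cancel *c, int steps)
{
    int i;

    for (i = 0; i < steps; i++) {
        if (ctx_cancel_pending(c))
            return ERR_OP_CANCELLED;
        /* ... one unit of work: send a packet, try the next KDC ... */
    }
    return 0;
}
```

The GUI thread calls ctx_request_cancel on the worker's context; the worker sees ERR_OP_CANCELLED from whatever it was doing and passes it up to its caller.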
> Feature testing:
>
> How does an application know if the Kerberos library it's using is
> thread-safe? We can do all we want in the Kerberos library, but if we've
> only got gethostbyname available for address lookups, for example, the
> resulting program can't be thread-safe. And I know at least one
> getaddrinfo implementation that's not thread-safe.
You lock around it, if necessary. And you might also document which API
routines call potentially unsafe library routines, so the app knows what it
may have to lock around.
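A sketch of locking around gethostbyname, assuming IPv4 only. The wrapper name locked_lookup_ipv4 and the resolver_lock mutex are hypothetical. The result is copied out while the lock is held, because gethostbyname may return a pointer to static storage:

```c
#include <netdb.h>
#include <netinet/in.h>
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>

/* One process-wide lock for the non-reentrant resolver call. */
static pthread_mutex_t resolver_lock = PTHREAD_MUTEX_INITIALIZER;

/* Look up name and copy its first IPv4 address into *out.
 * Returns 0 on success, -1 on failure. */
int locked_lookup_ipv4(const char *name, struct in_addr *out)
{
    struct hostent *hp;
    int ret = -1;

    pthread_mutex_lock(&resolver_lock);
    hp = gethostbyname(name);
    if (hp != NULL && hp->h_addrtype == AF_INET &&
        hp->h_length == (int)sizeof(*out) && hp->h_addr_list[0] != NULL) {
        /* Copy before unlocking; the next caller may overwrite *hp. */
        memcpy(out, hp->h_addr_list[0], sizeof(*out));
        ret = 0;
    }
    pthread_mutex_unlock(&resolver_lock);
    return ret;
}
```

This only protects callers that go through the wrapper. An application that calls gethostbyname directly in another thread is not covered, which is where the documentation comes in.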
> Should we provide a way to describe which objects can be used
> simultaneously from multiple threads and which cannot, in case we add
> mutex protection to additional objects in the future?
No. Document the API, and leave it at that. Runtime checks add a level of
complexity both for Kerberos and the application, and I don't think they'll
be heavily used.
-- Jeff