Weird KDC behaviour with getprincs/kdb5_util (V5 1.2.2, Solaris 8)

Ken Raeburn raeburn at MIT.EDU
Tue Mar 26 20:56:46 EST 2002


peirce at gumby.it.wmich.edu (Leonard J. Peirce) writes:
> We're seeing something very strange on our KDC.  We have approximately
> 46,000 total principals.  When we propagate (kdb5_util dump) or do
> getprincs in kadmin to get a list of all principal names the resulting
> output (in both cases) is missing over half of the principals that we
> know are in the database.  Our slave server is pretty much useless at
> this point until we get this working again.

Ouch.

This sounds rather like a form of database corruption I've seen once
before while I was at Cygnus, though it was with a different database
back end (BSD NDBM, if I remember correctly).  One of the failure
modes (there were many we encountered in the investigation) was that
in certain overflow cases, iterating over the database entries would
miss some, or would even loop indefinitely over a small set of
entries.  The overflow was caused by a large number of collisions in
the NDBM hash function (or the set of bits from the hash value
actually used for selecting a position in the file).

It's not trivial to recover; you'll probably have to write code that
operates on the database directly, rather than going through the
higher-level interfaces (the ones that understand the Kerberos
database format) that we use for most operations.  We don't have any
such code lying around at the moment.

First of all, I'd recommend disabling kadmind, turning off execute
permission on kadmin.local, and doing whatever else it takes to stop
changes from being made.  If the database is in fact corrupted, you
don't want to risk breaking it further.

If you can extract the database contents, you may be able to create a
database with the same data in a different format.  Our use of db2
(Sleepycat's DB 1.85 with patches, actually, and we really need to
look into updating once we figure out a more specific policy on what
licenses we can accept on code we import) supports at least two back
end formats.  If yours is hash, I'd strongly recommend switching to
btree.  If you're using btree, you could debug the problem, or switch
to hash, but the hash back end has tended to be more buggy.  Or, if
you're really psyched to dive into it, you could try updating your
tree to use a more recent Sleepycat release or some other back end.

Have you dumped and reloaded your master database any time recently?
We switched to btree format a while back, but if you never dumped and
reloaded, you may still be using hash format, which would not be good.
You can tell the database type by the magic number in the first four
bytes -- 0x053162 is btree, 0x061561 is hash.
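
Here is a minimal sketch of that check in Python.  The two magic
values are the ones given above; the function name is made up for
illustration.  It assumes the magic word may have been written in
either byte order, depending on the machine that created the file, so
it tries both.

```python
import struct

BTREE_MAGIC = 0x053162  # btree, per the values quoted above
HASH_MAGIC = 0x061561   # hash, per the values quoted above

def db_format(header):
    """Guess the back end format from the first four bytes of the file."""
    if len(header) < 4:
        return "unknown"
    # Assumption: the word is stored in the creating host's byte order,
    # so interpret the bytes as big-endian first, then little-endian.
    for fmt in (">I", "<I"):
        (magic,) = struct.unpack(fmt, header[:4])
        if magic == BTREE_MAGIC:
            return "btree"
        if magic == HASH_MAGIC:
            return "hash"
    return "unknown"
```

To use it, open the principal database file in binary mode, read four
bytes from the start, and pass them to db_format().  Run it on a copy
of the file rather than the live database.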

The more interesting part right now is how to get the data out, so
that you can stuff it back into a database in a different format.
Even though you can't walk through the database sequentially, there
are still a couple of ways you may be able to extract the data.

First, if you have a complete list of current principal names, write a
little program to walk over that list, generate the correct database
key for each name, and extract the data from the database through the
db2 interface.  Then write it into another, freshly-created database.
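
The shape of such a program might look like the following Python
sketch.  It is not working recovery code: old_db and new_db are
stand-ins for real db2 handles (anything with dict-style access on
byte-string keys), and the key layout in make_db_key() -- the
principal name followed by a NUL byte -- is an assumption that should
be checked against the source tree before relying on it.

```python
def make_db_key(principal_name):
    # Assumption: the database key is the principal name string
    # followed by a terminating NUL byte.
    return principal_name.encode("ascii") + b"\0"

def copy_by_name(names, old_db, new_db):
    """Copy each named record from old_db to new_db by direct lookup.

    Returns the list of names whose records could not be found.
    """
    missing = []
    for name in names:
        key = make_db_key(name)
        try:
            record = old_db[key]   # random access; no iteration involved
        except KeyError:
            missing.append(name)
            continue
        new_db[key] = record       # stored unchanged, byte for byte
    return missing
```

The point of the design is that it never asks the old database to
enumerate its entries, which is the operation that appears to be
broken.  Any names that come back in the "missing" list need a second
look.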

Or, second, if the above approach doesn't work, open the database file
as a plain file, and simply scan through it, taking note of anything
that looks like it might be a database record.  The principal name in
the key should be an ASCII string drawn from a limited range of
characters.  In the record itself, the key data should have reasonable
key types and correct lengths, the principal should have
reasonable-looking flags set, etc.  Once you get names, maybe you can
use the db2 interface to read out the data.  If not, you'll have to
decipher the database format enough to locate the data yourself and
pull it out.
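
For the name-hunting part of that scan, something along these lines
could be a starting point.  It is a rough Python sketch under several
assumptions: that names use only letters, digits, and the characters
". _ / -", that each contains one "@" before the realm, and that each
is followed by a NUL byte in the file.  Expect false positives; it
only produces candidates to try against the database.

```python
import re

# Assumption: a name is name-characters, "@", realm-characters, then NUL.
NAME_RE = re.compile(rb"([A-Za-z0-9._/-]+@[A-Za-z0-9.-]+)\x00")

def candidate_names(raw):
    """Return distinct name-like strings in raw, in order of first appearance."""
    seen = set()
    found = []
    for match in NAME_RE.finditer(raw):
        name = match.group(1).decode("ascii")
        if name not in seen:
            seen.add(name)
            found.append(name)
    return found
```

Feed it the whole file read in binary mode (46,000 principals should
fit in memory comfortably), then try each candidate as a lookup key.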



Oh yes...  If it's btree that's broken, and you don't want to debug
the btree code yourself, please send us, or me, a copy of the data you
get out -- with actual key data overwritten, of course -- in a bug
report so we can try to fix it, assuming we don't switch database
formats.  Please leave principal names and record sizes alone, though;
changing them may make it impossible to reproduce the problem.

> The really odd part is that the principals that don't show up are in the
> database and continue to work fine.  Users can get tickets, use them for
> rlogin/telnet/ftp, and change their passwords.  We can do getprinc for any
> one of the missing entries and they show up just fine.  But running getprincs
> to list the entire database or kdb5_util dump both fail to list them.

Yes, this is consistent.  Random and sequential access often use very
different code paths.

> BTW, I tried using
> 
>    kdb5_util dump dump.out <principal>
> 
> to dump a single principal and didn't get the principal dumped.  Instead,
> it appeared to dump just the policies that we have defined.  Am I misreading
> the man page?  I had hoped to be able to dump each individual principal,
> append to a file, and possibly reload the database.

Sorry....  The implementation of that form is basically, "while you're
walking through the database, ignore entries not matching one of these
principal names".  So if the normal dump doesn't see the principal,
this form won't either.
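
In other words, the logic is roughly the following.  This is a toy
Python illustration, not the actual kdb5_util code; the names and the
simulated "broken" traversal are invented for the example.

```python
def dump_matching(walk, wanted):
    """Yield (name, record) pairs from walk() whose name is in wanted.

    walk is a stand-in for the sequential database traversal.
    """
    wanted = set(wanted)
    for name, record in walk():
        if name in wanted:
            yield name, record

records = {"alice@EXAMPLE.COM": b"rec1", "bob@EXAMPLE.COM": b"rec2"}

def broken_walk():
    # Simulates a traversal that silently skips bob's entry.
    yield "alice@EXAMPLE.COM", records["alice@EXAMPLE.COM"]

# Direct lookup still finds bob ...
print(records["bob@EXAMPLE.COM"])
# ... but the filtered dump sees only what the walk produces: prints []
print(list(dump_matching(broken_walk, ["bob@EXAMPLE.COM"])))
```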

> Any suggestions on troubleshooting this?  Could it be a buffer being
> overrun someplace?

There is a chance that it's just a bug in sequentially retrieving
data from the database, and that the database itself is intact.  That
only helps you *if* you go and find the bug and fix it; then you no
longer have the problem of extracting what data you can from a broken
database.  Either way, the problem has to be fixed before your slave
KDCs become useful again.

Ken


