issue with temp policy lock file and kdb5_util error
ghudson at MIT.EDU
Mon Nov 1 13:00:34 EDT 2010
On Fri, 2010-09-24 at 15:09 -0400, Will Fiveash wrote:
> Recently someone was having issues with kprop working and the problem
> appears to be related to kpropd aborting when processing a resync with the
> master KDC and leaving temp principal files behind like:
> loki# ls princ*
> principal.ulog principal~.kadm5 principal~.ok
> principal~ principal~.kadm5.lock
I have reproduced this issue by adding "if (getenv("CRASH") != NULL)
abort();" to dump.c after the database is opened. The failure is that
subsequent kdb5_util load invocations fail in krb5_db2_create() because
check_openable() returns true. The error message seen is "kdb5_util:
At the moment, an admin can recover from this situation by running "rm
principal~*". There is no higher-level command to perform that.
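For illustration, here is the manual recovery written as a small C helper. This is a hypothetical sketch and not part of kdb5_util. The helper name remove_temp_db_files() is made up, and it removes only the four temp-DB file names from the listing quoted above (principal~, principal~.kadm5, principal~.kadm5.lock, principal~.ok) rather than everything matching principal~*. Like the rm command, it is only safe when no other load is in progress.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Temp-DB file names taken from the directory listing quoted above.
 * HYPOTHETICAL list for illustration; kdb5_util exports no such table. */
static const char *const temp_db_names[] = {
    "principal~",
    "principal~.kadm5",
    "principal~.kadm5.lock",
    "principal~.ok",
};

/* Remove leftover temp-DB files from dir, like "rm principal~*" limited to
 * the names above. A missing file is not an error. Returns 0 on success or
 * -1 with errno set on the first other failure. Only safe when no other
 * kdb5_util load is running against that directory. */
static int remove_temp_db_files(const char *dir)
{
    char path[4096];
    size_t i;

    for (i = 0; i < sizeof(temp_db_names) / sizeof(temp_db_names[0]); i++) {
        int n = snprintf(path, sizeof(path), "%s/%s", dir, temp_db_names[i]);

        if (n < 0 || (size_t)n >= sizeof(path)) {
            errno = ENAMETOOLONG;
            return -1;
        }
        if (unlink(path) != 0 && errno != ENOENT)
            return -1;
    }
    return 0;
}
```

The real database files (principal, principal.ulog, and so on) are never touched, and calling the helper twice is harmless.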
There are a few approaches we could take to improving kdb5_util load's
behavior after an aborted load. The best would be to make "kdb5_util
load" remove the temp DB if one exists as a remnant of an aborted load.
However, we'd want to preserve the property that if two "kdb5_util load"
processes overlap, the second one fails and doesn't interfere with the
first.
Right now the sequence of operations is:
1. Create the temp DB (fail if it exists).
2. Lock the temp DB.
3. Load the dump file into the temp DB.
4. Unlock the temp DB.
5. Promote the temp DB.
Steps 2 and 4 of this sequence are not especially useful. There is a
race condition where two kdb5_util load operations can both succeed in
creating the temp DB, but the locking does not generally resolve that
race condition into correct behavior.
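To make the race concrete, here is a toy C model of step 1. It assumes, for illustration only, that "fail if it exists" is a separate existence check followed by a create, in the spirit of check_openable() running inside krb5_db2_create(). The functions temp_db_check(), temp_db_create() and temp_db_create_excl() are hypothetical and operate on a plain file rather than a DB2 database. The model shows two failure modes. A split check-then-create lets two loads both succeed. An atomic O_EXCL create closes that race, but then a remnant from an aborted load blocks every later load.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* HYPOTHETICAL model of the existence check: returns -1 if a temp DB is
 * already present (the failure seen after an aborted load), else 0. */
static int temp_db_check(const char *path)
{
    return access(path, F_OK) == 0 ? -1 : 0;
}

/* HYPOTHETICAL create step that does not re-check existence, so two
 * callers that both passed temp_db_check() will both "succeed". */
static int temp_db_create(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}

/* Check and create as one atomic step: the second creator gets EEXIST,
 * but so does every load after a crash that left the file behind. */
static int temp_db_create_excl(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);

    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}
```

A single process can stand in for two concurrent loads by interleaving the calls by hand: check, check, create, create.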
To make this sequence work properly, we need an atomic DAL operation to
create a locked temp DB, overwriting the existing one if an unlocked one
exists. (This can be an alteration of the contract for the existing API
for creating a temp DB.) Likewise, krb5_db_promote() needs to require
that the temp DB is already locked, and release the lock after the temp
DB is promoted.
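Here is one possible shape for that contract, sketched in C with flock() on a separate lock file. Everything in it is an assumption for illustration. temp_db_create_locked(), temp_db_promote() and the lock-file layout are hypothetical and are not the krb5 DAL API, and a real implementation would work against the DB2 back end rather than plain files. Taking the lock first and only then truncating the temp DB gives the "overwrite it only if it is unlocked" behavior. Promotion requires the caller to still hold the lock and releases it afterwards.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

/* HYPOTHETICAL: take an exclusive lock, then (re)create the temp DB,
 * truncating any remnant of an aborted load. Returns the lock fd (>= 0).
 * If another load holds the lock, returns -1 with errno EWOULDBLOCK and
 * leaves the temp DB untouched. */
static int temp_db_create_locked(const char *temp_path, const char *lock_path)
{
    int lock_fd, db_fd, saved;

    lock_fd = open(lock_path, O_RDWR | O_CREAT, 0600);
    if (lock_fd < 0)
        return -1;
    if (flock(lock_fd, LOCK_EX | LOCK_NB) != 0) {
        saved = errno;
        close(lock_fd);
        errno = saved;
        return -1;
    }
    /* We hold the lock, so an existing temp DB belongs to a load that is
     * gone: flock() locks are dropped once the last descriptor for them is
     * closed, which includes process exit via abort(). */
    db_fd = open(temp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (db_fd < 0) {
        saved = errno;
        close(lock_fd);
        errno = saved;
        return -1;
    }
    close(db_fd);
    return lock_fd;
}

/* HYPOTHETICAL: promote the temp DB while still holding its lock, then
 * release the lock. Returns rename()'s result. */
static int temp_db_promote(int lock_fd, const char *temp_path,
                           const char *db_path)
{
    int ret, saved;

    assert(lock_fd >= 0);   /* must be the fd from temp_db_create_locked() */
    ret = rename(temp_path, db_path);
    saved = errno;
    close(lock_fd);         /* releases the flock() lock */
    errno = saved;
    return ret;
}
```

flock() is used here rather than fcntl() record locks because an fcntl() lock is released when the process closes any descriptor for the file, which matters if one process ever opens the lock file twice; a flock() lock belongs to the open file description. flock() semantics on NFS vary by platform.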
I estimate this at 2-5 days of work. For now I will open a ticket (or
annotate an existing one if I find it), and I'll see about allocating
the necessary time.