issue with temp policy lock file and kdb5_util error

Will Fiveash will.fiveash at oracle.com
Mon Nov 1 14:04:26 EDT 2010


On Mon, Nov 01, 2010 at 01:00:34PM -0400, Greg Hudson wrote:
> On Fri, 2010-09-24 at 15:09 -0400, Will Fiveash wrote:
> > Recently someone was having issues getting kprop to work, and the
> > problem appears to be related to kpropd aborting while processing a
> > resync with the master KDC and leaving temp principal files behind, like:
> > 
> > loki# ls princ*
> > principal.ulog         principal~.kadm5       principal~.ok
> > principal~             principal~.kadm5.lock
> 
> I have reproduced this issue by adding "if (getenv("CRASH") != NULL)
> abort();" to dump.c after the database is opened.  The failure is that
> subsequent kdb5_util load invocations fail in krb5_db2_create() because
> check_openable() returns true.  The error message seen is "kdb5_util:
> File exists".
> 
> At the moment, an admin can recover from this situation by running "rm
> principal~*".  There is no higher-level command to perform that.
> 
> There are a few approaches we could take to improving kdb5_util load's
> behavior after an aborted load.  The best would be to make "kdb5_util
> load" remove the temp DB if one exists as a remnant of an aborted load.

I agree the best solution is one that doesn't require an admin to take
additional action to clean up an aborted load.

> However, we'd want to preserve the property that if two "kdb5_util load"
> processes overlap, the second one fails and doesn't interfere with the
> first one.
>
> Right now the sequence of operations is:
> 
>   1. Create the temp DB (fail if it exists).
>   2. Lock the temp DB.
>   3. Load the dump file into the temp DB.
>   4. Unlock the temp DB.
>   5. Promote the temp DB.
> 
> Steps 2 and 4 of this sequence are not especially useful.  There is a
> race condition where two kdb5_util load operations can both succeed in
> creating the temp DB, but the locking does not generally resolve that
> race condition into correct behavior.
> 
> To make this sequence work properly, we need an atomic DAL operation to
> create a locked temp DB, overwriting the existing one if an unlocked one
> exists.  (This can be an alteration of the contract for the existing API
> for creating a temp DB.)  Likewise, krb5_db_promote() needs to require
> that the temp DB is already locked, and release the lock after the temp
> DB is promoted.
>
> I estimate this at 2-5 days of work.  For now I will open a ticket (or
> annotate an existing one if I find it), and I'll see about allocating
> the necessary time.
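
For illustration, here is a minimal sketch of how "create a locked temp DB,
overwriting an unlocked one" could behave if it were built on a POSIX
flock()-style lock file.  The helper names, the separate lock file, and the
use of flock() are all assumptions made for this sketch; this is not the
existing DAL API or the eventual fix.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of libkdb5: take an exclusive lock on
 * lock_path, then start temp_path from scratch.  Returns the lock fd on
 * success.  Returns -1 with errno set on failure (EWOULDBLOCK when
 * another process already holds the lock).
 */
static int
create_locked_temp(const char *lock_path, const char *temp_path)
{
    int lock_fd, temp_fd, saved;

    /* No O_EXCL: a lock file left by a crashed load is not an error. */
    lock_fd = open(lock_path, O_RDWR | O_CREAT, 0600);
    if (lock_fd < 0)
        return -1;

    /* Non-blocking: if a live load holds the lock, fail right away. */
    if (flock(lock_fd, LOCK_EX | LOCK_NB) != 0) {
        saved = errno;
        close(lock_fd);
        errno = saved;
        return -1;
    }

    /* We hold the lock, so any existing temp file is a stale remnant. */
    temp_fd = open(temp_path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (temp_fd < 0) {
        saved = errno;
        close(lock_fd);
        errno = saved;
        return -1;
    }
    close(temp_fd);
    return lock_fd;
}

/*
 * Hypothetical promote step: the caller must pass the lock fd obtained
 * above.  The lock is released whether or not the rename succeeds.
 */
static int
promote_locked_temp(int lock_fd, const char *temp_path,
                    const char *final_path)
{
    int ret = rename(temp_path, final_path);  /* atomic replace on POSIX */

    close(lock_fd);                           /* closing drops the flock */
    return ret;
}
```

In this sketch the lock file is never unlinked, so every load contends on the
same inode: a second overlapping load fails with EWOULDBLOCK and leaves the
first alone, while a load that starts after a crash acquires the lock and
silently replaces the leftover temp file.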

Sounds good to me, thanks for looking into this.
-- 
Will Fiveash
Oracle
http://opensolaris.org/os/project/kerberos/
Sent using mutt, a sweet, text based e-mail app <http://www.mutt.org/>