kdb5_util fails to load propagated database under heavy load

Greg Hudson ghudson at mit.edu
Tue Feb 23 17:19:53 EST 2016


On 02/23/2016 01:10 PM, Greg Hudson wrote:
> "Resource temporarily unavailable" is the error message for EAGAIN.  I
> can't see how krb5_db_promote() would generate that error; it doesn't do
> much besides rename a couple of files.

Never mind, I was looking at the wrong function, and I see what's wrong.
The short description is that we acquire a non-blocking lock when
creating a DB, when we should be acquiring a blocking lock.  I have
filed a pull request for the one-line fix:
https://github.com/krb5/krb5/pull/411
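
To illustrate where the EAGAIN comes from, here is a minimal Python
sketch.  It is not the krb5 code: it uses flock() purely for
illustration (the library's own locking calls may differ), and the
helper name try_lock_nonblocking is made up for this example.  A
non-blocking lock request fails immediately with EAGAIN when another
open file holds the lock, where a blocking request would simply wait.

```python
import errno
import fcntl
import os
import tempfile


def try_lock_nonblocking(fd):
    """Try once to take an exclusive lock without waiting.

    Returns 0 on success, or the errno value if the lock could not be
    taken (EAGAIN when another open file description holds it).
    """
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return 0
    except OSError as e:
        return e.errno


def contended_lock_demo():
    """Return (errno while contended, errno after the holder unlocks)."""
    fd0, path = tempfile.mkstemp()
    os.close(fd0)
    # Two independent opens of the same file stand in for two processes.
    holder = os.open(path, os.O_RDWR)
    contender = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(holder, fcntl.LOCK_EX)
        while_held = try_lock_nonblocking(contender)
        fcntl.flock(holder, fcntl.LOCK_UN)
        after_release = try_lock_nonblocking(contender)
        return while_held, after_release
    finally:
        os.close(holder)
        os.close(contender)
        os.unlink(path)


if __name__ == "__main__":
    while_held, after_release = contended_lock_demo()
    # os.strerror(EAGAIN) is the message quoted at the top of this thread.
    print(while_held == errno.EAGAIN, os.strerror(while_held))
    print(after_release == 0)
```

On a busy database the "holder" is whichever other process currently
has the DB locked, so a single non-blocking attempt can easily lose.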

Here's a longer description of the history behind the bug:

* Debian squeeze had MIT krb5 1.8.x; Debian jessie has MIT krb5 1.12.x.

* Prior to 1.10, we would always lock the database by trying to get a
non-blocking lock up to five times, waiting one second between tries.
This had the potential to fail on very busy databases, and could cause
unnecessary delays in the KDC and kadmind on less busy databases.

* In 1.10 we reorganized the DB2 code while fixing a bug related to
failed kdb5_util load operations
(http://krbdev.mit.edu/rt/Ticket/Display.html?id=6814).  As an
unintentional side effect, we started trying only once to get a lock
when creating the DB; we still did the try-five-times dance when
acquiring locks in other situations.  This change doesn't matter when
initially creating the DB, but does increase the likelihood of failure
during krb5_db_promote().

* In 1.11 we switched to using blocking locks whenever we lock the DB,
except in this one place where we create the DB.
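
The two locking strategies in the history above can be sketched as
follows.  Again this is an illustrative Python sketch under stated
assumptions, not the DB2 back end itself: it uses flock(), and the
names lock_with_retries, lock_blocking, and open_pair are hypothetical.
The first function polls a non-blocking lock a fixed number of times
(the pre-1.10 behavior with tries=5, or the DB-creation path since 1.10
with tries=1); the second just waits until the lock is free.

```python
import errno
import fcntl
import os
import tempfile
import threading
import time


def lock_with_retries(fd, tries=5, delay=1.0):
    """Poll a non-blocking exclusive lock, sleeping between attempts.

    Returns the attempt number that succeeded, or 0 if all attempts
    found the lock held by someone else.
    """
    for attempt in range(1, tries + 1):
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return attempt
        except OSError as e:
            if e.errno not in (errno.EAGAIN, errno.EACCES):
                raise  # a real error, not just contention
        if attempt < tries:
            time.sleep(delay)
    return 0


def lock_blocking(fd):
    """Wait as long as necessary for an exclusive lock."""
    fcntl.flock(fd, fcntl.LOCK_EX)


def open_pair():
    """Open one temp file twice; the two descriptors contend like two processes."""
    fd0, path = tempfile.mkstemp()
    os.close(fd0)
    return os.open(path, os.O_RDWR), os.open(path, os.O_RDWR), path


if __name__ == "__main__":
    holder, contender, path = open_pair()
    fcntl.flock(holder, fcntl.LOCK_EX)
    # Polling gives up if the holder never lets go within the window.
    print("retries succeeded on attempt:", lock_with_retries(contender, delay=0.1))
    # The blocking call returns once the holder unlocks, however long that takes.
    threading.Timer(0.2, fcntl.flock, args=(holder, fcntl.LOCK_UN)).start()
    lock_blocking(contender)
    print("blocking lock acquired")
    os.close(holder)
    os.close(contender)
    os.unlink(path)
```

The polling version can fail outright on a very busy database and can
sleep longer than necessary on a quiet one; the blocking version avoids
both problems, at the cost of waiting indefinitely if the holder never
releases.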

