kadmin incremental propagation full resync multiple processes spawned

Paul B. Henson henson at acm.org
Thu Nov 3 17:27:57 EDT 2011


On 11/3/2011 11:14 AM, Greg Hudson wrote:

> Sorry about that; it's been a busy time.

Understood, I appreciate the reply :).

> It's definitely relevant.  This implies the kdb5_util dump command
> failed.  At that point there are several bugs:
[...]
> All of these bugs existed in 1.8 and 1.7 (the code hasn't changed); the
> new factor is that the dump is failing.  Unfortunately, we kind of have
> to guess at why.  The most obvious candidates are that (1) the path to
> kdb5_util is wrong, or (2) the path to the dump file is in a nonexistent
> directory.

Wouldn't either of those issues cause a full resync to *never* work?
That is not the case here: after some number of failures (one this
time, about half a dozen last time) it does eventually succeed:

Nov  2 03:52:56 halfy kadmind[20238]: Request: iprop_full_resync_1, 
spawned resync process 20610, 
client=kiprop/loogie.unx.csupomona.edu at CSUPOMONA.EDU, 
service=kiprop/kerberos-master.csupomona.edu at CSUPOMONA.EDU, 
addr=134.71.247.11

Nov  2 03:55:10 halfy kadmind[20238]: Request: iprop_get_updates_1, 
UPDATE_OK; Incoming SerialNo=103785; Outgoing SerialNo=103790, success, 
client=kiprop/loogie.unx.csupomona.edu at CSUPOMONA.EDU, 
service=kiprop/kerberos-master.csupomona.edu at CSUPOMONA.EDU, 
addr=134.71.247.11

and return to incremental propagation.

There is a dump file created:

-rw------- 1 root root        1 Nov  2 03:53 /var/lib/krb5kdc/slave_datatrans_loogie.unx.csupomona.edu.loogie.unx.csupomona.edu.last_prop
-rw------- 1 root root 45864490 Nov  2 03:53 /var/lib/krb5kdc/slave_datatrans_loogie.unx.csupomona.edu
-rw------- 1 root root        1 Nov  2 03:53 /var/lib/krb5kdc/slave_datatrans_loogie.unx.csupomona.edu.dump_ok

Looking back through the logs, there are instances where a full resync 
is requested, and immediately succeeds with no failures:

Oct 16 04:09:08 halfy kadmind[19398]: Request: iprop_get_updates_1, 
UPDATE_FULL_RESYNC_NEEDED; Incoming SerialNo=90237; Outgoing 
SerialNo=N/A, success, 
client=kiprop/loogie.unx.csupomona.edu at CSUPOMONA.EDU, 
service=kiprop/kerberos-master.csupomona.edu at CSUPOMONA.EDU, 
addr=134.71.247.11

Oct 16 04:09:08 halfy kadmind[19398]: Request: iprop_full_resync_1, 
spawned resync process 30132, 
client=kiprop/loogie.unx.csupomona.edu at CSUPOMONA.EDU, 
service=kiprop/kerberos-master.csupomona.edu at CSUPOMONA.EDU, 
addr=134.71.247.11

Oct 16 04:11:21 halfy kadmind[19398]: Request: iprop_get_updates_1, 
UPDATE_NIL; Incoming SerialNo=94785; Outgoing SerialNo=N/A, success, 
client=kiprop/loogie.unx.csupomona.edu at CSUPOMONA.EDU, 
service=kiprop/kerberos-master.csupomona.edu at CSUPOMONA.EDU, 
addr=134.71.247.11
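
As an aside, a rough way to tally how many resync processes get spawned
before propagation recovers is to scan the kadmind log. The Python below
is only a sketch: it assumes each syslog entry sits on a single line
(they are wrapped in this mail), the regular expression is modeled
solely on the entries quoted here, and summarize_resyncs is a made-up
name. It also treats every extra spawn within an episode as a retry
after a failed dump, which the logs alone do not prove.

```python
import re

# Modeled on the kadmind request lines quoted in this message; assumes one
# syslog entry per line. Captures the RPC name and the first detail field.
REQUEST_RE = re.compile(
    r"kadmind\[\d+\]: Request: (?P<proc>iprop_\w+), (?P<detail>[^;,]+)"
)


def summarize_resyncs(lines):
    """Return the number of resync processes spawned in each episode.

    An episode starts at the first "spawned resync process" entry and ends
    at the next iprop_get_updates_1 reply of UPDATE_OK or UPDATE_NIL.
    Episodes that never reach such a reply are not reported.
    """
    episodes = []
    spawned = 0
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        proc = match.group("proc")
        detail = match.group("detail").strip()
        if proc == "iprop_full_resync_1" and detail.startswith(
            "spawned resync process"
        ):
            spawned += 1
        elif proc == "iprop_get_updates_1" and detail in (
            "UPDATE_OK",
            "UPDATE_NIL",
        ):
            if spawned:
                episodes.append(spawned)
            spawned = 0
    return episodes
```

Fed the three Oct 16 entries above (unwrapped), this reports a single
episode with one spawn; an episode with a count above one would
correspond to the retry pattern described earlier.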

So kdb5_util is failing sometimes <sigh>. Are any of these bugs
perchance on the shortlist for getting resolved :)? Presumably, if the
kdb5_util failure details were logged, it might be blindingly obvious
what's going on, or we would at least be a step closer to resolution.
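
To make concrete what kind of logging would help, here is a minimal
Python sketch of running a child command and recording why it failed.
It is not kadmind's code (kadmind is C, and I have not checked how it
invokes the dump); run_and_log and the logger name are hypothetical.
The point is only that both the "cannot execute the binary" case and a
non-zero exit leave something in the log.

```python
import logging
import subprocess

log = logging.getLogger("resync-sketch")  # hypothetical logger name


def run_and_log(argv):
    """Run argv; log the reason on failure; return True on success."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True)
    except OSError as exc:
        # Raised when the program cannot be started at all, for example
        # because the configured path to the binary is wrong.
        log.error("could not execute %s: %s", argv[0], exc)
        return False
    if result.returncode != 0:
        # A negative return code means the child was killed by a signal.
        log.error(
            "%s exited with status %d: %s",
            argv[0],
            result.returncode,
            result.stderr.strip(),
        )
        return False
    return True
```

It would be called with whatever command line the dump actually uses,
e.g. run_and_log(["/path/to/kdb5_util", "dump", "/path/to/dumpfile"]),
where both paths are placeholders.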

Even if none of the others are, the child continuing on to serve
requests seems like a potentially severe bug. From my understanding,
multiple processes simultaneously manipulating a bdb database without
the proper precautions can result in corruption :(. Given that kadmind
probably assumes it will be the only process ever writing, it might not
bother with the overhead of multiprocess synchronization.
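
For illustration, the usual way to rule out a forked child wandering
back into the parent's request loop is to make every path in the child
end in _exit. The Python sketch below shows that pattern in general
terms only; spawn_worker and wait_for are hypothetical names, and I am
assuming, without having read the code, that the kadmind bug is of the
"child returns instead of exiting" variety.

```python
import os


def spawn_worker(work):
    """Fork, run work() in the child, and return the child's pid.

    The child always terminates through os._exit, even if work() raises,
    so it can never return into the caller's request loop.
    """
    pid = os.fork()
    if pid != 0:
        return pid  # parent: goes back to serving requests
    status = 1
    try:
        work()
        status = 0
    except BaseException:
        status = 1
    finally:
        # os._exit skips cleanup handlers inherited from the parent.
        os._exit(status)


def wait_for(pid):
    """Reap the child; return its exit code, or -signal if it was killed."""
    _, raw = os.waitpid(pid, 0)
    if os.WIFEXITED(raw):
        return os.WEXITSTATUS(raw)
    return -os.WTERMSIG(raw)
```

With this shape, a failed dump shows up to the parent as a non-zero
exit status rather than as a second process answering requests against
the same database.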

Hmm, another thought -- we originally switched to incremental
propagation because, during periods of high load, the full replication
would lock the database and kadmind would occasionally fail to execute
updates. Given that the full replication is now triggered because a
high update rate prevented incremental propagation from working for too
long, could it be that kdb5_util is failing to lock the database
because kadmind is monopolizing it, causing the dump to fail? That
would explain why the failures are sporadic.
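
To show why lock contention would produce exactly this kind of
intermittent failure, here is a small Python sketch of a lock attempt
with a bounded number of retries. It is purely illustrative: it uses
flock on an ordinary file, try_exclusive_lock is a made-up name, and it
does not reproduce how kdb5_util or the database back end actually lock
the KDC database, which I have not verified.

```python
import fcntl
import time


def try_exclusive_lock(fileobj, attempts=5, delay=1.0):
    """Try to take an exclusive flock on fileobj without blocking.

    Returns True once the lock is held, or False if another holder kept
    it for all `attempts` tries spaced `delay` seconds apart.
    """
    for _ in range(attempts):
        try:
            fcntl.flock(fileobj, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except BlockingIOError:
            # Someone else holds the lock; wait a little and try again.
            time.sleep(delay)
    return False
```

Whether such a lock attempt succeeds depends entirely on whether the
other process happens to release the lock inside the retry window, so
under a heavy update load it would fail some times and succeed others.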

Thanks much...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768


