kadmin incremental propagation full resync multiple processes spawned
Paul B. Henson
henson at acm.org
Thu Nov 3 17:27:57 EDT 2011
On 11/3/2011 11:14 AM, Greg Hudson wrote:
> Sorry about that; it's been a busy time.
Understood, I appreciate the reply :).
> It's definitely relevant. This implies the kdb5_util dump command
> failed. At that point there are several bugs:
[...]
> All of these bugs existed in 1.8 and 1.7 (the code hasn't changed); the
> new factor is that the dump is failing. Unfortunately, we kind of have
> to guess at why. The most obvious candidates are that (1) the path to
> kdb5_util is wrong, or (2) the path to the dump file is in a nonexistent
> directory.
Wouldn't either of those issues result in a full resync *never*
working? That is not the case in this scenario: after some number of
failures (one this time, about half a dozen last time) it does
eventually succeed:
Nov 2 03:52:56 halfy kadmind[20238]: Request: iprop_full_resync_1,
spawned resync process 20610,
client=kiprop/loogie.unx.csupomona.edu at CSUPOMONA.EDU,
service=kiprop/kerberos-master.csupomona.edu at CSUPOMONA.EDU,
addr=134.71.247.11
Nov 2 03:55:10 halfy kadmind[20238]: Request: iprop_get_updates_1,
UPDATE_OK; Incoming SerialNo=103785; Outgoing SerialNo=103790, success,
client=kiprop/loogie.unx.csupomona.edu at CSUPOMONA.EDU,
service=kiprop/kerberos-master.csupomona.edu at CSUPOMONA.EDU,
addr=134.71.247.11
and return to incremental propagation.
There is a dump file created:
-rw------- 1 root root 1 Nov 2 03:53
/var/lib/krb5kdc/slave_datatrans_loogie.unx.csupomona.edu.loogie.unx.csupomona.edu.last_prop
-rw------- 1 root root 45864490 Nov 2 03:53
/var/lib/krb5kdc/slave_datatrans_loogie.unx.csupomona.edu
-rw------- 1 root root 1 Nov 2 03:53
/var/lib/krb5kdc/slave_datatrans_loogie.unx.csupomona.edu.dump_ok
Looking back through the logs, there are instances where a full resync
is requested, and immediately succeeds with no failures:
Oct 16 04:09:08 halfy kadmind[19398]: Request: iprop_get_updates_1,
UPDATE_FULL_RESYNC_NEEDED; Incoming SerialNo=90237; Outgoing
SerialNo=N/A, success,
client=kiprop/loogie.unx.csupomona.edu at CSUPOMONA.EDU,
service=kiprop/kerberos-master.csupomona.edu at CSUPOMONA.EDU,
addr=134.71.247.11
Oct 16 04:09:08 halfy kadmind[19398]: Request: iprop_full_resync_1,
spawned resync process 30132,
client=kiprop/loogie.unx.csupomona.edu at CSUPOMONA.EDU,
service=kiprop/kerberos-master.csupomona.edu at CSUPOMONA.EDU,
addr=134.71.247.11
Oct 16 04:11:21 halfy kadmind[19398]: Request: iprop_get_updates_1,
UPDATE_NIL; Incoming SerialNo=94785; Outgoing SerialNo=N/A, success,
client=kiprop/loogie.unx.csupomona.edu at CSUPOMONA.EDU,
service=kiprop/kerberos-master.csupomona.edu at CSUPOMONA.EDU,
addr=134.71.247.11
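To compare the failing and clean cases over a longer stretch of logs, something like the rough sketch below could tally how many resync processes get spawned before propagation recovers. It assumes each syslog entry has been joined back onto a single line (the entries above are wrapped by the mail client), and the function name is made up for illustration.

```python
import re

# Patterns taken from the kadmind log lines quoted above.
RESYNC = re.compile(r"Request: iprop_full_resync_1, spawned resync process (\d+)")
UPDATES = re.compile(r"Request: iprop_get_updates_1, (UPDATE_\w+);")


def resync_attempts(lines):
    """Return, for each recovery, how many resync processes were spawned first.

    A "recovery" is taken to be the first iprop_get_updates_1 entry reporting
    UPDATE_OK or UPDATE_NIL after one or more iprop_full_resync_1 entries.
    """
    runs = []
    pending = 0
    for line in lines:
        if RESYNC.search(line):
            pending += 1
            continue
        match = UPDATES.search(line)
        if match and pending and match.group(1) in ("UPDATE_OK", "UPDATE_NIL"):
            runs.append(pending)
            pending = 0
    return runs
```

A result of 1 corresponds to a resync that worked the first time; anything larger means the slave had to ask again.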
So kdb5_util is failing sometimes <sigh>. Are any of these bugs
perchance on the shortlist for getting resolved :)? Presumably if the
kdb5_util failure details were logged, it might be blindingly obvious
what's going on, or at least we'd be a step closer to resolution.
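To illustrate the kind of detail that would help, here is a minimal sketch of running a command and recording why it failed. This is not kadmind's code (which is C); the function name and the 127 status used for a command that cannot be started are assumptions made for the example.

```python
import subprocess
import sys


def run_and_report(argv, log=sys.stderr):
    """Run argv; on failure, log the exit status and whatever it wrote to stderr.

    Returns (status, stderr_text). A command that cannot be started at all
    (for example, a wrong path) is reported with status 127, by analogy with
    the shell convention.
    """
    try:
        proc = subprocess.run(
            argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
    except OSError as exc:
        print(f"could not start {argv!r}: {exc}", file=log)
        return 127, str(exc)
    if proc.returncode != 0:
        print(
            f"{argv!r} failed: status={proc.returncode} "
            f"stderr={proc.stderr.strip()!r}",
            file=log,
        )
    return proc.returncode, proc.stderr
```

Even just the exit status and the first line of stderr would distinguish a bad path from a locking failure.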
If none of the others get fixed, the child continuing on to serve
requests still seems like a potentially severe bug. From my
understanding, multiple processes simultaneously manipulating a bdb
database without the proper precautions can result in corruption :(.
Given that kadmind probably assumes it will be the only process ever
writing, it might not bother with the overhead of multiprocess
synchronization.
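For what it's worth, the behavior I would expect from a forked helper looks roughly like the sketch below: whatever happens to the task, the child exits and never falls back into the parent's request loop. This is a generic POSIX illustration in Python with invented names, not the kadmind source.

```python
import os


def spawn_worker(task):
    """Fork; the child runs task() and always exits, success or failure."""
    pid = os.fork()
    if pid != 0:
        return pid  # parent: goes back to serving requests
    status = 1
    try:
        task()
        status = 0
    except BaseException:
        status = 1
    finally:
        # _exit() skips cleanup handlers inherited from the parent and
        # guarantees the child never returns to the caller.
        os._exit(status)


def wait_for(pid):
    """Reap the child and return its exit status (or -1 if killed by a signal)."""
    _, raw = os.waitpid(pid, 0)
    return os.WEXITSTATUS(raw) if os.WIFEXITED(raw) else -1
```

With this shape, a failed dump shows up as a nonzero exit status in the parent rather than as a second process serving requests.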
Hmm, another thought -- we originally switched to incremental
propagation because, during periods of high load, full replication
would lock the database and kadmind would occasionally fail to execute
updates. Now a full replication is triggered precisely because a high
update rate kept incremental propagation from working for too long.
Could it be that kdb5_util is failing to lock the database because
kadmind is monopolizing it, causing the dump to fail? That would
explain why the failures are so sporadic.
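The failure mode I have in mind is the one in the sketch below: a process makes a bounded number of non-blocking attempts at an advisory lock and gives up if a busy writer never lets go. I am assuming a retry-then-fail scheme here; whether kdb5_util actually locks this way is a guess, and the helper names and retry counts are invented.

```python
import fcntl
import tempfile
import time


def try_exclusive_lock(fileobj, attempts=5, delay=0.01):
    """Make a bounded number of non-blocking attempts; True if the lock was taken."""
    for _ in range(attempts):
        try:
            fcntl.flock(fileobj.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except BlockingIOError:
            time.sleep(delay)
    return False


def demo():
    """Return (result while contended, result after the holder releases)."""
    with tempfile.NamedTemporaryFile() as holder:
        fcntl.flock(holder.fileno(), fcntl.LOCK_EX)  # stands in for the busy writer
        with open(holder.name, "rb") as dumper:
            contended = try_exclusive_lock(dumper, attempts=3)
            fcntl.flock(holder.fileno(), fcntl.LOCK_UN)
            after_release = try_exclusive_lock(dumper, attempts=3)
    return contended, after_release
```

If the writer happens to release within the retry window the attempt succeeds, otherwise it fails, which is the sort of timing-dependent behavior that would look sporadic from the outside.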
Thanks much...
--
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | henson at csupomona.edu
California State Polytechnic University | Pomona CA 91768