kadmin incremental propagation full resync multiple processes spawned

Fri Nov 4 18:39:41 EDT 2011

On 11/3/2011 8:26 PM, Greg Hudson wrote:
> On 11/03/2011 05:27 PM, Paul B. Henson wrote:
>> could it be that kdb5_util is failing to lock the database because
>> kadmind is monopolizing it resulting in the dump failing?
>
> That's a good theory.

So I wouldn't be having this problem if propagation weren't falling 
back to a full resync...

As I understand it, when a client requests incremental updates and any 
changes have been made on the server within the last 10 seconds, the 
server tells the client to try again later. If a client requests 
incremental updates and its current serial number predates the oldest 
entry in the update log, it's told to do a full resync.
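
As a rough illustration of the behavior described above (a sketch of my 
understanding, not the actual kadmind code), the decision could be 
modeled like this. The function name, the Reply values, the argument 
names, and the order of the checks are all assumptions made up for the 
example; only the 10 second window and the "serial number predates the 
update log" condition come from the description.

```python
from enum import Enum


class Reply(Enum):
    # Hypothetical reply names, not the real protocol constants.
    BUSY = "try again later"
    FULL_RESYNC = "full resync needed"
    INCREMENTAL = "send incremental updates"
    UP_TO_DATE = "nothing to send"


BUSY_WINDOW = 10  # seconds; the window described above


def answer_poll(now, last_update_time, first_sno, last_sno, slave_sno):
    """Decide how the master answers one incremental-update poll.

    now, last_update_time: timestamps in seconds
    first_sno, last_sno:   oldest and newest serial numbers in the log
    slave_sno:             last serial number the slave has applied
    """
    # A change landed within the busy window: ask the slave to retry.
    if now - last_update_time < BUSY_WINDOW:
        return Reply.BUSY
    # The slave's serial number is older than the oldest log entry,
    # so the missing updates can no longer be replayed from the log.
    if slave_sno < first_sno:
        return Reply.FULL_RESYNC
    # The slave already has everything in the log.
    if slave_sno == last_sno:
        return Reply.UP_TO_DATE
    return Reply.INCREMENTAL
```

In this model, a job that writes at least once every 10 seconds keeps 
every poll at BUSY while first_sno keeps advancing, so by the time the 
server goes quiet the slave's serial number may already have dropped 
out of the log.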

About three times a quarter we run batch processing that spends a 
couple of hours expiring or deleting principals. Generally that is when 
propagation falls back to a full resync. We also create new accounts 
nightly; I suppose that might sometimes take a while too if there are a 
lot of them.

It seems the current mechanism is somewhat deficient on a busy server. 
Hypothetically, if you had such a busy server that some update always 
happened within any 10 second window, you would *never* get to do 
incremental propagation; it would fall back to a full resync every 
time.  In my case, it would be nice if it could still do some 
occasional incremental propagation during our batch jobs so the slaves 
would keep up with changes and not trigger a full resync.

I guess the reason for the current implementation is to minimize the 
number of incremental transfers? Is that really an issue? What if, in 
addition to the busy timeout (which ideally would be a configurable 
option), there were also a "do it anyway if it has been X seconds since 
the last transfer" rule? I'd probably set that to 5-10 minutes to keep 
the slaves from getting too stale, and to make sure incremental 
propagation continues to work in the face of a high update rate. Or 
possibly a "do it anyway if the slave's current serial number is within 
X entries of falling off the update log" rule? The latter might 
actually be more reliable in terms of avoiding full resyncs.
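
To make the two suggested overrides concrete, here is a minimal sketch 
of how they might sit on top of the busy check. Nothing here exists in 
the current code as far as I know: should_defer, busy_window, 
max_staleness, and sno_margin are hypothetical names, and the default 
values are placeholders rather than recommendations.

```python
def should_defer(now, last_update_time, last_transfer_time,
                 first_sno, slave_sno,
                 busy_window=10, max_staleness=300, sno_margin=1000):
    """Return True if the master should tell the slave to retry later.

    All parameters are hypothetical:
      busy_window   - seconds of quiet required before sending normally
      max_staleness - send anyway once this many seconds have passed
                      since the last transfer to this slave
      sno_margin    - send anyway once the slave's serial number is
                      within this many entries of the oldest log entry
    """
    # Server has been quiet for the whole window: no reason to defer.
    if now - last_update_time >= busy_window:
        return False
    # Override 1: the slave has gone too long without a transfer.
    if now - last_transfer_time >= max_staleness:
        return False
    # Override 2: the slave is close to falling off the update log.
    if slave_sno - first_sno <= sno_margin:
        return False
    # Busy, recently updated slave with plenty of log left: defer.
    return True
```

With something like this, a slave polling during a long batch job 
would still be deferred most of the time, but would receive an 
incremental transfer either every max_staleness seconds or just before 
its serial number aged out of the log, whichever came first.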

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  henson at csupomona.edu
California State Polytechnic University  |  Pomona CA 91768