krb5-1.12 - New iprop features - tree propagation & slave notify

Wed Jan 15 07:19:45 EST 2014

I figured I would post an update, since I have now seen slaves lag behind on updates under high load (and then wait out the iprop interval before catching up). Currently, I send only the initial notification, and it appears that high-change environments need more than that.

My ideal would be to leave the notification asynchronous but, if a notification fails, to apply a parameterized retry count before dropping the client.
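
To make the retry idea concrete, here is a minimal sketch in Python rather than the actual kadmind C code. Every name in it (NotifyList, send, max_retries) is hypothetical, and resetting the budget after a successful notification is an assumption of the sketch, not something the current patch does.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class NotifyList:
    """Hypothetical registry of slaves waiting to be notified of updates."""

    # Callback that tries to notify one slave; returns True on success.
    send: Callable[[str], bool]
    # Hypothetical tunable: failed notifications tolerated before dropping.
    max_retries: int = 3
    retries_left: Dict[str, int] = field(default_factory=dict)

    def register(self, slave: str) -> None:
        """Add a slave (or refresh its retry budget)."""
        self.retries_left[slave] = self.max_retries

    def notify_all(self) -> None:
        """Notify every registered slave once, without blocking on any of them."""
        for slave in list(self.retries_left):
            if self.send(slave):
                # Assumption: a successful notification restores the full budget.
                self.retries_left[slave] = self.max_retries
            else:
                self.retries_left[slave] -= 1
                if self.retries_left[slave] <= 0:
                    # Budget exhausted: drop the slave; it falls back to polling.
                    del self.retries_left[slave]
```

A dropped slave would still catch up at its next iprop poll, so the worst case is the same as the existing behavior.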

Personally, I will probably increase my ulog size, since the 2500 restriction is gone (the kdc.conf man page was not updated to reflect that).
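
For reference, a sketch of what a larger ulog might look like in kdc.conf. The realm name and the value 10000 are placeholders chosen for illustration, not recommendations, and this assumes the size is still set through the iprop_master_ulogsize relation:

```
[realms]
    EXAMPLE.COM = {
        iprop_enable = true
        # Illustrative value only; above the old 2500 cap.
        iprop_master_ulogsize = 10000
    }
```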

If anyone sees a logic flaw in my code that explains this behavior, feel free to comment. It is possible I overlooked something, but nothing jumped out at me. Either way, I figured people should be aware of the issue, even though it leaves things no worse off than the current code base.

----- Original Message -----
From: Nico Williams [mailto:nico at cryptonector.com]
Sent: Tuesday, January 14, 2014 10:57 PM Eastern Standard Time
To: Richard Basch <basch at alum.mit.edu>
Cc: krbdev at mit.edu <krbdev at mit.edu>; Greg Hudson <ghudson at mit.edu>; Tom Yu <tlyu at mit.edu>; Basch, Richard [Tech]
Subject: Re: krb5-1.12 - New iprop features - tree propagation & slave notify

On Tue, Jan 14, 2014 at 8:03 PM, Richard Basch <basch at alum.mit.edu> wrote:
> Overall, as long as the rate of change is not too great, I have not observed
> latency issues, nor do I see any obvious bugs, except that the notification
> is not synchronous and any failure there may result in the notification not
> being processed and the client not being queued to be notified for another
> update.

Not synchronous... relative to the response to the kadm5 client?
Indeed, no one will want that :)  Async is fine, and it's what we've
always expected.  (Which is why it's important to write new keys to
keytabs before writing them to the KDB.)

Or if you meant async as in kadmind not waiting for a slave to fetch
its updates before moving on to the next slave (or next update),
that's also fine, and very much desirable: no slave should be able to
stop replication (unless it's part of the replication hierarchy, in
which case it can stop replication downstream of itself).

> In other words, what I provided really is an immense improvement over the
> prior replication (latency is generally reduced), but there is still an edge
> case under load where it might fall back to prior behavior. I might change
> my implementation slightly so that instead of dropping the client after the
> first notification to implement a countdown for number of notifications
> before the client is dropped from the list. This will likely reduce the
> likelihood of the bug. I do not think slave notification should be
> synchronous.

Dropping slaves after some number of updates without hearing from them
sounds good to me.

Nico
--