Rate limiting Kerberos Requests

Nico Williams nico at cryptonector.com
Wed Sep 26 17:04:25 EDT 2012


On Wed, Sep 26, 2012 at 1:25 PM, Jack Neely <jjneely at ncsu.edu> wrote:
> After spending some quality time with my logs, I do about 1.3 million
> kerberos requests a day or 960/min on average.  The incident that took
> out the kerberos servers with an additional 600 hits/min (from the krb
> logs) doesn't even make a spike on my graphs.  My late morning usage is
> higher.

I'm not sure I understood the incident's symptoms correctly.  If the
symptom was the KDC going unresponsive for a second at a time, then
it's very likely (very, very, very likely) that the patches I
mentioned earlier will solve your problem.  The events to correlate
the incident with would be kadmind / kadmin.local / kdb5_util load /
kpropd iprop events.  The longer these events run, the more likely it
is that the KDC ends up sleeping for a second at a time.

The bug, if I'm right that it is the bug affecting you, is that all
versions of MIT krb5 from 1.5 through 1.10, inclusive, used
non-blocking file locking in a three-retry loop with a 1-second sleep
on each pass.  This is disastrous, really, but it only bites when
something holds an exclusive lock on the KDB, which would be the
daemons/tools listed above.  Since the time spent holding an
exclusive lock on the KDB is generally short (always, if you don't
use the kadmin.local lock command), you might well be getting lucky
99.99% of the time and thus not observing any outages of a second or
more on your KDCs.
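
To make the failure mode concrete, here is a minimal sketch of that
kind of retry loop.  It is not the MIT krb5 code: it is Python using
flock(), the names lock_with_retries and demo are made up for
illustration, and the retry count and delay simply follow the
description above.  The sleep function is injectable so that the demo
does not actually have to wait.

```python
import fcntl
import os
import tempfile
import time


def lock_with_retries(fd, mode, retries=3, delay=1.0, sleep=time.sleep):
    """Non-blocking lock attempt, sleeping `delay` seconds after each failure.

    Returns how many times the caller slept before getting the lock.
    Hypothetical helper for illustration, not the krb5 implementation.
    """
    slept = 0
    for _ in range(retries):
        try:
            fcntl.flock(fd, mode | fcntl.LOCK_NB)
            return slept
        except BlockingIOError:
            # The lock is held right now: sleep a full `delay`, even if
            # the holder releases it a millisecond from now.
            sleep(delay)
            slept += 1
    raise TimeoutError("lock still held after %d attempts" % retries)


def demo():
    """A writer holds the lock briefly; the reader still pays a full sleep."""
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)
    # Two separate opens, so the flock() locks conflict with each other.
    writer = os.open(path, os.O_RDWR)
    reader = os.open(path, os.O_RDWR)
    sleeps = []

    def fake_sleep(seconds):
        # Record the requested sleep and let the writer finish meanwhile.
        sleeps.append(seconds)
        fcntl.flock(writer, fcntl.LOCK_UN)

    try:
        fcntl.flock(writer, fcntl.LOCK_EX)
        stalls = lock_with_retries(reader, fcntl.LOCK_SH, sleep=fake_sleep)
        return stalls, sleeps
    finally:
        os.close(writer)
        os.close(reader)
        os.unlink(path)
```

In the demo the reader collides with the writer once and asks to sleep
a full second, no matter how briefly the writer held the lock.  A
plain blocking flock(fd, mode), without LOCK_NB, would wait only as
long as the lock is actually held.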

If multiple KDCs are affected at roughly the same time then I'd
suspect iprop.  What is the rate of write transactions on your master?
Do the rates of KDC (read) vs. kadm5srv (write) transactions account
for the rate of outages you're experiencing?
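
As a rough way to check that, here is a back-of-the-envelope sketch.
It assumes reads arrive uniformly and independently of writes, and
that exclusive-lock holds do not overlap.  The function name, the
write rate and the lock hold time are all hypothetical; only the
960/min read rate comes from your numbers.

```python
def expected_stalls_per_day(reads_per_min, writes_per_min, hold_seconds):
    """Estimate how many reads per day land while an exclusive lock is held.

    Assumes uniform, independent arrivals and non-overlapping lock holds.
    """
    # Fraction of each minute during which a writer holds the lock.
    busy_fraction = min(1.0, writes_per_min * hold_seconds / 60.0)
    reads_per_day = reads_per_min * 60 * 24
    return reads_per_day * busy_fraction


# 960 reads/min (from the logs), with an assumed 2 writes/min that each
# hold the exclusive lock for an assumed 50 ms.
estimate = expected_stalls_per_day(960, 2, 0.05)
```

With those made-up write numbers the estimate comes to roughly 2,300
reads a day that would each sleep for at least a second.  Plugging in
your real write rate and hold times would show whether the bug can
account for what you see.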

Nico
--

