Rate limiting Kerberos Requests

Jack Neely jjneely at ncsu.edu
Thu Sep 27 11:17:45 EDT 2012


On Wed, Sep 26, 2012 at 04:04:25PM -0500, Nico Williams wrote:
> On Wed, Sep 26, 2012 at 1:25 PM, Jack Neely <jjneely at ncsu.edu> wrote:
> > After spending some quality time with my logs, I do about 1.3 million
> > kerberos requests a day or 960/min on average.  The incident that took
> > out the kerberos servers with an additional 600 hits/min (from the krb
> > logs) doesn't even make a spike on my graphs.  My late morning usage is
> > higher.
> 
> I'm not sure I understood correctly what the incident's symptoms were.
>  If the symptom was non-responsiveness for a second then it's very
> likely (very, very, very likely) that the patches I mentioned earlier
> will solve your problem, and the events to correlate the incident to
> would be kadmind / kadmin.local / kdb5_util load / kpropd iprop events
> -- the longer these events the more likely that the kdc ends up
> sleeping for a second at a time.

This definitely seems to explain the lag in responses I've noticed
during a kprop operation.  Usually I get a response in under a second,
but if I hit my KDC while it's receiving a kprop it can take 4 or 5
seconds.

The incident above was a single misbehaving client suddenly making about
600 requests/minute for around 30 minutes.  During this window no one
else could get a KDC response before their own client timed out.
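
For a rough sense of scale, the figures in this thread can be put side
by side.  This is only back-of-the-envelope arithmetic on the numbers
quoted above (960 requests/min on average, 600 extra requests/min, about
30 minutes); the variable names are just for illustration.

```python
# Back-of-the-envelope arithmetic using the figures quoted in this thread.
baseline_per_min = 960      # average KDC request rate taken from the logs
incident_per_min = 600      # extra requests/min from the misbehaving client
incident_minutes = 30       # approximate duration of the incident

# Total extra requests the single client sent during the incident.
extra_requests = incident_per_min * incident_minutes

# Extra load expressed as a fraction of the average rate.
relative_increase = incident_per_min / baseline_per_min

print(f"extra requests during the incident: {extra_requests}")
print(f"increase over the average rate: {relative_increase:.1%}")
```

That works out to about 18,000 extra requests, or roughly 62% above the
average rate.  Since normal late morning usage is already higher than
that, raw request volume alone does not look like the whole story.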

I've also noticed that the 1.6.1 version in RHEL 5 is leaking memory.  I
think I've found my smoking gun here.  Large memory consumption is
directly related to slower performance in my testing.

Thanks a bunch for the pointer to the patch!

Jack Neely
 
> The bug (if I'm right that it is the bug affecting you) is that
> between 1.5 and 1.10, inclusive, all versions of MIT krb5 used
> non-blocking file locking with a three-retry loop with a 1-second
> sleep each go around.  This is disastrous, really, but it only bites
> when something holds an exclusive lock on the KDB, which would be the
> daemons/tools listed above, and since the amount of time spent holding
> an exclusive lock on the KDB is generally (always, if you don't use
> the kadmin.local lock command) short, you might well be getting lucky
> 99.99% of the time and thus not observing any 1- or more second
> outages on your KDCs.
> 
> If multiple KDCs are affected at roughly the same time then I'd
> suspect iprop.  What is the rate of write transactions on your master?
>  Do the rates of KDC (read) vs. kadm5srv (write) transactions imply
> the rate of outages you're experiencing?
> 
> Nico
> --
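
To make the locking pattern Nico describes concrete (non-blocking lock,
three tries, 1-second sleep between tries), here is a minimal sketch.
It is not the MIT krb5 code: the function name, the constants, and the
use of flock() are my assumptions for illustration only, and whether the
real loop sleeps after the final failed try is not stated in the thread.

```python
import fcntl
import time

MAX_TRIES = 3       # assumption: mirrors the "three-retry loop" described above
RETRY_SLEEP = 1.0   # seconds slept after each failed attempt

def lock_with_retries(fd, operation, sleep=time.sleep):
    """Take a flock on fd without blocking, retrying up to MAX_TRIES times.

    Returns the number of failed attempts before the lock was acquired.
    Re-raises BlockingIOError if the lock is still held on the last try.
    """
    for attempt in range(MAX_TRIES):
        try:
            # LOCK_NB makes flock fail immediately instead of waiting.
            fcntl.flock(fd, operation | fcntl.LOCK_NB)
            return attempt
        except BlockingIOError:
            if attempt == MAX_TRIES - 1:
                raise
            # Fixed sleep: the caller is not woken when the lock is released.
            sleep(RETRY_SLEEP)
```

Because the sleep is a fixed second, a reader that collides with an
exclusive lock waits the full second even if the writer releases the
lock a few milliseconds later.  A blocking lock would instead return as
soon as the lock became free.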

-- 
Jack Neely <jjneely at ncsu.edu>
Linux Czar, OIT Campus Linux Services
Office of Information Technology, NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4  EA6B 213B 765F 3B6A 5B89

