Rate limiting Kerberos Requests
Jack Neely
jjneely at ncsu.edu
Wed Sep 26 14:25:18 EDT 2012
On Tue, Sep 25, 2012 at 02:08:29PM -0700, Russ Allbery wrote:
> Jack Neely <jjneely at ncsu.edu> writes:
>
> > Thanks for reading between the lines. I don't have evidence that my
> > KDCs were overloaded, yet I got quite a few "cannot reach KDC" errors
> > and logins stopped working everywhere.
>
> > The slaves are HP G7 blades with 12GB of RAM and a 6 core Intel Xeon. 2
> > servers in one DC and the other slave (and master) in the other DC.
> > Each DC has its own firewall/vlan for the kerberos servers. RHEL 5
> > running kerb 1.6.1.
>
> > My network engineers tell me that the firewall in one DC had 8000
> > concurrent connections from the offending IP address to the KDCs and
> > 4000 in the second DC. (Oddly, the DC with only 1 slave.) The KDCs
> > weren't able to handle other requests until the spike settled.
>
> Is it possible that, rather than overwhelming the KDC, you instead
> overwhelmed the UDP session table on your firewall? Sometimes firewalls
> have surprisingly small UDP session tables, which can cause serious
> problems for Kerberos and for DNS servers.
>
> You're right that there are ill-behaved Kerberos applications that will
> spam authentication requests, but I tend to think of this as similar to the
> problem with DNS, where there are ill-behaved resolvers that do the same
> thing. Fixing them tends to be really hard, but answering Kerberos
> requests should normally be extremely fast. It's usually easier to just
> ensure you can handle the load spikes than worry too much about fixing all
> the broken clients. (Of course, the rate limiting path that you're going
> down is one way to do that.)
>
> We were quite concerned when we first looked at putting Kerberos KDCs
> behind a hardware firewall because of that session limit. Our firewalls
> have a 100,000 UDP session limit and a fairly quick timeout. One tuning
> that you can do on the hardware firewall, if that is the problem, is to
> reduce the UDP session length for Kerberos KDC traffic. You're either
> going to get a reply and complete the transaction in under a minute (in
> practice, under 10 seconds) or it's never going to work anyway, so if, for
> example, your firewall is trying to remember sessions for an hour, you're
> just wasting memory and possibly DoSing your firewall.
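To put rough numbers on the session-table point above, here is a back-of-the-envelope sketch. It assumes the worst case where every request arrives from a fresh source port and so creates its own table entry, and it treats the table size as arrival rate times timeout (Little's law). The 16 requests/sec rate is a made-up illustration; the 100,000 limit is the figure Russ quotes for his firewalls, not mine. The function names are hypothetical.

```python
def steady_state_sessions(requests_per_second: float, timeout_seconds: float) -> float:
    """Estimate table entries held at once: arrival rate * time each entry is kept."""
    return requests_per_second * timeout_seconds


def max_sustainable_rate(table_limit: int, timeout_seconds: float) -> float:
    """Largest request rate (per second) that fits in the table at a given timeout."""
    return table_limit / timeout_seconds


if __name__ == "__main__":
    rate = 16.0            # hypothetical requests/sec, for illustration only
    table_limit = 100_000  # UDP session limit quoted in the message above

    for timeout in (10, 60, 3600):
        held = steady_state_sessions(rate, timeout)
        ceiling = max_sustainable_rate(table_limit, timeout)
        print(f"timeout {timeout:>4}s: ~{held:,.0f} sessions held, "
              f"table fills at ~{ceiling:,.1f} req/s")
```

Under those assumptions a one-hour timeout holds 360 times as many entries as a ten-second one for the same traffic, which is the whole argument for shortening it.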
After spending some quality time with my logs, I see about 1.3 million
Kerberos requests a day, or 960/min on average. The incident that took
out the Kerberos servers added another 600 hits/min (from the krb
logs), which doesn't even make a spike on my graphs. My late-morning
usage is higher.
So there's another piece to the puzzle.
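For anyone checking the arithmetic, a quick sketch of the conversion. The inputs are the rounded figures from this message; the helper name is just for illustration. Note that a flat 1.3 million/day works out to roughly 903/min, so a 960/min average implies the daily total is closer to 1.38 million.

```python
MINUTES_PER_DAY = 24 * 60


def per_minute(requests_per_day: float) -> float:
    """Convert a daily request count to an average per-minute rate."""
    return requests_per_day / MINUTES_PER_DAY


if __name__ == "__main__":
    daily_rounded = 1_300_000  # "about 1.3 million" from the logs
    average = 960              # per-minute average quoted in this message
    incident_extra = 600       # extra hits/min during the incident

    print(f"1.3M/day is ~{per_minute(daily_rounded):.0f}/min")
    print(f"960/min is ~{average * MINUTES_PER_DAY:,} requests/day")
    print(f"incident added {incident_extra / average:.1%} on top of the average")
```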
Jack
--
Jack Neely <jjneely at ncsu.edu>
Linux Czar, OIT Campus Linux Services
Office of Information Technology, NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89