KDC query client performance

Mon Feb 14 13:14:27 EST 2011

On Feb 14, 2011, at 9:27 AM, krbdev-request at mit.edu wrote:

> Message: 1
> Date: Sun, 13 Feb 2011 16:52:24 -0500 (EST)
> From: ghudson at MIT.EDU
> Subject: KDC query client performance
> To: krbdev at MIT.EDU
> Message-ID: <201102132152.p1DLqOUY011067 at outgoing.mit.edu>
> 
> We've been looking into some cases where MIT krb5 imposes unreasonable
> performance penalties on scenarios where krb5 doesn't even wind up
> getting used.  For instance, in one scenario, turning on ssh's
> GSSAPIKeyExchange feature caused 96 DNS requests and 12 KDC requests
> to conclude that there was no krb5 support for a target host on a
> local network, for a delay of about four seconds.
> 
> As a first step, I've restructured the locate/sendto code so that we
> don't resolve hostnames until we need them.  (I haven't yet extended
> the KDC location module to be able to take advantage of this support.)
> 
> Some other steps we'd like to consider:
> 
> 1. Turn off the realm walk on the client by default.  This is the
> logic where the client assumes that (a) cross-realm key sharing is
> most likely to be arranged along the domain hierarchy of realms, and
> (b) the local KDC is only smart enough to return a cross-tgt for the
> realm we ask for, not for an intermediate realm.  The second
> assumption is no longer likely to be true; for quite a long time now,
> KDCs have been smart enough to perform the realm walk internally and
> respond with a TGT referral.  The down side of the realm walk is that
> we commonly make three or more KDC queries to determine that a guessed
> target realm doesn't exist within the local realm's federation.
> 
> It would actually be nice to eliminate this support entirely, as it's
> a big source of complexity in the TGS request code.  But a more
> conservative first step is to turn it off and allow it to be turned
> back on.

Agree with the eventual goal.  Maybe it's just me, but I'm not yet comfortable with depending on referrals instead of the traditional realm/domain-walk.  Wouldn't want it turned off by default until Solaris 10 clients support referral tickets by default (which I haven't checked).
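
For anyone not familiar with the realm walk in item 1, here is a minimal Python sketch of the domain-hierarchy path a client might guess between its own realm and a target realm. The function name and the example realms are hypothetical, and this is a simplification for illustration, not the MIT krb5 code.

```python
def hierarchical_realm_path(client_realm, server_realm):
    """Guess a cross-realm path along the domain hierarchy (illustrative only)."""
    c = client_realm.split(".")
    s = server_realm.split(".")

    # Count how many trailing labels the two realm names share.
    common = 0
    while common < min(len(c), len(s)) and c[-1 - common] == s[-1 - common]:
        common += 1

    # Assumption: with no shared suffix there is no hierarchy to walk,
    # so the only guess is a direct client-to-server hop.
    if common == 0:
        return [client_realm, server_realm]

    # Walk up from the client realm to the shared ancestor...
    path = [".".join(c[i:]) for i in range(len(c) - common + 1)]
    # ...then down from the shared ancestor to the server realm.
    path += [".".join(s[j:]) for j in range(len(s) - common - 1, -1, -1)]
    return path


print(hierarchical_realm_path("ENG.EXAMPLE.COM", "SALES.EXAMPLE.COM"))
```

Each intermediate realm on such a path is a candidate for a separate TGS request, which is where the extra KDC queries come from when the guessed target realm does not exist.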

> 2. Speeding up the client retry loop, so that it doesn't take as long
> to time out when you're behind a firewall which black-holes port 88.
> Currently we wait one second per UDP address per pass (and per TCP
> address on the first pass), and also wait 2s/4s/8s/16s (or 30s in
> total) at the end of each pass.
> 
> In order to be nice to KDC load, I think it's still prudent to wait
> one second per server address on the first pass.  After that we're
> mostly trying to be nice to the network, and networks have gotten much
> faster.  So I think once we reach the end of the first pass, we ought
> to speed everything up by a factor of ten--that is, wait only 100ms
> between UDP queries on the second and later passes, and wait
> 200ms/400ms/800ms/1600ms at the end of passes.

Personally, I'd rather you just eliminate the last pass (or two?).  I think what's important is that you try all the possibilities, you try them more than once, and you don't shorten the 1-second response-time requirement.  Beyond that it's kind of a matter of opinion.
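
To put rough numbers on the schedules in item 2, here is a small Python sketch of the worst-case time spent when nothing ever answers. It is a model under stated assumptions, not the actual sendto code: it assumes every per-address wait and every end-of-pass wait runs to completion, that TCP addresses are only waited on during the first pass, and that there are four passes. The function and parameter names are invented for illustration.

```python
def worst_case_ms(n_udp, n_tcp, first_pass_wait_ms, later_pass_wait_ms, end_waits_ms):
    """Total wait in milliseconds if no KDC address ever responds (model only)."""
    total = 0
    for pass_index, end_wait in enumerate(end_waits_ms):
        if pass_index == 0:
            # First pass: wait on every UDP and TCP address.
            total += first_pass_wait_ms * (n_udp + n_tcp)
        else:
            # Later passes: only UDP addresses are retried with a wait.
            total += later_pass_wait_ms * n_udp
        total += end_wait
    return total


# One KDC reachable by one UDP address and one TCP address (hypothetical setup).
current = worst_case_ms(1, 1, 1000, 1000, [2000, 4000, 8000, 16000])
proposed = worst_case_ms(1, 1, 1000, 100, [200, 400, 800, 1600])
drop_last_pass = worst_case_ms(1, 1, 1000, 1000, [2000, 4000, 8000])

print(current, proposed, drop_last_pass)  # 35000 5300 18000
```

Under these assumptions, dropping only the last pass roughly halves the worst case, while the factor-of-ten speedup cuts it to a little over five seconds.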

> 3. Eliminate the second default UDP port (750) when parsing profile
> kdc entries.  When a KDC is inaccessible, this causes extra delays,
> and also extra DNS requests due to the way the code is structured.  We
> have always restricted the second default port to UDP over IPv4,
> likely because it was intended as a krb4 transition measure.
> 
> Unfortunately, this change is likely to break a handful of deployments
> which happen to serve KDC requests only on port 750 and win because
> they only need it to work over IPv4 UDP (and don't have any Heimdal
> clients, or configure their Heimdal clients to use port 750
> explicitly).  I'm not sure if it's worth not breaking these
> environments at the cost of extra delays in more common cases.

I hope I don't wind up regretting this, given our AFS stuff, but I think this is a good idea.  Port 750 should go the way of Kerb 4.
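
As an illustration of what item 3 changes, here is a minimal Python sketch of expanding a profile kdc entry into the UDP ports a client would try. The function name, the flag, and the host names are hypothetical. It only handles plain "host" or "host:port" entries and is not how the krb5 library parses the profile.

```python
def udp_targets(kdc_entry, legacy_port_750=True):
    """Expand one kdc entry into (host, port) pairs to try over UDP (sketch)."""
    host, sep, port = kdc_entry.partition(":")
    if sep:
        # An explicit port in the entry overrides the defaults.
        return [(host, int(port))]
    targets = [(host, 88)]
    if legacy_port_750:
        # The second default port, kept as a krb4-era fallback.
        targets.append((host, 750))
    return targets


print(udp_targets("kdc1.example.com"))                          # today
print(udp_targets("kdc1.example.com", legacy_port_750=False))   # proposed
print(udp_targets("kdc1.example.com:750"))                      # explicit port
```

A site that serves KDC requests only on port 750 would keep working under the proposal by listing the port explicitly, as in the last call.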

The only time I've run into a related problem was with a firewall that allowed outbound TCP (to my KDC), but not UDP.  It caused some weird failures since TCP wasn't always tried, which the user (another NASA center) resolved by fixing the firewall.  It had nothing to do with port numbers per se, but it did affect the retry logic.  Sorry, I don't remember the client platform/version.

------------------------------------------------------
The opinions expressed in this message are mine,
not those of Caltech, JPL, NASA, or the US Government.
Henry.B.Hotz at jpl.nasa.gov, or hbhotz at oxy.edu