Extended timeout for new TCP connections
npmccallum at redhat.com
Wed Apr 10 09:48:52 EDT 2013
On Tue, 2013-04-09 at 16:25 -0500, Nico Williams wrote:
> On IRC I expressed concern that the value of 10 seconds is related to
> OTP requirements but being applied to all KDC requests
> indiscriminately. I think this is both, inappropriate and
> undesirable. My recommendation is that you make this value
> configurable for OTP and non-OTP requests.
This is incorrect. OTP needs only 3 seconds; the 10-second timeout was
arrived at quite independently of OTP's needs. What we identified is
that, even apart from OTP, libkrb5 continues to indiscriminately contact
other servers while it is already waiting for a TCP reply, and we
consider that incorrect behavior. The short timeout that exists now
exists ONLY because UDP guarantees neither packet delivery nor server
presence. TCP provides both. Once a TCP connection is established, it is
not correct behavior to keep spamming other servers; doing so only
increases network traffic and server load, with no offsetting benefit.
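To make the distinction concrete, the policy being argued for could be
sketched roughly like this (a minimal illustration; the function and
parameter names are hypothetical, not libkrb5's actual internals):

```python
def should_try_next_server(transport, connected, elapsed, udp_timeout=1.0):
    """Decide whether to contact another KDC while a request is pending.

    UDP gives no delivery guarantee and no proof the server exists, so
    after a short per-server timeout we move on to the next KDC.  Once a
    TCP connection is established, the server's presence is confirmed,
    so we keep waiting on that connection instead of spamming peers.
    """
    if transport == "udp":
        # No delivery guarantee: fall through to the next server
        # once the short per-server timeout has elapsed.
        return elapsed >= udp_timeout
    if transport == "tcp" and connected:
        # Server is demonstrably present; wait for its reply.
        return False
    # TCP connect itself failed, so try the next server.
    return True
```

The point of the sketch is only that the short-timeout logic is a
workaround for UDP's lack of guarantees, not something to carry over to
an established TCP connection.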
Also, it is not right to provide configuration for incorrect behavior.
> On a related note, I think we ought to have an application-layer ACK
> with average processing time and standard deviation for the type of
> KDC-REQ that the client last sent. This would be requested/enabled by
> a kdc-option. This would make timeouts much more adaptive and would
> (if widely deployed) prevent clients from having timeouts that are too
> aggressively short (which are risky as they can cause a load spike to
> reinforce itself). For example, if a KDC crashed immediately after
> accepting a TCP connection then if the client knew the KDC to send
> ACKs, or if the KDC had sent an ACK before crashing, then the client
> could timeout much faster than in 10 seconds without causing undue
> load on the next KDC. This could be particularly useful for PKINIT,
> maybe future PKCROSS, and for KDCs that use LDAP too, as well as for
> OTP (where the KDC could even send a second ACK indicating that
> because of token state synchronization the request will require an
> extra N seconds).
I agree this would be useful, particularly in the case of UDP. But that
doesn't change the fact that our current behavior is incorrect.
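For what it's worth, the ACK proposal quoted above would let a client
derive its timeout from the KDC-reported statistics rather than a fixed
constant. A sketch of that computation, assuming the ACK carries an
average processing time and standard deviation (the field names and the
floor value here are hypothetical):

```python
def adaptive_timeout_ms(avg_ms, stddev_ms, k=3, floor_ms=500):
    """Timeout derived from a hypothetical KDC processing-time ACK.

    Waits for the reported average plus k standard deviations, but never
    less than a sanity floor, so an optimistic ACK cannot drive the
    client into the aggressively short timeouts the proposal warns about.
    """
    return max(floor_ms, avg_ms + k * stddev_ms)
```

A second ACK (e.g. for OTP token-state synchronization) would simply
extend the result by the extra N seconds the KDC announces.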
> In the meantime, I agree that at least a TCP socket's being connected
> is a much better indication of likely success than we get with UDP
> (where we get none).