Ticket 5338: Race conditions in key rotation

Wed Jun 25 13:06:53 EDT 2008

Jeffrey Hutzelman wrote:

>> 	1.  the client libraries versions are in sync with
>> 	    the KDC version, or
>>
>> 	2.  that this decision should be made by the KDC
>> 	    infrastructure.
>>
>> So, as a client here is your flow:
>>
>> 	1.  you got a TGT with kvno 7 from a slave,
>>
>> 	2.  there was mutual auth involved---so you know that you
>> 	    got it from a _real_ slave,
>>
>> 	3.  you present it to another slave and it does not work,
>>
>> 	4.  now, you _know_ that there exists a slave for which this
>> 	    request would actually work,
> 
> 
> No, you don't.  The service principal may not exist.  It may be disabled. 
> There may be no mutually-supported enctype.  _Your_ principal may be 
> expired.  Or any of a number of other things might be wrong, which the KDC 
> may tell you about with an appropriate error code.

Slave KDC 1, from which I received a TGT with kvno N, certainly knows
the key associated with kvno N.

If I went back to Slave KDC 1 and presented that TGT as part of a
request, it would not come back with KRB5_GENERIC ("see error text")
and an error text of "key version number not found".
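The scenario above can be modeled as a toy example.  This is a
hypothetical sketch, not MIT code: the KDC class, rotate() and the
KvnoNotFound exception are illustrative names.  A krbtgt key rotation
to kvno 7 has reached the master and Slave 1 but not yet Slave 2.  A
TGT issued by Slave 1 is accepted by Slave 1 and by the master, but
Slave 2 fails with "key version number not found".

```python
from dataclasses import dataclass, field


class KvnoNotFound(Exception):
    """Hypothetical stand-in for the "key version number not found" error."""


@dataclass
class KDC:
    # Hypothetical model: a KDC name plus the krbtgt kvnos that this
    # KDC's copy of the database currently holds.
    name: str
    krbtgt_kvnos: set = field(default_factory=set)

    def issue_tgt(self) -> int:
        # Issue TGTs under the newest krbtgt key this KDC knows about.
        return max(self.krbtgt_kvnos)

    def accept_tgt(self, kvno: int) -> None:
        # A TGS request can succeed only if this KDC holds the key for kvno.
        if kvno not in self.krbtgt_kvnos:
            raise KvnoNotFound(f"{self.name}: key version number {kvno} not found")


def rotate(kdc: KDC, new_kvno: int) -> None:
    # Models propagation of the new krbtgt key to one database copy.
    kdc.krbtgt_kvnos.add(new_kvno)


master = KDC("master", {6})
slave1 = KDC("slave1", {6})
slave2 = KDC("slave2", {6})

# The master rotates the krbtgt key; propagation has reached slave1 only.
rotate(master, 7)
rotate(slave1, 7)

tgt_kvno = slave1.issue_tgt()  # the TGT is issued under kvno 7
slave1.accept_tgt(tgt_kvno)    # the issuing slave knows its own key
master.accept_tgt(tgt_kvno)    # so does the authoritative database
try:
    slave2.accept_tgt(tgt_kvno)
except KvnoNotFound as err:
    print(err)  # prints "slave2: key version number 7 not found"
```

The failure says nothing about the client or the service; it only
reflects that one database copy is behind the others.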

> "No, you are not allowed to have a ticket" means you are not allowed to 
> have a ticket, not "go ask mommy if she will give you one".

The worst part about this situation is that the client really has no
idea why the ticket wasn't issued.  It does not receive an error code
that explains the reason at all; it gets a KRB5_GENERIC error.

> The issue here is that you are proposing that the answer to a set of KDCs 
> not presenting a consistent view of the realm is not to fix the KDCs, but 
> to require clients to query multiple KDCs and compare their responses, 
> which is not what the protocol specification says.

The protocol specification doesn't say anything about master/slave,
primary/secondary, or multi-master distributed KDC implementations.
It doesn't say that clients MUST, SHOULD, SHOULD NOT, or MUST NOT
contact the Master/Primary KDC after a client request fails.

The only references to "fatal" errors are in conjunction with TCP
support, where the specification states that a client MUST NOT treat
a connection break as fatal.  The only references to "retry" relate
to pre-auth methods that the client SHOULD retry, and to
KRB_ERR_RESPONSE_TOO_BIG, which hints that the client should retry
but doesn't say so explicitly.

In fact, section 3.1.6, which discusses what a client should do in
response to a KRB_ERROR message, states that the client interprets it
as an error and performs "whatever application-specific tasks are
necessary for recovery."  Perhaps you interpret this to mean that the
krb5 library must pass the error to the calling application and that
it is the application's responsibility to decide whether or not to
retry.  However, that is not how I interpret it.  The library is part
of the application, and its job is to handle Kerberos errors
internally so that the application can succeed at the task it is
attempting to accomplish.  If that means retrying, so be it.

The KDC deployment architecture selected by an implementation is an
implementation-specific detail that is independent of the protocol
specification.  Inconsistencies in the deployment of the
implementation-specific database are addressed in an
implementation-specific manner.  The MIT clients already contact the
Master/Primary KDC, if one is defined for the realm, when the client's
AS request fails for any reason.  This is done explicitly because
Slave/Secondary KDCs might not have an up-to-date view of the world.

The Master/Primary KDC is defined either by the "master_kdc" entry in
the realm section of krb5.conf (the profile) or by DNS SRV records.

Nico has commented that failover is acceptable for AS requests but not
for TGS requests, because the volume of AS requests is lower than that
of TGS requests.  In a properly configured realm, the number of
failures is going to be small compared to the overall volume of
requests to the slaves.  This assumption has been verified in
real-world production realms.  If you have data that disputes this,
please provide it.

The problem here is that no matter what you say a KDC-based solution
should be, there is no transactional mechanism that you can put in
place on an error-prone network that will keep all copies of the
database consistent all of the time.  By adding transactional
complexity you can reduce the opportunities for inconsistency, but you
do so at the cost of that complexity and the additional exposure to
failure that it introduces.

I certainly do not believe that, as a practical matter, a client
should ask all of the known KDCs in turn for an answer if it doesn't
get a successful response.  However, turning to the known
authoritative database to answer the question is perfectly reasonable
when it is known that such an authoritative database exists.
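The policy described above can be sketched as follows.  This is a
minimal, hypothetical illustration, not the MIT krb5 implementation:
the send() transport, the Reply type and the Unreachable exception are
assumed names.  The client moves on to another KDC only when a KDC
does not answer at all.  When a slave answers with an error, the
client asks the master once, if one is defined, instead of polling
every KDC in the realm.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Reply:
    ok: bool
    detail: str


class Unreachable(Exception):
    """Hypothetical: raised by the transport when a KDC does not answer."""


# Hypothetical transport: send a request to one KDC and return its reply.
SendFn = Callable[[str, str], Reply]


def get_reply(request: str, kdcs: Sequence[str],
              master: Optional[str], send: SendFn) -> Reply:
    reply: Optional[Reply] = None
    answered_by: Optional[str] = None
    for kdc in kdcs:
        try:
            reply = send(kdc, request)
        except Unreachable:
            continue  # no answer at all: ordinary failover to the next KDC
        answered_by = kdc
        break  # stop at the first KDC that answers, success or error
    if reply is None:
        raise Unreachable("no KDC for the realm answered")
    if reply.ok or master is None or answered_by == master:
        return reply
    # A possibly stale slave said no: ask the authoritative database once.
    try:
        return send(master, request)
    except Unreachable:
        return reply  # master unreachable: report the slave's error


def fake_send(kdc: str, request: str) -> Reply:
    # Hypothetical realm: slave2 lacks the new krbtgt key; others have it.
    if kdc == "slave2":
        return Reply(False, "KRB5_GENERIC: key version number not found")
    return Reply(True, f"ticket from {kdc}")


print(get_reply("TGS-REQ kvno=7", ["slave2", "slave1"], "master", fake_send).detail)
# prints "ticket from master"
```

Note that slave1 is never consulted: one error reply leads to exactly
one extra request, to the master, so the added load is bounded by the
number of failures rather than by the number of KDCs.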

Jeffrey Altman