krb5.conf implementation question

Fri Apr 10 22:10:55 EDT 2020

On 4/10/20 5:40 PM, O'Loughlin, Kieran wrote:
> I'm not developing an application using the MIT Kerberos libraries, but am implementing a third-party application that uses the libraries.  So, I hope it's ok to be emailing this list.

kerberos at mit.edu would be more appropriate, but it's not a big deal.

> The problem happens when the first KDC in the list is rebooted.  There is a short window where the KDC will respond with KDC_ERR_C_PRINCIPAL_UNKNOWN or KDC_ERR_S_PRINCIPAL_UNKNOWN (probably depending on type of request) as it is shutting down.  Microsoft has an article about this type of behavior for 2008 SP2 https://support.microsoft.com/en-us/help/982801/a-domain-controller-returns-the-no-such-user-0xc0000064-status-code-or, but we're seeing the same symptoms with 2008 R2, 2012 R2 and 2019.

That's interesting; I haven't heard of this particular issue with
Microsoft's KDCs before, and it sounds like Microsoft thinks they fixed
the problem ten years ago.  Unfortunately I don't think you'll be able
to resolve the issue on the MIT client side without code changes.

> When we enable KRB5_TRACE we see up to 4 request attempts being made.  I don't know if the multiple tries are initiated by the MIT code or by the application code.

Based on the trace logs, I think the second try is initiated by the MIT
code, and the other pair is the result of a second attempt to get
credentials by the application.

The second try in the MIT code is a fallback in case the KDC doesn't
support referrals.

>   *   If the MIT code is making the 4 request attempts, is there any way (krb5.conf configuration, env variables, etc.) that we could force each retry to use a different KDC entry in the krb5.conf.  This would move the retries away from the server that is rebooting and the first of those should get a good response.

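We only walk down the KDC list when the first KDC fails to respond.  An
error reply such as KDC_ERR_S_PRINCIPAL_UNKNOWN still counts as a
response, so the client does not move on to the next KDC in that case.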
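
As a rough illustration of that behavior, here is a simplified Python
model.  It is not the library's actual code; the function names, host
names, and reply strings are all made up for the example.  The only
point is that a timeout moves on to the next KDC, while any reply,
including an error, ends the walk.

```python
from typing import Callable, Optional, Sequence

# Hypothetical reply type: None models "no response" (timeout or
# connection failure).  Any string, including an error code name,
# models a response from the KDC.
Reply = Optional[str]


def send_to_kdcs(kdcs: Sequence[str], send: Callable[[str], Reply]) -> Reply:
    """Try each KDC in list order and return the first response received."""
    for kdc in kdcs:
        reply = send(kdc)
        if reply is not None:
            # Any answer, even an error, is treated as final.
            return reply
    # No KDC in the list answered at all.
    return None


def make_fake_send(replies: dict) -> Callable[[str], Reply]:
    """Build a stand-in transport from a host -> reply table (illustrative only)."""
    return lambda kdc: replies.get(kdc)


KDCS = ["dc1.example.com", "dc2.example.com"]  # hypothetical host names

# dc1 is shutting down but still answering with an error.
erroring = make_fake_send({
    "dc1.example.com": "KDC_ERR_S_PRINCIPAL_UNKNOWN",
    "dc2.example.com": "TGS_REP",
})

# dc1 is fully down and does not answer.
silent = make_fake_send({
    "dc2.example.com": "TGS_REP",
})

print(send_to_kdcs(KDCS, erroring))
print(send_to_kdcs(KDCS, silent))
```

In the first case the error from dc1 is returned and dc2 is never
contacted; in the second case dc1 stays silent, so the request falls
through to dc2.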

>   *   As mentioned above we are listing out the individual KDC machines, would it be better to set up the krb5.conf in a different way, perhaps using DNS to find the KDCs?

I don't think that would help reliably.  Randomization of DNS response
order could make the right thing happen some of the time, but not all of
the time.

>   *   I saw a mention in one email about setting master_kdc, that suggested if there is an error a subsequent request might be sent to the master_kdc.  However the documentation says this only happens on an invalid password.  Is that the case or is it worth setting master_kdc?  We don't do that currently.

master_kdc currently only applies to AS requests, and this is a TGS
request, so that wouldn't help.  Also, it obviously wouldn't help if the
KDC listed in master_kdc was the one shutting down.