Gss context refresh failure due to clock skew
William.Adamson at netapp.com
Mon Oct 5 15:35:39 EDT 2015
> On Oct 5, 2015, at 3:10 PM, Greg Hudson <ghudson at MIT.EDU> wrote:
> Sorry for the delay; Andy's mail got stuck in the krbdev moderation
> queue by mistake.
> On 10/01/2015 05:30 PM, Adamson, Andy wrote:
>> The situation occurs as follows.
> I am a little bit confused by this description because of terminology
> issues. In your description, you appear to use the phrase "TGS" to
> refer to service tickets (i.e. tickets whose service principal is
> nfs/server.name), but I can't be sure. The actual meaning of "TGS" is
> "ticket-granting service," i.e. the KDC service whose principal name is
Pardon my terminology gaffe. I mean a ticket for nfs/server.name.
>> 2) For convenience, I set the TGS lifetimes to be as short as possible, 10 minutes for Win2008R2 AD which I test with.
> Are you setting the maximum lifetime for nfs/server.name tickets to 10
> minutes, but still allowing ticket-granting tickets to have a lifetime
> of multiple hours?
[root@rhel6-7ga sles-kernel]# klist -ce /tmp/krb5cc_machine_ANDROSAD.FAKE
Ticket cache: FILE:/tmp/krb5cc_machine_ANDROSAD.FAKE
Default principal: nfs/rhel6-7ga.androsad.fake@ANDROSAD.FAKE

Valid starting     Expires            Service principal
09/30/15 11:57:02  09/30/15 12:57:02  krbtgt/ANDROSAD.FAKE@ANDROSAD.FAKE
        renew until 10/07/15 11:57:02, Etype (skey, tkt): aes256-cts-hmac-sha1-96, aes256-cts-hmac-sha1-96
09/30/15 11:57:02  09/30/15 12:07:02  nfs/rhel7-1ga-4.androsad.fake@ANDROSAD.FAKE
        renew until 10/07/15 11:57:02, Etype (skey, tkt): arcfour-hmac, arcfour-hmac
>>> 12) Wait until the client clock is past the server TGS expiry time
>>> 13) re-try the mkdir - it succeeds after a successful GSS INIT NULL call exchange for both servers.
> If I understand correctly, this request succeeds because
> krb5_get_credentials() ignores the expired cached service ticket and
> makes a TGS request for a new service ticket. The cache now contains:
> * A ticket for krbtgt/REALM with hours remaining
> * A ticket for nfs/server.name which expired recently
> * Another ticket for nfs/server.name which expires in ten minutes
> Is that correct?
Yes, and the new service ticket produces an RPCSEC_GSS_INIT token whose expiry passes the server's clock test.
>> Shouldn’t these refresh calls succeed? Isn’t the Kerberos clock skew supposed to handle this situation?
> I think this case doesn't arise often because people don't often set
> maximum service ticket lifetimes to be shorter than maximum TGT
That is not the cause of the issue. The 10-minute service ticket lifetime is only there for testing: I needed to wait until the service ticket had ‘expired’ on the server but not yet on the client.
We see this issue all the time in NetApp QA, as we run multiple-day heavy I/O tests against a Kerberos mount. If the server clock is ahead of the client clock, permission denied errors stop the test when the first service ticket “expires” on the server but not on the client.
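To make the failure window concrete, here is a minimal sketch of the two clock views. It is an illustrative model only, not MIT krb5 code: the helper ticket_expired, the 3-minute server offset, and the chosen wall-clock time are all assumptions; only the service ticket end time is taken from the klist output above.

```python
from datetime import datetime, timedelta


def ticket_expired(now, endtime, skew=timedelta(0)):
    """Hypothetical helper: True if `now` is past endtime plus the allowed skew."""
    return now > endtime + skew


# End time of the nfs/ service ticket in the klist output above.
endtime = datetime(2015, 9, 30, 12, 7, 2)

# Assumption for illustration: the server clock runs 3 minutes ahead.
server_offset = timedelta(minutes=3)

client_now = datetime(2015, 9, 30, 12, 5, 0)
server_now = client_now + server_offset

# The client still considers the cached ticket valid and keeps using it.
client_view = ticket_expired(client_now, endtime)

# The server compares against its own clock with no skew allowance.
server_view = ticket_expired(server_now, endtime)

# With a 5-minute skew allowance the server would still accept the ticket.
server_view_with_skew = ticket_expired(server_now, endtime, timedelta(minutes=5))

print(client_view, server_view, server_view_with_skew)
```

During that window the client has no reason to fetch a new service ticket, so every context refresh reuses a ticket the server already treats as expired.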
> If the TGT itself has expired or is about to expire, some
> out-of-band agent needs to refresh the TGT somehow, and it doesn't
> matter all that much whether the failure comes from the client or the
I thought that having a keytab entry and a renewable TGT was enough.
> That said, your scenario should work, and it doesn't. The primary cause
> is an explicit check added to the krb5 mech's gss_accept_sec_context()
> implementation in 1996 (before the MIT krb5 1.0 release), which checks
> the ticket endtime with no allowance for clock skew. I don't know
> precisely why the check was added, but my guess is that it is for the
> computation of the context validity lifetime; it would make no sense to
> tell the application "the authentication succeeded and the resulting
> context is valid for the next -3 minutes."
That also makes no sense - simply account for the Kerberos clock skew in the message. For example, if the clock skew is 5 minutes and, according to the server clock, the ticket has been expired for 2 minutes, then the message becomes "the authentication succeeded and the resulting context is valid for the next 3 minutes," because 3 minutes remain on the server clock once the configured Kerberos clock skew is taken into account.
> Perhaps a better choice would be to remove this check, and instead add
> the clock skew to the validity lifetime of GSS krb5 acceptor contexts.
Yes. That is my opinion.
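The arithmetic behind that proposal can be sketched as follows. This is a rough model under stated assumptions, not the gss_accept_sec_context() implementation: the function acceptor_lifetime and its parameters are hypothetical, times are plain seconds, and the 300-second default mirrors the usual 5-minute Kerberos clock skew.

```python
def acceptor_lifetime(endtime, server_now, clockskew=300):
    """Hypothetical model: context validity in seconds, with the clock skew
    added to the ticket end time instead of rejecting at endtime itself."""
    remaining = endtime + clockskew - server_now
    if remaining <= 0:
        # Expired even after allowing for skew: reject the context.
        raise ValueError("ticket expired beyond the allowed clock skew")
    return remaining


# Ticket expired 2 minutes ago by the server clock, skew is 5 minutes:
# the context is reported as valid for the remaining 3 minutes.
lifetime = acceptor_lifetime(endtime=1000, server_now=1120)
print(lifetime)  # 180

# With no skew allowance (the current strict check), the same ticket is rejected.
try:
    acceptor_lifetime(endtime=1000, server_now=1120, clockskew=0)
    strict_rejected = False
except ValueError:
    strict_rejected = True
print(strict_rejected)  # True
```

The reported lifetime never goes negative this way: a ticket is either inside the skew window and gets a positive remaining lifetime, or it is rejected outright.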