Gss context refresh failure due to clock skew
kaduk at MIT.EDU
Mon Oct 5 20:02:11 EDT 2015
On Mon, 5 Oct 2015, Adamson, Andy wrote:
> > On Oct 5, 2015, at 4:02 PM, Greg Hudson <ghudson at MIT.EDU> wrote:
> > On 10/05/2015 03:35 PM, Adamson, Andy wrote:
> >>> I think this case doesn't arise often because people don't often set
> >>> maximum service ticket lifetimes to be shorter than maximum TGT
> >>> lifetimes.
> >> Not the cause of the issue. The service ticket lifetime of 10 minutes is just there for testing this issue as I needed to wait until the service ticket had ‘expired’ on the server - but not yet on the client.
> >> We see this issue all the time in NetApp QA as we run multiple-day heavy IO tests against a Kerberos mount. If the server clock is ahead of the client clock, permission denied errors stop the test as the first service ticket “expires” on the server but not on the client.
> > If the issue is not caused by short-lifetime service principals,
> I was wrong - you are right, it is caused by service ticket lifetimes being shorter than TGT lifetimes.
> I didn’t know that keeping service ticket lifetimes at least as long as
> TGT lifetimes was a requirement. Neither does NetApp QA, and I suspect
> neither do customers in general.
It's not a requirement. (Greg explicitly said "That said, your scenario
should work, and it doesn't." in his first message.)
> > then
> > the test scenario you described isn't representative of the real
> > scenario. To reproduce the problem as it manifests in your IO tests,
> > you will need to adjust the TGT lifetime down to ten minutes as well as
> > the nfs/server lifetime.
> Code was added to rpc.gssd, the NFS client agent that creates GSS
> contexts for NFS, to take clock skew into account and get a new TGT
> once its expiry falls within (now + clock skew). So if the service
> ticket lifetime is equal to or greater than the TGT lifetime, all is
> well.
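The skew-adjusted refresh decision described above can be sketched as follows. This is only an illustration of the idea, not the actual rpc.gssd code; `tgt_needs_refresh`, `tgt_endtime`, and `clock_skew` are hypothetical names.

```c
#include <time.h>

/* Sketch of a clock-skew-aware expiry check: treat the TGT as expiring
 * "clock_skew" seconds early, so a server whose clock runs ahead of the
 * client's never sees credentials it considers already expired. */
int tgt_needs_refresh(time_t now, time_t tgt_endtime, time_t clock_skew)
{
    /* Refresh once the skew-adjusted expiry time has been reached. */
    return now + clock_skew >= tgt_endtime;
}
```

With, say, a 300-second skew allowance, the TGT is refreshed five minutes before its nominal end time, which keeps a fast server clock from rejecting it first.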
> >>> If the TGT itself has expired or is about to expire, some
> >>> out-of-band agent needs to refresh the TGT somehow, and it doesn't
> >>> matter all that much whether the failure comes from the client or the
> >>> server.
> >> I thought that having a keytab entry and a renewable TGT was enough.
> > I'm not sure why you would do both of these; if you're getting initial
> > creds with a keytab, there is no need to muck around with ticket renewal.
> I wouldn’t, but QA and customers do.
> > Anyway, gss_init_sec_context() never renews tickets, and only gets
> > tickets from a keytab when a client keytab is configured (new in 1.11).
> > When tickets are obtained using a client keytab, they are refreshed
> > from the keytab when they are halfway to expiring,
> refreshed by…?
The GSS library itself.
The documentation gives a little bit of intro, though this feature could
benefit from better coverage.
> > so this clock skew
> > issue should not arise; I don't think that feature is being used.
> > It is possible that the NFS client code has its own separate logic for
> > obtaining new tickets using a keytab.
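Greg's "halfway to expiring" rule for client-keytab credentials can be illustrated with a small sketch. The function and parameter names here are assumptions for illustration, not the actual libkrb5 internals.

```c
#include <time.h>

/* Sketch of the halfway-point rule: credentials obtained from a client
 * keytab are re-acquired once more than half their lifetime has elapsed,
 * so a valid ticket is always available well before expiry. */
int creds_past_halfway(time_t starttime, time_t endtime, time_t now)
{
    time_t half = starttime + (endtime - starttime) / 2;
    return now >= half;
}
```

Refreshing at the halfway point means that even with substantial clock skew, the server should never see the client present a ticket in its final stretch of validity.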
> When an NFS request requires a GSS context, and the context does not
> exist, is not valid, or the server replies to an RPC request with an
> error indicating that its side of the GSS context has a problem, the
> client kernel makes an upcall to rpc.gssd. rpc.gssd then decides
> whether a new service ticket is required to send an RPCSEC_GSS_INIT
> message to the server and create a new GSS context. The resulting GSS
> context is stored in the client kernel with a lifetime equal to that of
> the service ticket used to create it.
> If rpc.gssd calls the code that refreshes the tickets from the keytab
> when they are halfway to expiring, then that should mitigate the clock
> skew issue.
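The upcall flow Andy describes could be modeled roughly as below. All names are illustrative, not the actual nfs-utils symbols; the skew allowance follows the rpc.gssd change discussed earlier in the thread.

```c
#include <time.h>

/* Illustrative model of the decision rpc.gssd faces on an upcall: a
 * context is usable only if it exists, the server has not rejected it
 * with an RPCSEC_GSS error, and the service ticket backing it has not
 * reached its skew-adjusted expiry. */
struct gss_ctx_state {
    int exists;
    int server_rejected;   /* server returned an RPCSEC_GSS error */
    time_t ticket_endtime; /* end time of the backing service ticket */
};

int need_new_context(const struct gss_ctx_state *c, time_t now, time_t skew)
{
    if (!c->exists || c->server_rejected)
        return 1;
    /* Treat the backing ticket as expired "skew" seconds early. */
    return now + skew >= c->ticket_endtime;
}
```

When this returns true, rpc.gssd would obtain a fresh service ticket and send RPCSEC_GSS_INIT to establish a new context.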
> > If so, we need to understand how
> > it works. It's possible (though unlikely) that changing the behavior of
> > gss_accept_sec_context() wouldn't be sufficient by itself.