Concurrency issues with FILE ccache
Osipov, Michael (LDA IT PLM)
michael.osipov at siemens.com
Tue Apr 6 11:48:37 EDT 2021
Hi,
we are seeing some odd concurrency issues with FILE:-based
credential caches.
One Python application uses tens (mostly 16 to 24) of concurrent threads
to access resources via py-requests and py-requests-gssapi on
Debian 10 with MIT Kerberos 1.17 (GitLab Runner) and on FreeBSD 12-STABLE
with MIT Kerberos 1.19.1 (my dev box). The GSS context is maintained per
thread/request rather than through a Requests Session object.
The initiator credential comes from a service keytab set via
KRB5_CLIENT_KTNAME. For testing purposes I use kinit with my personal
AD account, but the outcome is the same. On both platforms klist shows
output like this:
> Ticket cache: FILE:/tmp/krb5cc_722
> Default principal: osipovmi at EXAMPLE.COM
>
> Valid starting Expires Service principal
> 06.04.2021 17:18:38 07.04.2021 03:18:38 krbtgt/EXAMPLE.COM at EXAMPLE.COM
> renew until 07.04.2021 17:18:35
> 06.04.2021 17:18:42 07.04.2021 03:18:38 HTTP/hostname.EXAMPLE.COM at EXAMPLE.COM
> 06.04.2021 17:18:42 07.04.2021 03:18:38 HTTP/hostname.EXAMPLE.COM at EXAMPLE.COM
> 06.04.2021 17:18:42 07.04.2021 03:18:38 HTTP/hostname.EXAMPLE.COM at EXAMPLE.COM
> 06.04.2021 17:18:42 07.04.2021 03:18:38 HTTP/hostname.EXAMPLE.COM at EXAMPLE.COM
> 06.04.2021 17:18:42 07.04.2021 03:18:38 HTTP/hostname.EXAMPLE.COM at EXAMPLE.COM
On Debian I even get errors like:
> gssapi.raw.misc.GSSError: Major (851968): Unspecified GSS failure. Minor code may provide more information, Minor (100001): Failed to store credentials: Internal credentials cache error (filename: /tmp/krb5cc_1000)
This leads me to conclude that TGT and service ticket acquisition, or at
least the read/write access to the FILE cache, is not R/W locked.
For testing purposes I have modified py-requests-gssapi so that a
threading.Lock() is held around the first SecurityContext step call,
because that is where the cache is set up. With this change klist shows
only *one* service ticket and everything else runs smoothly.
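A minimal sketch of that locking workaround, not the actual patch: the helper name first_step_serialized and the FakeContext stand-in are made up for illustration, so the example runs without a KDC. With the real library, ctx would be a gssapi.SecurityContext. Note that a threading.Lock only serializes threads within one process, not separate processes sharing the same cache file.

```python
import threading
import time

# Process-wide lock (hypothetical name, not part of py-requests-gssapi).
_first_step_lock = threading.Lock()


def first_step_serialized(ctx, token=None):
    """Run step() on a GSS context while holding a global lock.

    `ctx` is any object with a step(token) method, for example a
    gssapi.SecurityContext. Only one thread at a time gets to trigger
    ticket acquisition and the resulting cache write.
    """
    with _first_step_lock:
        return ctx.step(token)


class FakeContext:
    """Stand-in for a security context; records how many threads overlap."""

    active = 0       # threads currently inside step()
    max_active = 0   # highest overlap observed so far
    _guard = threading.Lock()

    def step(self, token=None):
        cls = FakeContext
        with cls._guard:
            cls.active += 1
            cls.max_active = max(cls.max_active, cls.active)
        time.sleep(0.001)  # pretend to talk to the KDC and write the cache
        with cls._guard:
            cls.active -= 1
        return b"fake-token"


def run_demo(n_threads=16):
    """Start n_threads workers and return the maximum overlap seen in step()."""
    FakeContext.active = 0
    FakeContext.max_active = 0
    threads = [
        threading.Thread(target=first_step_serialized, args=(FakeContext(),))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return FakeContext.max_active
```

Calling run_demo(16) reports how many threads were inside step() at the same time; with the lock in place no two calls overlap.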
I have also considered using
> gssapi.raw.acquire_cred_from(store={b"ccache":b"FILE:/tmp/<app>_<threadname>", b"client_keytab":b"/path/to/service.keytab"}, usage='initiate')
but I'd like to avoid adding even more code to fix a symptom rather than the cause.
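For reference, a rough sketch of how the per-thread cache idea could look. The helper per_thread_store and the name sanitizing are assumptions added for illustration; only the two store keys and the FILE:/tmp/<app>_<threadname> pattern come from the call above. The function only builds the store dictionary, so it runs without python-gssapi installed.

```python
import re
import threading


def per_thread_store(app, keytab, thread_name=None):
    """Build a credential store dict with one FILE ccache per thread.

    `app` and `keytab` are caller-supplied; `thread_name` defaults to the
    current thread's name. Characters that are awkward in file names are
    replaced with underscores (an assumption, not a Kerberos requirement).
    """
    if thread_name is None:
        thread_name = threading.current_thread().name
    safe_name = re.sub(r"[^A-Za-z0-9_.-]", "_", thread_name)
    ccache = "FILE:/tmp/{}_{}".format(app, safe_name)
    return {
        b"ccache": ccache.encode(),
        b"client_keytab": keytab.encode(),
    }
```

Each thread would then pass its own dictionary when acquiring credentials, for example via gssapi.Credentials(usage="initiate", store=per_thread_store("myapp", "/path/to/service.keytab")), so that no two threads write to the same cache file. The application name "myapp" is a placeholder.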
I have also compared KRB5_TRACE output from the original and the patched
version of py-requests-gssapi. One can clearly see that in the former,
due to race conditions, every thread tries to retrieve a TGT and a
service ticket, while the latter (patched) does so only once.
What is the general advice here? Are any of the cache types thread-safe?
None is documented either way.
Regards,
Michael