Concurrency issues with FILE ccache

Osipov, Michael (LDA IT PLM) michael.osipov at siemens.com
Tue Apr 6 11:48:37 EDT 2021


Hi,

We are experiencing some odd concurrency issues with FILE:-based
credential caches.
One Python application uses many concurrent threads (mostly 16 to 24)
to access resources via py-requests and py-requests-gssapi. It runs on
Debian 10 with MIT Kerberos 1.17 (GitLab Runner) and on FreeBSD 12-STABLE
with MIT Kerberos 1.19.1 (my dev box). A GSS context is maintained per
thread/request rather than through Requests' Session object.
The initiator credential comes from a service keytab set via
KRB5_CLIENT_KTNAME. For testing purposes I use kinit with my personal AD
account, but the outcome is the same. On both platforms klist shows
output like this:
> Ticket cache: FILE:/tmp/krb5cc_722
> Default principal: osipovmi at EXAMPLE.COM
> 
> Valid starting       Expires              Service principal
> 06.04.2021 17:18:38  07.04.2021 03:18:38  krbtgt/EXAMPLE.COM at EXAMPLE.COM
>         renew until 07.04.2021 17:18:35
> 06.04.2021 17:18:42  07.04.2021 03:18:38  HTTP/hostname.EXAMPLE.COM at EXAMPLE.COM
> 06.04.2021 17:18:42  07.04.2021 03:18:38  HTTP/hostname.EXAMPLE.COM at EXAMPLE.COM
> 06.04.2021 17:18:42  07.04.2021 03:18:38  HTTP/hostname.EXAMPLE.COM at EXAMPLE.COM
> 06.04.2021 17:18:42  07.04.2021 03:18:38  HTTP/hostname.EXAMPLE.COM at EXAMPLE.COM
> 06.04.2021 17:18:42  07.04.2021 03:18:38  HTTP/hostname.EXAMPLE.COM at EXAMPLE.COM

On Debian I even get errors like:

> gssapi.raw.misc.GSSError: Major (851968): Unspecified GSS failure.  Minor code may provide more information, Minor (100001): Failed to store credentials: Internal credentials cache error (filename: /tmp/krb5cc_1000)

This leads me to conclude that either TGT and service ticket acquisition
or the read/write access to the FILE cache is not protected by a
read/write lock.
For testing purposes I have modified py-requests-gssapi so that a
threading.Lock() is held around the first SecurityContext step call,
because that is where the cache is set up. With this change klist shows
only *one* service ticket and everything else runs smoothly.
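
A minimal sketch of that kind of workaround, not the actual patch to
py-requests-gssapi: the helper name locked_first_step and the module-level
lock are hypothetical, and ctx is assumed to be any object with a
step(token) method, such as gssapi.SecurityContext.

```python
import threading

# Hypothetical process-wide lock shared by all worker threads.
_first_step_lock = threading.Lock()


def locked_first_step(ctx, token=None):
    """Run the first step of a security context while holding the lock.

    `ctx` is assumed to expose step(token), as gssapi.SecurityContext
    does. Only one thread at a time can be inside this call, so only one
    thread at a time triggers ticket acquisition and the ccache write.
    """
    with _first_step_lock:
        return ctx.step(token)
```

Later step calls on an established context would not need the lock under
this assumption; only the first call, which populates the cache, is
serialized.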
I have also considered using
> gssapi.raw.acquire_cred_from(store={b"ccache":b"FILE:/tmp/<app>_<threadname>", b"client_keytab":b"/path/to/service.keytab"}, usage='initiate')
but I'd like to avoid adding even more code to fix a symptom rather than
the cause.
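
For illustration, the per-thread cache idea could be sketched as follows.
The helper names per_thread_store and acquire_for_current_thread are
hypothetical, as is the sanitizing of the thread name; the
acquire_cred_from call mirrors the one quoted above and needs
python-gssapi installed.

```python
import re
import threading


def per_thread_store(app, keytab, thread_name=None):
    """Build a credential store dict with one FILE ccache per thread."""
    name = thread_name or threading.current_thread().name
    # Thread names may contain spaces or parentheses; keep the file name tame.
    safe = re.sub(r"[^A-Za-z0-9_.-]", "_", name)
    ccache = "FILE:/tmp/{}_{}".format(app, safe)
    return {b"ccache": ccache.encode(), b"client_keytab": keytab.encode()}


def acquire_for_current_thread(app, keytab):
    """Acquire initiator credentials into this thread's own ccache."""
    import gssapi.raw  # imported lazily so the helper above works without it

    return gssapi.raw.acquire_cred_from(
        store=per_thread_store(app, keytab), usage="initiate"
    )
```

Each thread then reads and writes only its own cache file, at the cost of
one TGT and one service ticket request per thread.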

I have also compared the KRB5_TRACE output of the original and the
patched version of py-requests-gssapi. In the original, one can clearly
see that, due to race conditions, every thread tries to retrieve a TGT
and a service ticket, while the patched version does so only once.

What is the general advice here? Is any of the cache types thread-safe?
None is documented either way.

Regards,

Michael

