Concurrency issues with FILE ccache

Osipov, Michael (LDA IT PLM) michael.osipov at siemens.com
Fri Apr 16 07:39:03 EDT 2021


On 2021-04-09 at 20:24, Greg Hudson wrote:
> On 4/9/21 11:35 AM, Osipov, Michael (LDA IT PLM) wrote:
>> I am quite sure that this is a race condition: stat() is performed, the
>> file does not exist, open() for writing is performed, but in parallel the
>> file has already been created and the later call fails with EEXIST.
> 
> I agree, except I think it's just unlink() and open(O_CREAT|O_EXCL)
> calls with no stat().  I had erroneously assumed that the unexpected
> error was happening inside fcc_store() because of "Failed to store
> credentials" in the message, but that string turns out to be from
> get_in_tkt.c in a block of code that also calls krb5_cc_initialize().
> 
> The fcc_initialize() EEXIST self-race has existed since 1.0.  I'd
> speculate that the original developers' assumption was that lots of
> processes might be competing to use a file ccache, but that creating
> ccaches would be a rare and one-at-a-time affair (happening at login or
> when a user runs "kinit").  With client keytab support, that is no
> longer the case; it's easy to have multiple threads or processes
> competing to create or refresh a cache as part of gss_acquire_cred() or
> gss_init_sec_context().
> 
> Just fixing the fcc_initialize() race wouldn't really solve the problem;
> there would still be a window between krb5_cc_initialize() and
> krb5_cc_store_cred() where other threads (or processes) would see an
> initialized cache with no TGT in it, and would fail the
> gss_init_sec_context() call.

Re-reading the code and your analysis, I agree that it won't work w/o 
external synchronization.
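
To make the self-race concrete, the unlink() / open(O_CREAT|O_EXCL) sequence
described above can be replayed in a fixed order from a single thread. This is
only a sketch in Python, not the libkrb5 code; the helper names (step_unlink,
step_create, replay_race) and the file name are made up for illustration.

```python
import errno
import os
import tempfile


def step_unlink(path):
    """Remove any existing cache file; a missing file is not an error."""
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass


def step_create(path):
    """Create the cache file exclusively; raises FileExistsError (EEXIST) if it exists."""
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
    os.close(fd)


def replay_race(path):
    """Replay the interleaving of two initializers, A and B, step by step.

    Returns the errno that B ends up with, or 0 if B succeeded.
    """
    step_unlink(path)      # A removes the old cache
    step_unlink(path)      # B removes it as well (already gone)
    step_create(path)      # A creates the new cache
    try:
        step_create(path)  # B tries to create it too
    except OSError as exc:
        return exc.errno
    return 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        result = replay_race(os.path.join(tmpdir, "krb5cc_demo"))
        print(errno.errorcode.get(result, result))
```

With real threads the same interleaving only happens occasionally, which is
why the failure shows up sporadically under load.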

> This ticket describes that problem and
> some possible solutions:
> 
>    https://krbdev.mit.edu/rt/Ticket/Display.html?id=7707
> 
> Heimdal has implemented option 5.  I'm not wild about it and it won't
> work with other ccache types, but it's a working stopgap and it can
> always be backed out in favor of a different solution later.

While I don't understand all of them, option 2 seems to be the most
obvious (idiotproof) solution for the FILE cache, doesn't it? I can't tell
for the other ccache types.

So for now, the only workarounds I see are:
1. Initialize the cache in the main thread and then spawn worker threads.
For long-running apps (10 h+) the cache still has to be refreshed, although
there is no 'kinit -R' in GSS-API.
2. Use a per-thread cache to avoid race conditions:
>     import os, sys, threading
>     import gssapi
>
>     # SPNEGO mechanism OID
>     spnego = gssapi.OID.from_int_seq("1.3.6.1.5.5.2")
>     if keytab_location:
>         enc = sys.getdefaultencoding()
>         store = {}
>         store[b"client_keytab"] = keytab_location.encode(enc)
>         # One ccache file per process and thread, so no two threads share a file
>         store[b"ccache"] = ("/tmp/krb5cc_%d_%s" % (os.getpid(), threading.get_ident())).encode(enc)
>         creds = gssapi.raw.acquire_cred_from(store=store, mechs=[spnego], usage="initiate").creds
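
Workaround 1 could look roughly like the sketch below: one shared object
acquires the credentials under a lock and hands the same result to every
worker thread, re-acquiring once a maximum age has passed. The class name
SharedCreds, the `acquire` callable (which would wrap something like the
acquire_cred_from() call) and the age-based refresh are assumptions made for
illustration. The lock only serializes threads within one process; it does
nothing for separate processes sharing the same cache file.

```python
import threading
import time


class SharedCreds:
    """Acquire credentials once and share them between threads (sketch)."""

    def __init__(self, acquire, max_age_seconds):
        # `acquire` is a hypothetical zero-argument callable that returns
        # credentials, e.g. a wrapper around gssapi.raw.acquire_cred_from().
        self._acquire = acquire
        self._max_age = max_age_seconds
        self._lock = threading.Lock()
        self._creds = None
        self._acquired_at = None

    def get(self):
        """Return the shared credentials, (re)acquiring them if missing or too old."""
        with self._lock:
            now = time.monotonic()
            expired = (
                self._acquired_at is None
                or now - self._acquired_at >= self._max_age
            )
            if expired:
                # Only one thread at a time gets here, so the cache is never
                # initialized concurrently from within this process.
                self._creds = self._acquire()
                self._acquired_at = now
            return self._creds
```

Each worker would call shared.get() before creating its security context
instead of acquiring credentials itself. A real implementation would probably
look at the remaining credential lifetime rather than a fixed age.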

It'd be nice if this limitation were documented here:
https://web.mit.edu/kerberos/krb5-1.19/doc/basic/ccache_def.html

It would have saved me quite some time.

Regards,

Michael

