multithreaded access to the same ccache
Sorin Manolache
sorinm at gmail.com
Tue Jun 16 19:33:33 EDT 2015
Hello,
I'm using GSSAPI + Kerberos in a multithreaded application.
I create a ccache of type MEMORY and pass its name to GSSAPI
(gss_krb5_ccache_name).
I'm seeing some segfaults in my application. After a lot of debugging
I've come to the following hypothesis:
Two threads execute gss_acquire_cred_with_password in parallel.
gss_acquire_cred_with_password will create a krb5_context in each
thread. Both contexts will have the same default ccache name, picked
from the thread-local variable set by gss_krb5_ccache_name.
As the ccache is empty, both threads will make a KDC request in order to
fetch a credential.
Suppose the first thread finishes first and proceeds with a call to
gss_export_cred. While the first thread is executing in
gss_export_cred, the second thread receives its cred from the KDC and
wants to insert it into the shared ccache.
gss_export_cred will run a loop on the ccache (json_ccache_contents in
lib/gssapi/krb5/export_cred.c). The loop is
    krb5_cc_start_seq_get(context, ccache, &cursor);
    while ((ret = krb5_cc_next_cred(...)) == 0) {
        ...
    }
    krb5_cc_end_seq_get(context, ccache, &cursor);
At the same time, the second thread executes init_creds_step_reply
(lib/krb5/krb/get_in_tkt.c) which in turn executes
krb5_cc_initialize(context, out_ccache, ...). The latter destroys the
contents of the ccache. This may cause a segfault in the first thread,
as there's no mutual exclusion between the cache contents destruction
and the cache iteration.
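To make the suspected race concrete, here is a minimal sketch of the mutual exclusion I believe is missing. It is a toy model, not the MIT krb5 memory ccache: the toy_cache and toy_link types and all toy_* functions are hypothetical names invented for illustration. The writer wipes the list and stores a new entry, the reader walks the list with a cursor, and both hold the same mutex for the whole operation.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a memory ccache: a singly linked list of
 * credentials guarded by one mutex. Not the MIT krb5 data structure. */
typedef struct toy_link {
    int cred_id;
    struct toy_link *next;
} toy_link;

typedef struct {
    pthread_mutex_t lock;
    toy_link *head;
} toy_cache;

void toy_cache_init(toy_cache *c) {
    pthread_mutex_init(&c->lock, NULL);
    c->head = NULL;
}

/* Caller must hold c->lock. Frees every link, like a reinitialization. */
static void toy_cache_clear_locked(toy_cache *c) {
    while (c->head != NULL) {
        toy_link *next = c->head->next;
        free(c->head);
        c->head = next;
    }
}

/* Models the writer: wipe the cache, then store one fresh credential. */
int toy_cache_reinit_and_store(toy_cache *c, int cred_id) {
    toy_link *l = malloc(sizeof(*l));
    if (l == NULL)
        return -1;
    l->cred_id = cred_id;
    l->next = NULL;
    pthread_mutex_lock(&c->lock);
    toy_cache_clear_locked(c);
    c->head = l;
    pthread_mutex_unlock(&c->lock);
    return 0;
}

/* Models the reader: the whole traversal runs under the lock, so the
 * cursor can never point at a freed link. Returns the number visited. */
int toy_cache_count(toy_cache *c) {
    int n = 0;
    pthread_mutex_lock(&c->lock);
    for (toy_link *cursor = c->head; cursor != NULL; cursor = cursor->next)
        n++;
    pthread_mutex_unlock(&c->lock);
    return n;
}

void toy_cache_destroy(toy_cache *c) {
    pthread_mutex_lock(&c->lock);
    toy_cache_clear_locked(c);
    pthread_mutex_unlock(&c->lock);
    pthread_mutex_destroy(&c->lock);
}
```

Without the lock in toy_cache_count, a concurrent toy_cache_reinit_and_store could free the link the cursor points at, which is the failure mode I suspect in the real traversal.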
What I'm observing in core-files is that 'cursor' in the first thread
has a value that is different from any ccache->data->link,
ccache->data->link->next, ccache->data->link->next->next, etc. My only
explanation is that the cache contents were destroyed between the
krb5_cc_start_seq_get and krb5_cc_next_cred calls.
I've patched the Kerberos code to call krb5_cc_lock before and
krb5_cc_unlock after the loop in json_ccache_contents in
export_cred.c. I have not tested this sufficiently, but the segfault
no longer seems to occur in json_ccache_contents. However, I now get
another segfault in a different unlocked ccache traversal, namely in
acquire_init_cred->scan_ccache.
Is this hypothesis correct? Is there something I could do at the
application level to avoid this situation? I'm thinking of either
protecting all calls to GSSAPI/Kerberos with mutexes or using
per-thread ccaches, though both approaches would reduce performance.
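For reference, here is a rough sketch of what I mean by the two application-level options. All names in it (gss_call_lock, call_serialized, make_thread_ccache_name, the "app_thread_" prefix) are hypothetical helpers of my own, not part of GSSAPI or Kerberos. The first helper serializes arbitrary work behind one process-wide mutex. The second builds a distinct MEMORY ccache name that each thread would then pass to gss_krb5_ccache_name.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Option 1: one process-wide mutex around every GSSAPI/Kerberos call. */
static pthread_mutex_t gss_call_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs fn(arg) while holding the global lock and returns its result.
 * fn is a hypothetical callback that would wrap, for example, the
 * credential acquisition and export sequence. */
int call_serialized(int (*fn)(void *), void *arg) {
    pthread_mutex_lock(&gss_call_lock);
    int ret = fn(arg);
    pthread_mutex_unlock(&gss_call_lock);
    return ret;
}

/* Option 2: a distinct MEMORY ccache name per thread, so that no two
 * threads ever share a cache. The counter is process-wide and atomic. */
static atomic_uint next_ccache_id;

/* Writes a name of the form "MEMORY:app_thread_<n>" into buf.
 * Returns 0 on success, -1 if buf is too small. Each thread would call
 * this once and hand the result to gss_krb5_ccache_name. */
int make_thread_ccache_name(char *buf, size_t len) {
    unsigned id = atomic_fetch_add(&next_ccache_id, 1);
    int n = snprintf(buf, len, "MEMORY:app_thread_%u", id);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The first option costs concurrency, since all threads queue behind one lock. The second costs one KDC round trip per thread, since no thread can reuse a credential that another thread has cached.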
Regards,
Sorin
More information about the krbdev mailing list