Concurrency issues with FILE ccache
Osipov, Michael (LDA IT PLM)
michael.osipov at siemens.com
Fri Apr 9 11:35:26 EDT 2021
On 2021-04-06 at 19:28, Greg Hudson wrote:
> On 4/6/21 11:48 AM, Osipov, Michael (LDA IT PLM) wrote:
>> gssapi.raw.misc.GSSError: Major (851968): Unspecified GSS failure. Minor code may provide more information, Minor (100001): Failed to store credentials: Internal credentials cache error (filename: /tmp/krb5cc_1000)
>
> This is not expected, and bears investigation. It suggests an EINVAL,
> EEXIST, EFAULT, EBADF, or EWOULDBLOCK error from one of the I/O
> operations performed by fcc_store(), none of which are expected. If
> you're building libkrb5, you could try modifying interpret_errno() to
> pass those error codes through in order to find out which one is happening.
>
> Getting multiple cache entries for a service is normal when multiple
> threads or processes initiate contexts to the same (new) service within
> a short window.
>
Hi Greg,
I was able to compile and install 1.19.1 in the GitLab Runner and
verified that py-gssapi picks it up via LD_LIBRARY_PATH.
Unfortunately, 1.19.1 suffers from the same problem as 1.17. I tried
to narrow it down with strace, but that changes the runtime behavior
of the application and the error disappears. So I patched the
fcc_store() function:
> $ git diff
> diff --git a/src/lib/krb5/ccache/cc_file.c b/src/lib/krb5/ccache/cc_file.c
> index 9a9b45a6e..7f604c0f4 100644
> --- a/src/lib/krb5/ccache/cc_file.c
> +++ b/src/lib/krb5/ccache/cc_file.c
> @@ -1000,8 +1000,9 @@ fcc_store(krb5_context context, krb5_ccache id, krb5_creds *creds)
>      if (ret)
>          goto cleanup;
>      nwritten = write(fileno(fp), buf.data, buf.len);
> -    if (nwritten == -1)
> +    if (nwritten == -1) {
>          ret = interpret_errno(context, errno);
> +        printf("errno: %d, ret: %d\n", errno, ret); }
>      if ((size_t)nwritten != buf.len)
>          ret = KRB5_CC_IO;
but the output did not appear. Then I patched interpret_errno()
directly for the internal error:
> @@ -1293,6 +1294,7 @@ interpret_errno(krb5_context context, int errnum)
>      case EWOULDBLOCK:
>  #endif
>          ret = KRB5_FCC_INTERNAL;
> +        printf("errnum: %d, ret: %d\n", errnum, ret);
>          break;
>      /*
>       * The rest all map to KRB5_CC_IO. These errnos are listed to
I had exactly one failure in the job and received exactly this:
> errnum: 17, ret: -1765328188
which maps to EEXIST.
I am quite sure that this is a race condition: stat() is performed and
the file does not exist, then open() for writing is performed, but in
parallel the file has already been created, so the later call fails
with EEXIST.
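The suspected race can be sketched as follows. This is a minimal
illustration, not libkrb5 code: open_or_create() is a hypothetical
helper, and it assumes the cache file is created with
O_CREAT | O_EXCL, which I have not confirmed in cc_file.c. It shows
how an exclusive create reports EEXIST when another thread or process
creates the file first, and one way a caller could tolerate that.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libkrb5): try to create `path`
 * exclusively. If someone else created it first, open() fails with
 * EEXIST; in that case fall back to opening the existing file.
 *
 * Sets *created to 1 if this call created the file, 0 otherwise.
 * Returns a file descriptor, or -1 with errno set. The fallback
 * open() can itself fail (e.g. ENOENT if the file was removed in
 * between); that case is passed through to the caller.
 */
int
open_or_create(const char *path, int *created)
{
    int fd;

    /* Atomic "create only if absent"; no separate stat() needed. */
    fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd != -1) {
        *created = 1;
        return fd;
    }

    /* Any error other than EEXIST is a real failure. */
    if (errno != EEXIST)
        return -1;

    /* Lost the race: the file exists now, so just open it. */
    *created = 0;
    return open(path, O_RDWR);
}
```

Called twice on the same (initially absent) path, the first call
creates the file and the second falls back to opening it instead of
reporting an error.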
I assumed the culprit to be fcc_initialize() and added a printf() there:
> fcc_initialize()
> errnum: 17, ret: -1765328188
> fcc_initialize()
> errnum: 17, ret: -1765328188
What now?
Michael