long running kadm5 program running into errors

Chris Hecker checker at d6.com
Thu Aug 16 20:47:00 EDT 2018


I have a long-running daemon that reads a kadm5 admin key from a file
keytab into a memory keytab before dropping privs.  When the kadm5
connection drops (detected by checking for KADM5_RPC_ERROR), it loops
and tries to reconnect with kadm5_init_with_skey.  This works fine:
with 1.9 it would stay up for months like this, and with 1.15.1 it was
up for weeks.  Or, at least, it worked fine until an hour ago...

This time, when it got the KADM5_RPC_ERROR, it tried to reconnect and
got an EBUSY.  If I look at kadmind.log, I see this:

Aug 16 16:17:12 domain.com kadmind[31118](Error): check_rpcsec_auth: failed inquire_context, stat=786432
Aug 16 16:17:12 domain.com kadmind[31118](Notice): Authentication attempt failed: x.x.x.x, GSS-API error strings are:
Aug 16 16:17:12 domain.com kadmind[31118](Notice):     The referenced context has expired
Aug 16 16:17:12 domain.com kadmind[31118](Notice):     Success
Aug 16 16:17:12 domain.com kadmind[31118](Notice):    GSS-API error strings complete.
Aug 16 16:17:12 domain.com kadmind[31118](Error): Authentication attempt failed: x.x.x.x, RPC authentication flavor 6
Aug 16 16:17:12 domain.com kadmind[31118](Error): check_rpcsec_auth: failed inquire_context, stat=786432
Aug 16 16:17:12 domain.com kadmind[31118](Notice): Authentication attempt failed: x.x.x.x, GSS-API error strings are:
Aug 16 16:17:12 domain.com kadmind[31118](Notice):     The referenced context has expired
Aug 16 16:17:12 domain.com kadmind[31118](Notice):     Success
Aug 16 16:17:12 domain.com kadmind[31118](Notice):    GSS-API error strings complete.
Aug 16 16:17:12 domain.com kadmind[31118](Error): Authentication attempt failed: x.x.x.x, RPC authentication flavor 6
Aug 16 16:17:12 domain.com kadmind[31118](Error): check_rpcsec_auth: failed inquire_context, stat=786432
Aug 16 16:17:12 domain.com kadmind[31118](Notice): Authentication attempt failed: x.x.x.x, GSS-API error strings are:
Aug 16 16:17:12 domain.com kadmind[31118](Notice):     The referenced context has expired
Aug 16 16:17:12 domain.com kadmind[31118](Notice):     Success
Aug 16 16:17:12 domain.com kadmind[31118](Notice):    GSS-API error strings complete.
Aug 16 16:17:12 domain.com kadmind[31118](Error): Authentication attempt failed: x.x.x.x, RPC authentication flavor 6
Aug 16 16:17:12 domain.com kadmind[31118](info): closing down fd 22
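
For reference, the stat=786432 value in the log can be decoded by hand.
A GSS-API major status word packs a calling error, a routine error, and
supplementary bits into fixed fields (layout per RFC 2744).  The sketch
below is only an illustration: the helper names are made up here and it
deliberately avoids the real gssapi headers, so treat it as a decoding
aid rather than anything kadmind itself runs.

```c
#include <stdint.h>

/* Field layout of a GSS-API major status word (RFC 2744):
 * bits 24-31 calling error, bits 16-23 routine error,
 * bits 0-15 supplementary info.  Helper names are hypothetical. */
static uint32_t gss_calling_error(uint32_t major)
{
    return (major >> 24) & 0xffu;
}

static uint32_t gss_routine_error(uint32_t major)
{
    return (major >> 16) & 0xffu;
}

static uint32_t gss_supplementary(uint32_t major)
{
    return major & 0xffffu;
}

/* Names for a few routine error numbers from RFC 2744.
 * Only a subset is listed; everything else maps to "other". */
static const char *gss_routine_error_name(uint32_t routine)
{
    switch (routine) {
    case 0:  return "none";
    case 11: return "GSS_S_CREDENTIALS_EXPIRED";
    case 12: return "GSS_S_CONTEXT_EXPIRED";
    case 13: return "GSS_S_FAILURE";
    default: return "other";
    }
}
```

Feeding 786432 (0xC0000) through these helpers gives calling error 0,
routine error 12, and no supplementary bits.  Routine error 12 is
GSS_S_CONTEXT_EXPIRED, which agrees with the "The referenced context
has expired" string in the log.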

I didn't retry in this case, so I assume these retries were inside libkadm5?

The krb5kdc.log was fine this whole time, and my uptime probes that got
tickets were fine.  The kdc and this libkadm5 app are on the same machine.

Is this just computers being weird, and should I handle this case by
looping and retrying?  Was a new timeout or anything similar added
between 1.9 and 1.15 that I should know about?
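
To make the "loop and keep trying" option concrete, here is a rough
sketch of a bounded retry with backoff.  Everything in it is
hypothetical: connect_fn stands in for whatever wraps the
kadm5_init_with_skey call in the daemon, and treating EBUSY as a
transient error is an assumption, not something the library documents
as far as I know.  It only shows the mechanics, not whether retrying is
the right response.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical callback: returns 0 on success or an error code.
 * In the daemon this would wrap the kadm5_init_with_skey call. */
typedef long (*connect_fn)(void *state);

/* Assumed policy: only EBUSY is treated as worth retrying.
 * A real daemon might add KADM5_RPC_ERROR here as well. */
static bool is_transient(long code)
{
    return code == EBUSY;
}

/* Double the delay, capped at max_delay seconds. */
static unsigned next_delay(unsigned delay, unsigned max_delay)
{
    if (delay >= max_delay / 2)
        return max_delay;
    return delay * 2;
}

/* Call try_connect up to max_attempts times, sleeping between
 * transient failures.  Returns 0 on success, otherwise the last
 * error code seen (or EINVAL if max_attempts is below 1). */
static long connect_with_retry(connect_fn try_connect, void *state,
                               int max_attempts,
                               unsigned first_delay, unsigned max_delay)
{
    long code = EINVAL;
    unsigned delay = first_delay;

    assert(try_connect != NULL);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        code = try_connect(state);
        if (code == 0 || !is_transient(code))
            return code;
        if (attempt < max_attempts) {
            sleep(delay);
            delay = next_delay(delay, max_delay);
        }
    }
    return code;
}

/* Stand-in connector for trying the loop out: fails with EBUSY
 * until the int counter pointed to by state reaches zero. */
static long flaky_connect(void *state)
{
    int *failures_left = state;

    if (*failures_left > 0) {
        (*failures_left)--;
        return EBUSY;
    }
    return 0;
}
```

With flaky_connect and a counter of 2, a call such as
connect_with_retry(flaky_connect, &counter, 5, 1, 30) fails twice,
sleeps 1 and then 2 seconds, and succeeds on the third attempt.  Errors
that are not considered transient are returned immediately so they are
not hidden by the loop.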

Thanks,
Chris



