[Fwd: LDAP & Kerberos interaction & SIGPIPE]

Peter Shoults Peter.Shoults at Sun.COM
Wed Sep 16 14:21:27 EDT 2009


Was told to forward this along to this alias as well....


-------- Original Message --------
Subject: 	LDAP & Kerberos interaction & SIGPIPE
Date: 	Wed, 16 Sep 2009 10:48:06 -0400
From: 	Peter Shoults <peter.shoults at sun.com>
To: 	kerberos-discuss <kerberos-discuss at opensolaris.org>


A customer has brought forward an issue they were having with Kerberos
and LDAP, where LDAP is used to store the Kerberos database.  If the
LDAP server is restarted for any reason, Kerberos does not automatically
reconnect to it once it is back up and running.  Specifically, one can
run kadmin and log in, but any command that is run will fail with the
error:

"Communication failure with server while retrieving list."

It turns out that if the user exits kadmin and logs back in a second
time, the commands work fine.

I have determined that the cause of this problem is that when the LDAP
server is restarted, all the connections we have on port 636 to the LDAP
server go into a CLOSE_WAIT/FIN_WAIT_2 state.  When we log into kadmin,
we attempt to contact the LDAP server on these stale connections, and we
receive SIGPIPE in response to our writes.  Here is a snippet from truss:

3200/1:         57.2401 write(14, 0x0010B810, 23)                       Err#32 EPIPE
3200/1:                             150301\012941A 60F Y P87A7BE9318B6 c8C |0F   v
3200/1:         57.2404     Received signal #13, SIGPIPE [caught]

This is fine - the sig_pipe handler is invoked and we do print out the
syslog message.  However, we never reset the signal disposition for
SIGPIPE, and with signal() on Solaris the disposition goes back to the
default once the handler has been invoked.  The kadmind process
immediately proceeds to try the next connection to the LDAP server, and
again gets SIGPIPE.  This time, though, the default action is taken,
which terminates kadmind.  At this point, SMF realizes kadmind has died
and restarts it, which re-establishes all our connections to the LDAP
server.  That explains why a subsequent login to kadmin will work.
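
To illustrate the one-shot behaviour described above, here is a minimal
sketch.  It is not kadmind code: demo_sig_pipe and the function name are
made up for this example, and it uses sigaction() with SA_RESETHAND to
emulate the reset-on-delivery semantics of signal() on Solaris, so that
it behaves the same on other platforms.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t pipe_hits = 0;

/* Hypothetical stand-in for kadmind's sig_pipe handler. */
static void demo_sig_pipe(int signo)
{
    (void)signo;
    pipe_hits++;
}

/*
 * Install the handler with one-shot semantics, deliver SIGPIPE once, and
 * report what happened to the disposition afterwards.
 * Returns 1 if the handler ran once and the disposition fell back to
 * SIG_DFL, 0 if not, and -1 on a setup error.
 */
int disposition_reset_after_first_sigpipe(void)
{
    struct sigaction sa, cur;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_sig_pipe;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;   /* assumption: emulates System V signal() */
    if (sigaction(SIGPIPE, &sa, NULL) != 0)
        return -1;

    pipe_hits = 0;
    if (raise(SIGPIPE) != 0)      /* first delivery: the handler runs */
        return -1;

    /* Query the current disposition without changing it. */
    if (sigaction(SIGPIPE, NULL, &cur) != 0)
        return -1;

    return (pipe_hits == 1 && cur.sa_handler == SIG_DFL) ? 1 : 0;
}
```

A second SIGPIPE delivered after this function returns would take the
default action and terminate the process, which matches what truss shows
for kadmind.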

I have two questions about this.  The first is why we have a handler for
SIGPIPE in the kadmin code at all, unlike the krb5kdc code, which sets
the SIGPIPE disposition to SIG_IGN.  This handler in the kadmin code has
not changed in a very long time.  I tested setting SIGPIPE to SIG_IGN,
and this does allow a user to log into kadmin after the LDAP server
restarts and run commands without issue.
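
For reference, a small sketch of what the SIG_IGN approach changes.  A
local pipe with its read end closed stands in for the dead LDAP
connection (an assumption made for this example; the function name is
also made up).  With SIGPIPE ignored, the write simply fails with EPIPE
and the caller can handle the error instead of being signalled.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Write to a pipe whose read end is closed while SIGPIPE is ignored.
 * Returns the errno from the failed write (EPIPE is expected), 0 if the
 * write unexpectedly succeeded, or -1 if setup failed.
 * Note: SIGPIPE stays ignored for the rest of the process.
 */
int write_errno_with_sigpipe_ignored(void)
{
    int fds[2];
    int saved;
    ssize_t n;

    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
        return -1;
    if (pipe(fds) != 0)
        return -1;
    close(fds[0]);                /* the peer goes away */

    n = write(fds[1], "x", 1);    /* no signal; fails with EPIPE */
    saved = errno;
    close(fds[1]);

    return (n == -1) ? saved : 0;
}
```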

Assuming we have the SIGPIPE handler specifically to output the syslog
message, I propose that the handler re-install itself by resetting the
signal disposition to sig_pipe.  I have tested this fix as well and
verified that it also resolves the problem and allows the user to run
kadmin commands after the LDAP server restarts.  Here is my change:

file modified is ovsec_kadmd.c

sig_pipe(int unused)
+        signal(SIGPIPE, sig_pipe);
        krb5_klog_syslog(LOG_NOTICE, gettext("Warning: Received a SIGPIPE; "
                "probably a client aborted.  Continuing."));

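As a possible alternative to re-arming inside the handler, the handler
could be installed with sigaction() and without SA_RESETHAND, in which
case it stays installed across deliveries.  This is only a sketch of
that idea, not the change tested above; count_sigpipe and both function
names are hypothetical, and the handler just counts deliveries rather
than calling krb5_klog_syslog.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t sigpipe_count = 0;

/* Hypothetical handler: only counts deliveries. */
static void count_sigpipe(int signo)
{
    (void)signo;
    sigpipe_count++;
}

/* Install a handler that remains in place after each delivery. */
int install_persistent_sigpipe_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = count_sigpipe;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;              /* no SA_RESETHAND: handler persists */
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Deliver SIGPIPE n times and return how many the handler caught. */
int sigpipes_caught(int n)
{
    int i;

    sigpipe_count = 0;
    for (i = 0; i < n; i++)
        raise(SIGPIPE);
    return (int)sigpipe_count;
}
```

With the handler installed this way, a second SIGPIPE is caught just
like the first, so the process survives repeated failed writes.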

