[kerberos-discuss] LDAP & Kerberos interaction & SIGPIPE

Fri Sep 25 13:13:17 EDT 2009

On 09/25/09 12:50, Tom Yu wrote:
> Peter Shoults <Peter.Shoults at Sun.COM> writes:
>
>   
>> I am sending this out again as the customer has tested a fix I provided
>> and determined that it does resolve their issue.  I would like to move
>> forward with a fix, and the one I am using now is the one mentioned
>> below with the addition of signal().  If I can get some comments, I
>> would appreciate it.  Otherwise - I guess I will just proceed to put
>> this into Solaris code.
>>
>> Pete
>>     
>
> Is there a reason to not use sigaction() here if possible?  That
> should make things more portable than calling signal().
>
>   
I used signal() because that is what this file already uses, so I kept
it for consistency with the existing code.
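
For comparison, here is a rough sketch of what a sigaction()-based
version could look like.  This is not the change I tested.  The names
sig_pipe_sketch, install_sigpipe_handler and raise_sigpipe_twice are
made up for illustration, and the counter only stands in for the
krb5_klog_syslog() call.  Since SA_RESETHAND is not set, the handler
should stay installed across deliveries, so nothing has to be re-armed
inside the handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Counts deliveries; sig_atomic_t is safe to update from a handler. */
static volatile sig_atomic_t sigpipe_count = 0;

/* Hypothetical stand-in for sig_pipe(); the real handler logs a notice. */
static void sig_pipe_sketch(int signo)
{
    (void)signo;
    sigpipe_count++;
}

/*
 * Install the handler with sigaction().  With sa_flags == 0 the
 * disposition is not reset to SIG_DFL when the signal is delivered.
 * Returns 0 on success, -1 on error.
 */
int install_sigpipe_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sig_pipe_sketch;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Raise SIGPIPE twice and return how many times the handler ran. */
int raise_sigpipe_twice(void)
{
    sigpipe_count = 0;
    raise(SIGPIPE);
    raise(SIGPIPE);
    return (int)sigpipe_count;
}
```

If the handler is persistent, raise_sigpipe_twice() returns 2 and the
process survives the second signal.
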
>> On 09/16/09 10:48, Peter Shoults wrote:
>>     
>>> Hi,
>>>
>>> Customer has brought forward an issue they were having with Kerberos and
>>> LDAP, where LDAP is being used to store the database information for
>>> Kerberos.  The issue is that if the LDAP server is restarted for any
>>> reason, then Kerberos does not automatically resync back with the LDAP
>>> server when the LDAP server is back up and running.  Specifically, one
>>> can run kadmin and log in, but any commands that are run will fail
>>> with the error:
>>>
>>> "Communication failure with server while retrieving list."
>>>
>>> It turns out that if the user exits kadmin and logs back in a second
>>> time, then the commands work fine.
>>>
>>> I have determined that the cause of this problem is that when the LDAP
>>> server is restarted, all the connections we have on port 636 to the LDAP
>>> server go into a CLOSE_WAIT/FIN_WAIT_2 state.  When we log into kadmin,
>>> we attempt to contact the LDAP server on these connections, and we
>>> receive SIGPIPE in response to our writes.  Here is a snippet from truss:
>>>
>>> 3200/1:         57.2401 write(14, 0x0010B810, 23)                       Err#32 EPIPE
>>> 3200/1:                             150301\012941A 60F Y P87A7BE9318B6 c8C |0F   v
>>> 3200/1:         57.2404     Received signal #13, SIGPIPE [caught]
>>>
>>> This is fine - the sig_pipe handler is invoked and we do print out the
>>> syslog message.  However, we never reset the signal disposition for
>>> SIGPIPE.  The kadmind process immediately proceeds to try the next
>>> connection to the LDAP server, and again gets SIGPIPE.  This time
>>> though, the default handler is invoked, which terminates kadmind.  At
>>> this point, SMF realizes kadmind has died and restarts it, which
>>> re-establishes all our connections to the LDAP server and that explains
>>> why a subsequent login to kadmin will work.
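
To illustrate the one-shot behaviour described above, here is a small
sketch.  It assumes the reset comes from System V style signal()
semantics, and it uses sigaction() with SA_RESETHAND to get the same
effect portably instead of relying on what signal() does on a given
platform.  The function names are hypothetical and are not from
ovsec_kadmd.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Number of times the one-shot handler has run. */
static volatile sig_atomic_t one_shot_hits = 0;

static void one_shot_handler(int signo)
{
    (void)signo;
    one_shot_hits++;
}

/*
 * Install a handler that is reset to SIG_DFL when it is entered
 * (SA_RESETHAND), mimicking a signal() that does not re-arm itself.
 * Returns 0 on success, -1 on error.
 */
int install_one_shot_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = one_shot_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Returns 1 if SIGPIPE is at SIG_DFL, 0 if not, -1 on error. */
int sigpipe_is_default(void)
{
    struct sigaction cur;

    if (sigaction(SIGPIPE, NULL, &cur) != 0)
        return -1;
    return cur.sa_handler == SIG_DFL;
}
```

After one delivery, sigpipe_is_default() reports 1.  That is the state
in which a second SIGPIPE terminates the process.
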
>>>
>>> I have two questions about this.  The first is why we have a handler for
>>> SIGPIPE in the kadmin code, unlike the krb5kdc code, which sets the
>>> SIGPIPE disposition to SIG_IGN.  This handler in the kadmin code has not
>>> changed in a long, long time.  I tested setting SIGPIPE to SIG_IGN, and
>>> this does allow a user to run kadmin commands without issue after the
>>> LDAP server restarts.
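
As a sketch of why SIG_IGN helps: with SIGPIPE ignored, a write to a
dead connection fails with EPIPE instead of raising a signal, so the
caller can handle the error and move on.  The example below uses a
local pipe as a stand-in for the LDAP connection, and the function name
is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Ignore SIGPIPE, then write to a pipe whose read end is closed.
 * Returns the errno from the failed write (expected to be EPIPE),
 * 0 if the write unexpectedly succeeded, or -1 on setup failure.
 */
int write_to_closed_pipe(void)
{
    int fds[2];
    int err = 0;

    if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
        return -1;
    if (pipe(fds) != 0)
        return -1;

    close(fds[0]);                      /* no reader is left */
    if (write(fds[1], "x", 1) < 0)
        err = errno;                    /* save before close() can change it */
    close(fds[1]);
    return err;
}
```

The process keeps running and sees EPIPE as an ordinary error return.
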
>>>
>>> Assuming we have the SIGPIPE handler specifically to output the syslog
>>> message, I propose that the handler reset the signal disposition to
>>> sig_pipe.  I have also tested this fix and verified that it resolves
>>> the problem and allows the user to enter kadmin commands after the LDAP
>>> server restarts.  Here is my change:
>>>
>>> The file modified is ovsec_kadmd.c:
>>>
>>> void
>>> sig_pipe(int unused)
>>> {
>>> +        signal(SIGPIPE, sig_pipe);
>>>         krb5_klog_syslog(LOG_NOTICE, gettext("Warning: Received a SIGPIPE; "
>>>                 "probably a client aborted.  Continuing."));
>>> }
>>>
>>>
>>> Pete
>>>
>>>   
>>>       