krb524 stops working (MIT 1.5.4 + 2007-005/6 patches + fakeka patch)

John Tang Boyland boyland at cs.uwm.edu
Fri Nov 9 17:30:57 EST 2007


] On Nov 9, 2007, at 11:14, John Tang Boyland wrote:
] > However, every once in a while, the krb524d stops responding to
] > requests.  "ps augx" says:
] >
] > USER       PID %CPU %MEM   SZ  RSS TT       S    START  TIME COMMAND
] > root     12025  1.2 63.5184632156496 ?        S   Oct 15 382:56 /opt/local/sbin/krb524d -m
] >
] > Last time this happened (apparently October 15th), I killed it and
] > started it up again, and it worked just fine.  But now it stopped
] > working again.
] 
] That's not something we've seen before...  when it's in this state,  
] could you try running strace on the process (for Linux, truss for  
] Solaris, etc) while you're sending requests it's not responding to? 

It's Solaris.  I tried "truss -p 12025" and nothing happens when I run
the client (an old version of aklog) that tries to contact it.
aklog times out, with error:
...
Kerberos error code returned by get_cred: -1765328228
aklog: Couldn't get cs.uwm.edu AFS tickets:
aklog: Cannot contact any KDC for requested realm while getting AFS tickets
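
For reference, the numeric code above can be matched against the krb5 error table by subtracting the table base. The sketch below assumes the base value that MIT's krb5.h defines as ERROR_TABLE_BASE_krb5 (-1765328384); the helper name is made up for illustration. As far as I know, the resulting offset corresponds to KRB5_KDC_UNREACH, which agrees with the "Cannot contact any KDC" text aklog printed.

```python
# Assumed base of the krb5 com_err error table
# (ERROR_TABLE_BASE_krb5 in MIT's krb5.h).
KRB5_ERROR_TABLE_BASE = -1765328384


def krb5_error_offset(code):
    """Return the position of a krb5 error code within the krb5 error table.

    Hypothetical helper, not part of any Kerberos library.
    """
    return code - KRB5_ERROR_TABLE_BASE


if __name__ == "__main__":
    # Code reported by aklog's get_cred call.
    print(krb5_error_offset(-1765328228))
```

The offset only identifies the entry in the table; the message text itself comes from the library's error table.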

On the other hand, if I "truss -p" the fakeka process, I get lots of
appropriate debugging information.  (Here I tested using "klog".)

] If you've built it with debug info, attaching the process under gdb  
] to get a stack trace may help too, if strace doesn't show any
] activity.

It wasn't built with debugging, but did at least show:

...
Symbols already loaded for /usr/lib/libthread.so.1
Symbols already loaded for /usr/lib/nss_files.so.1
Symbols already loaded for /opt/local/lib/krb5/plugins/kdb/db2.so
0xfef9b75c in __lwp_sema_wait ()
(gdb) where
#0  0xfef9b75c in __lwp_sema_wait ()
#1  0xfee89830 in _park ()
#2  0xfee89508 in _swtch ()
#3  0xfee8d960 in _reap_wait ()
#4  0xfee8d6b8 in _reaper ()

This is an ancient version of gdb, I'm afraid.

] Monitoring traffic to krb524d with tcpdump (ideally, recording the  
] full packets) might be handy if you catch it at some point while it's  
] growing massively like this.  (Though on the off chance that someone  
] has come up with a krb524d deathgram packet, please report the info  
] to krbcore-security at mit so we could try to get a fix out before the  
] means of killing it becomes too widespread.)

I'll kill the process now and restart it.
I'll see if I can catch krb524d when it's on a growth spurt.
(Perhaps the growth is gradual.)
...
OK, it's sane again now that I restarted it.

USER       PID %CPU %MEM   SZ  RSS TT       S    START  TIME COMMAND
root     20757  0.0  0.8 3704 1792 ?        S 16:25:07  0:00 /opt/local/sbin/krb524d -m
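
One way to catch a growth spurt would be to sample the process size periodically and note when it jumps. This is only a sketch, assuming a `ps` that accepts the POSIX `-o vsz,rss -p PID` options; the function names, the 1024 KB threshold, and the 60-second interval are illustrative choices, not anything krb524d provides.

```python
import subprocess
import time


def parse_ps_sizes(output):
    """Parse the output of `ps -o vsz,rss -p PID` into (vsz_kb, rss_kb)."""
    lines = output.strip().splitlines()
    fields = lines[-1].split()  # last line holds the values; first is the header
    return int(fields[0]), int(fields[1])


def sample(pid):
    """Run ps once for the given pid and return (vsz_kb, rss_kb)."""
    out = subprocess.check_output(["ps", "-o", "vsz,rss", "-p", str(pid)])
    return parse_ps_sizes(out.decode())


def grew(previous, current, threshold_kb=1024):
    """True if RSS rose by at least threshold_kb between two samples."""
    return current[1] - previous[1] >= threshold_kb


def watch(pid, interval=60, samples=10):
    """Sample a process repeatedly and print a line whenever its RSS jumps."""
    previous = sample(pid)
    for _ in range(samples):
        time.sleep(interval)
        current = sample(pid)
        if grew(previous, current):
            print(time.ctime(), "RSS grew:", previous[1], "->", current[1], "KB")
        previous = current
```

Calling `watch(20757)` would then poll the restarted daemon; a printed timestamp gives a window in which to look at captured traffic.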

More in a week?

Thanks,
John


