One cause of krb5 test suite slowness on some systems

ghudson@MIT.EDU ghudson at MIT.EDU
Fri Jan 30 21:22:27 EST 2009

Since I've started working on Kerberos I've been burdened with the
test suite taking around an hour to run, with the bulk of that time
being in the dejagnu tests.  I'm led to believe some others are also
suffering under these conditions.  Today I sat down and figured out a
major source of slowness.

Looking at tests/dejagnu/dbg.log, I found a lot of:

  write() failed to write anything - will sleep(1) and retry...

It turns out this is due to a bug in expect and/or tcl.  When tcl
flushes a channel, it can (and I guess usually does) generate a
zero-length write to the channel.  When expect's channel code gets
such a request, it dutifully tries to write zero bytes to the channel,
sees that it didn't write any bytes, and freaks out, causing a
one-second delay.  This happens quite a few times during the course of
a test suite run.

The bug has been fixed in expect's sourceforge project since 2005, but
not in any formal release.  The tcl side of the bug (if you consider
generating zero-length channel writes a bug) may also have been fixed
for all I know.  I can't find any evidence of a fix in the current
sources, but tcl's channel code is pretty complicated.

With that locally patched in my system's expect command, the
tests/dejagnu part of the test suite dropped from 27 minutes to 12
minutes, and the overall test suite runs in about 20 minutes.  That
still probably reflects a lot of unnecessary delays (the "user" and
"sys" times total to about 20 seconds) but it's progress.

More information about the krbdev mailing list