Erratic behavior of full resync process

Greg Hudson ghudson at mit.edu
Wed Jun 10 15:09:22 EDT 2015


On 06/10/2015 02:11 PM, Leonard J. Peirce wrote:
> This has been resolved.  The problem was a lack of entropy that caused
> kadmind to block while reading /dev/random and of course refuse connections
> from kpropd.  I installed/started haveged and kadmind now starts up fine.

Thanks for reporting back on this.  I have seen problems from
/dev/random starvation when running the test suite, but I don't think
I've heard of it cropping up in production.  We only read from
/dev/random at kadmind startup and when creating a KDB, I believe; the
rest of the time we use /dev/urandom or our internal PRNG.
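
For anyone trying to confirm this kind of starvation, one rough check
is to look at the kernel's entropy estimate around the time kadmind
starts.  The sketch below is only illustrative: the function names and
the 128-bit threshold are assumptions, not anything kadmind uses, and
newer kernels may report a constant value in this file.

```python
from pathlib import Path

# Linux exposes its entropy estimate (in bits) here; absent on other systems.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def read_entropy_estimate(path=ENTROPY_AVAIL):
    """Return the kernel's entropy estimate in bits, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def looks_starved(bits, threshold=128):
    """Hypothetical heuristic: treat an estimate below `threshold` as low."""
    return bits is not None and bits < threshold


if __name__ == "__main__":
    bits = read_entropy_estimate()
    if bits is None:
        print("entropy estimate not available on this system")
    else:
        print(f"entropy_avail = {bits} bits, starved = {looks_starved(bits)}")
```

A persistently low value while kadmind hangs at startup would be
consistent with the blocking read described above.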

/dev/random starvation explains the clock skew errors (kadmind isn't
processing the kpropd authentication attempts until much later than they
were sent) but doesn't really explain to me why your full dump
connections are sometimes timing out.  kdb5_util load does not read from
/dev/random as far as I can tell, and neither do the mk_priv/rd_priv
calls used to protect the dump data in transit.
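
To spell out the clock skew reasoning: the authenticator carries the
client's timestamp, and the server rejects it if that timestamp differs
from the server's clock at processing time by more than the permitted
skew (300 seconds by default).  A minimal sketch follows, with
hypothetical names rather than the actual libkrb5 code; the exact
boundary comparison is an assumption.

```python
DEFAULT_CLOCKSKEW = 300  # seconds; the usual default for the clockskew setting


def within_clock_skew(authenticator_time, processing_time,
                      max_skew=DEFAULT_CLOCKSKEW):
    """Return True if the authenticator timestamp is acceptable.

    Both times are seconds since the epoch.  Whether the real check
    uses <= or < at the boundary is not modeled carefully here.
    """
    return abs(processing_time - authenticator_time) <= max_skew


if __name__ == "__main__":
    sent = 1_000_000
    # Request handled promptly: accepted.
    print(within_clock_skew(sent, sent + 2))
    # Server was blocked for 15 minutes before looking at it: rejected,
    # even though both clocks are perfectly synchronized.
    print(within_clock_skew(sent, sent + 900))
```

The point is that a long enough processing delay looks the same to this
check as a badly set clock.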

kadmind and kdb5_util do have -W options which force them not to use
/dev/random; that is a less invasive solution than haveged.  But either
should work.

I have a long-standing disagreement with the Linux kernel's contract for
/dev/random.  I believe that applications want to consume lots of
entropy with high convenience and performance, but need a safety
mechanism to address the narrow problem of insufficient entropy after
the first cold boot.  The Linux kernel's assumption is that applications
should consume entropy sparingly (even from /dev/urandom) and care about
the possibility of the kernel RNG's internal generator algorithm being
broken.  Under my assumptions, the kernel should estimate the
accumulated hardware entropy from the first cold boot and block
/dev/random until it has enough, but should never block afterwards.
Under the kernel's assumptions, /dev/random blocks if its hardware
entropy pool has been "depleted" by requests to either /dev/random
or /dev/urandom.  (A PRNG never really "depletes" if its security
properties hold.)
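
The difference between the two contracts can be shown with a toy model.
This is a sketch under stated assumptions, not the kernel's actual
accounting: the class names, the 256-bit seeding threshold, and the
one-bit-per-bit debit rule are all made up for illustration.

```python
class BootSeededPolicy:
    """Block only until enough entropy has been gathered once after boot."""

    def __init__(self, needed_bits=256):
        self.needed_bits = needed_bits
        self.gathered = 0

    def add_entropy(self, bits):
        self.gathered += bits

    def would_block(self, request_bits):
        # Once seeded, the size or number of requests no longer matters.
        return self.gathered < self.needed_bits


class DepletingPoolPolicy:
    """Block whenever the running entropy estimate is below the request."""

    def __init__(self):
        self.pool = 0

    def add_entropy(self, bits):
        self.pool += bits

    def consume(self, bits):
        # Any read debits the estimate, with a floor of zero.
        self.pool = max(0, self.pool - bits)

    def would_block(self, request_bits):
        return self.pool < request_bits


if __name__ == "__main__":
    seeded, depleting = BootSeededPolicy(), DepletingPoolPolicy()
    for policy in (seeded, depleting):
        policy.add_entropy(256)   # hardware entropy gathered after boot
    depleting.consume(256)        # some other process reads random bytes
    print(seeded.would_block(128))     # seeded once, so it does not block
    print(depleting.would_block(128))  # estimate is drained, so it blocks
```

In the second model, an unrelated consumer can cause a later reader such
as kadmind to stall, which is the failure reported in this thread.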

I've considered making -W the default because of this impedance
mismatch, but haven't pulled the trigger on it.

