Erratic behavior of full resync process

Greg Hudson ghudson at mit.edu
Wed May 13 13:56:32 EDT 2015


On 05/12/2015 04:44 PM, Leonard J. Peirce wrote:
>     Authentication attempt failed: 172.30.110.46, GSS-API error strings are:
>         Unspecified GSS failure.  Minor code may provide more information
>         Clock skew too great

I don't know of a reason why this would happen with synchronized clocks.
You could try instrumenting the code in lib/krb5/krb/rd_req_dec.c that
calls krb5_check_clockskew() to find out what the authenticator timestamp
and server local time are; I don't know of an easier way to investigate.
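
For reference, the comparison you would be looking at amounts to roughly
the following. This is a Python sketch of the idea, not the library code:
check_clockskew and describe are hypothetical names, the 300-second value
assumes the usual default for the clockskew setting in krb5.conf, and the
inclusive boundary is an assumption as well.

```python
from datetime import datetime, timezone

# Assumption: 300 seconds is the usual default for the "clockskew" setting in
# krb5.conf; substitute your realm's configured value if it differs.
DEFAULT_CLOCKSKEW = 300


def check_clockskew(authenticator_time, local_time, allowed_skew=DEFAULT_CLOCKSKEW):
    """Return (within_skew, delta_seconds) for two POSIX timestamps.

    delta_seconds is local_time - authenticator_time, so a positive value
    means the client's timestamp is behind the server's clock.
    """
    delta = local_time - authenticator_time
    return abs(delta) <= allowed_skew, delta


def describe(authenticator_time, local_time, allowed_skew=DEFAULT_CLOCKSKEW):
    """Format both timestamps and their difference for a debugging log line."""
    ok, delta = check_clockskew(authenticator_time, local_time, allowed_skew)

    def fmt(t):
        # Render in UTC so logs from both hosts can be compared directly.
        return datetime.fromtimestamp(t, tz=timezone.utc).isoformat()

    verdict = "ok" if ok else "clock skew too great"
    return (f"authenticator={fmt(authenticator_time)} "
            f"local={fmt(local_time)} delta={delta:+}s -> {verdict}")
```

Logging both values in the real code, in whatever form, should show
whether the authenticator timestamp or the server's idea of the current
time is the one that is off.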

> On the slave I see syslog entries showing repeated problems with kpropd
> connecting to the master:
> 
>     /usr/sbin/kpropd: GSS-API (or Kerberos) error while initializing /usr/sbin/kpropd interface, retrying

I assume these correspond to the failed authentication attempts logged
by kadmind.

> I start kpropd with -d -S and use strace on it and I see that it repeatedly
> opens /dev/urandom and reads from it just before I see the above error.

That doesn't seem unusual.

>     /usr/sbin/kpropd: Connection reset by peer while reading database block starting at offset 92340224
>     Full resync was unsuccessful

> Unfortunately, the resync was not successful.  Often (but not always), when
> kprop -f starts on the master, the slave_datatrans file will *partially*
> copy to the slave, often 60-90% of the data, before the connection hangs
> and then times out.  I have run strace on both the kprop and kpropd processes
> while they are connected.  The kprop on the master hangs during a write()
> for several minutes and then eventually times out:

>     Process 3183 attached - interrupt to quit
>     writev(4, [{"\240\37\26+[\16\247\tC\21\6/\243\217\340\0231f\362\245\3\214$\246\227\231N\265\351\366\1\233"..., 22106}], 1) = -1 ETIMEDOUT (Connection timed out)

You don't say what's happening on the slave at this point.  Is it also
hanging in a read() at the same time?  Can you correlate these events
with packet captures on both ends to see if a network element
interjected an RST?

> In my debugging attempts, I tried starting kpropd with
> 
>     kpropd -S -d -P NNN
> 
> and then attempt to run
> 
>     kprop -f slave_datatrans -P NNN r.test.admin.private
> 
> on the master but kpropd on the slave doesn't appear to be listening
> on port NNN.  Am I misunderstanding something?

In 1.10, with incremental propagation configured, kpropd doesn't listen
for kprop connections except when it has just requested a full dump from
kadmind.  In 1.13 it should always be listening.
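
One way to confirm from the master whether anything is accepting
connections on that port at a given moment is a plain TCP connect. A
minimal Python sketch follows; is_listening is a hypothetical helper, not
part of krb5, and you would pass the slave's hostname and whatever port
you gave to -P.

```python
import socket


def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

Calling this repeatedly while kpropd is running should show the port open
only around the time of a full dump request on 1.10, if the behavior is as
described above.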

