Erratic behavior of full resync process
Greg Hudson
ghudson at mit.edu
Wed May 13 13:56:32 EDT 2015
On 05/12/2015 04:44 PM, Leonard J. Peirce wrote:
> Authentication attempt failed: 172.30.110.46, GSS-API error strings are:
> Unspecified GSS failure. Minor code may provide more information
> Clock skew too great
I don't know of a reason why this would happen with synchronized clocks.
You could try instrumenting the code in
lib/krb5/krb/rd_req_dec.c which calls krb5_check_clockskew() to find out
what the authenticator timestamp and server local time are; I don't know
of an easier way to investigate.
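
For orientation, here is a minimal sketch of the comparison you would be looking at once you have both values. It is not the code in rd_req_dec.c; the function names are made up for illustration, and the 300-second limit assumes the default clockskew setting from krb5.conf.

```python
# Illustrative sketch only; this is NOT the library code in rd_req_dec.c.
# Function names are hypothetical. Timestamps are seconds since the epoch.

DEFAULT_CLOCKSKEW = 300  # seconds; assumes the krb5.conf "clockskew" default


def within_clockskew(authenticator_time, server_time, max_skew=DEFAULT_CLOCKSKEW):
    """Return True if the two timestamps differ by less than max_skew seconds."""
    return abs(server_time - authenticator_time) < max_skew


def describe(authenticator_time, server_time, max_skew=DEFAULT_CLOCKSKEW):
    """Format both values and their difference, the way a debug line might."""
    delta = server_time - authenticator_time
    if within_clockskew(authenticator_time, server_time, max_skew):
        verdict = "ok"
    else:
        verdict = "Clock skew too great"
    return "authenticator=%d server=%d delta=%+ds: %s" % (
        authenticator_time, server_time, delta, verdict)
```

Logging the two timestamps and their difference in this form should show whether the authenticator time or the server's idea of local time is the one that is off.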
> On the slave I see syslog entries showing repeated problems with kpropd
> connecting to the master:
>
> /usr/sbin/kpropd: GSS-API (or Kerberos) error while initializing /usr/sbin/kpropd interface, retrying
I assume these correspond to the failed authentication attempts logged
by kadmind.
> I start kpropd with -d -S and use strace on it and I see that it repeatedly
> opens /dev/urandom and reads from it just before I see the above error.
That doesn't seem unusual.
> /usr/sbin/kpropd: Connection reset by peer while reading database block starting at offset 92340224
> Full resync was unsuccessful
> Unfortunately, the resync was not successful. Often (but not always), when
> kprop -f starts on the master, the slave_datatrans file will *partially*
> copy to the slave, often 60-90% of the data, before the connection hangs
> and then times out. I have run strace on both the kprop and kpropd processes
> while they are connected. The kprop on the master hangs during a write()
> for several minutes and then eventually times out:
> Process 3183 attached - interrupt to quit
> writev(4, [{"\240\37\26+[\16\247\tC\21\6/\243\217\340\0231f\362\245\3\214$\246\227\231N\265\351\366\1\233"..., 22106}], 1) = -1
> ETIMEDOUT (Connection timed out)
You don't say what's happening on the slave at this point. Is it also
hanging in a read() at the same time? Can you correlate these events
with packet captures on both ends to see if a network element
interjected an RST?
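
If it helps, a rough sketch of one way to pick the resets out of a capture: it assumes each capture has been dumped to text with something like "tcpdump -nn -r FILE" and scans the lines for a TCP flags field containing R. The function name and the sample addresses are made up for illustration.

```python
import re

# Matches the flags field in tcpdump's one-line TCP summary, e.g. "Flags [R.]".
FLAGS_RE = re.compile(r"Flags \[([^\]]*)\]")


def find_resets(lines):
    """Return the tcpdump text lines whose TCP flags include R (reset)."""
    resets = []
    for line in lines:
        match = FLAGS_RE.search(line)
        if match and "R" in match.group(1):
            resets.append(line.rstrip("\n"))
    return resets


if __name__ == "__main__":
    # Hypothetical sample lines; real input would come from the text dump.
    sample = [
        "13:50:01.000001 IP 10.0.0.1.40000 > 10.0.0.2.754: Flags [P.], seq 1:22107, ack 1, win 229, length 22106",
        "13:55:12.000002 IP 10.0.0.2.754 > 10.0.0.1.40000: Flags [R], seq 1, win 0, length 0",
    ]
    for line in find_resets(sample):
        print(line)
```

Running it against the dumps from both ends and comparing timestamps shows whether a reset seen by one host was actually sent by the other, or only appeared in transit.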
> In my debugging attempts, I tried starting kpropd with
>
> kpropd -S -d -P NNN
>
> and then attempt to run
>
> kprop -f slave_datatrans -P NNN r.test.admin.private
>
> on the master but kpropd on the slave doesn't appear to be listening
> on port NNN. Am I misunderstanding something?
In 1.10, with incremental propagation configured, kpropd doesn't listen
for kprop connections except when it has just requested a full dump from
kadmind. In 1.13 it should always be listening.
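
A quick way to confirm from the master whether anything is accepting connections on the slave's port at a given moment is a plain TCP connect test. This is only a sketch: the function name is made up, and the host and port in the usage example are placeholders (754 is the usual kprop port when -P is not given).

```python
import socket


def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError on refusal, timeout, or lookup failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder values; substitute the slave's name and the -P port in use.
    host, port = "r.test.admin.private", 754
    state = "listening" if is_listening(host, port) else "not listening"
    print("%s:%d is %s" % (host, port, state))
```

On 1.10 with incremental propagation, a "not listening" result outside of a full resync would be the expected behavior described above rather than a sign of misconfiguration.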