The mysterious death of kprop when running incremental propagation
Jeremy Hunt
jeremyh at optimation.com.au
Wed Apr 2 19:12:32 EDT 2014
Hi William,
Apologies for not responding sooner.
You have nine kprop processes pinging your server. They talk to the
kadmind service, which propagates individual entries back to them and
occasionally does a full propagation. During a full propagation it
runs a dump with kdb5_util itself. I am uncertain how it handles nine
simultaneous requests for a full propagation. On top of this, you say
you run a kdb5_util dump every 10 minutes.
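For reference, the iprop pieces described above are normally configured in kdc.conf roughly like this (the realm name and values here are illustrative, not taken from this thread; note that iprop_slave_poll lives in each slave's kdc.conf, not the master's):

```ini
[realms]
    EXAMPLE.COM = {
        # Enable incremental propagation on the master KDC
        iprop_enable = true
        # Update-log entries kept before a slave must fall back to a full resync
        iprop_master_ulogsize = 1000
        # On each slave: how often kpropd polls kadmind for updates
        iprop_slave_poll = 2m
    }
```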
This probably strains the locking mechanism, especially since it is
all asynchronous and you have no control over when these run. If it is
a locking problem, it may be due as much to the kprop processes
tripping over each other as to the kdb5_util dump process; that may
even be the more likely cause.
Greg is correct that the propagation code has improved considerably in
later versions.
I have two suggestions.
1. You do not have to use the CentOS Kerberos package. You can
download and build the latest MIT Kerberos release and use that; just
disable or uninstall the CentOS packages. You do then have to watch
for security alerts yourself and patch and rebuild the code yourself.
This is not too bad, as Kerberos is pretty robust and such alerts are
rare. You would also need to test that this actually solves your
problem: do you have a test facility with nine kprop processes from
different machines?
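A build along the lines of suggestion 1 might look like the sketch below. The version number, download URL, and install prefix are my assumptions, not values from this thread; with DRY_RUN=1 (the default here) it only prints the steps instead of running them:

```shell
#!/bin/sh
# Hypothetical outline for building MIT krb5 from source on CentOS.
# Version, URL, and prefix are illustrative assumptions.
set -e
VER=1.12.1
SRC_URL="https://web.mit.edu/kerberos/dist/krb5/1.12/krb5-$VER-signed.tar"
PREFIX="/usr/local/krb5-$VER"

# DRY_RUN=1 just prints each step; set DRY_RUN=0 to really build.
DRY_RUN=${DRY_RUN:-1}
step() { if [ "$DRY_RUN" = 1 ]; then echo "step: $*"; else "$@"; fi; }

step curl -LO "$SRC_URL"
step tar xf "krb5-$VER-signed.tar"   # unpacks krb5-$VER.tar.gz plus its signature
step tar xzf "krb5-$VER.tar.gz"
step cd "krb5-$VER/src"
step ./configure --prefix="$PREFIX"
step make
step make install                    # then point the KDC init scripts at $PREFIX
```

Installing under its own prefix keeps the hand-built tree separate from anything the CentOS packages own, which makes rolling back easy.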
2. Go back to the old cron job doing full propagation in a controlled
manner, so the propagations don't trip over each other. If it takes 20
seconds to dump the database, a full propagation probably doesn't take
much longer; time the propagation and code things accordingly. So:
dump, do the nine propagations, and every 10 minutes save your dump.
Be careful, though: if it takes about 20 seconds to propagate the
database each time, then with nine propagations your update turnaround
stretches to about 3 minutes.
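Suggestion 2 could be sketched as a cron-driven script like the one below. The dump path and slave host names are assumptions; DRY_RUN=1 (the default here) makes it print the commands instead of running them against real KDCs:

```shell
#!/bin/sh
# Hypothetical serialized full-propagation script, run from cron on the
# master. Dump path and slave names are illustrative assumptions.
DUMPFILE=/var/kerberos/krb5kdc/slave_datatrans
SLAVES="kdc1.example.com kdc2.example.com"   # ...list all nine slaves here

# DRY_RUN=1 prints each command; set DRY_RUN=0 on the real master.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

# One dump, then propagate to each slave in turn, so the
# propagations can never trip over each other.
run /usr/sbin/kdb5_util dump "$DUMPFILE"
for kdc in $SLAVES; do
    run /usr/sbin/kprop -f "$DUMPFILE" "$kdc"
done
```

Because the kprop calls run sequentially, the dump file is only ever read after kdb5_util has finished writing it, which is the controlled ordering described above.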
Good Luck,
Jeremy
>
> --- Original message ---
> Subject: Re: The mysterious death of kprop when running incremental
> propagation
> From: William Clark <majorgearhead at gmail.com>
> To: Greg Hudson <ghudson at mit.edu>
> Cc: <kerberos at mit.edu>
> Date: Thursday, 03/04/2014 9:07 AM
>
> I am between a rock and a hard place. I must use CentOS upstream
> packages, but their latest upstream is 1.10.3. I see one of the bugs
> fixed was an issue where a full propagation doesn't complete all the
> way but kprop thinks it's fine. I think this may be what I am hitting.
> Wondering if there is any tuning I could do to mitigate this while I
> wait for later packages. My only other option is to go back to
> traditional propagation.
>
> Right now my slaves have this config:
> iprop_master_ulogsize = 1000
> iprop_slave_poll = 2m
>
> Additionally, as I shared before, I am running the following every 10
> minutes: '/usr/sbin/kdb5_util dump'
>
> I wonder if upping the ulog size would allow more time before a full
> prop is called for during those times my server is ultra busy. My
> thinking is this may be happening during a full prop, which happens
> because the server was busy for a period of time.
>
> Any thoughts would be helpful.
>
>
> William Clark
>
>
>
> On Mar 31, 2014, at 8:34 PM, Greg Hudson <ghudson at MIT.EDU> wrote:
>
>>
>> On 03/31/2014 05:44 PM, William Clark wrote:
>>>
>>> Running the following from CentOS upstream:
>>> krb5-server-1.10.3-10.el6_4.6.x86_64
>>>
>>> I am not adverse to going with the latest stable MIT version if it
>>> will
>>> help in this.
>>
>> I think testing 1.12.1 would be worthwhile. I don't know of any
>> specific bugs in 1.10 which could lead to a SIGABRT, but there are
>> numerous iprop and locking improvements which went into 1.11 and 1.12
>> but were too invasive to backport to 1.10.
>
> ________________________________________________
> Kerberos mailing list Kerberos at mit.edu
> https://mailman.mit.edu/mailman/listinfo/kerberos