The mysterious death of kprop when running incremental propagation

Jeremy Hunt jeremyh at optimation.com.au
Thu Apr 3 00:54:57 EDT 2014


Hi William,

Of course for option 2, the reads of the dumped database probably need 
no locks, so you could probably do all of the propagations to the slaves 
in parallel, though you might want to run them in blocks of two or three 
at a time rather than all at once, depending on the propagation time.
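
Something like this would run them three at a time (untested; the host
names are placeholders and the dump path is the CentOS default):

    # one full dump, then push it to the slaves in blocks of three
    kdb5_util dump /var/kerberos/krb5kdc/slave_datatrans
    for batch in "kdc1 kdc2 kdc3" "kdc4 kdc5 kdc6" "kdc7 kdc8 kdc9"; do
        for host in $batch; do
            kprop -f /var/kerberos/krb5kdc/slave_datatrans $host &
        done
        wait    # let each block finish before starting the next
    done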


Jeremy

>
> --- Original message ---
> Subject: Re: The mysterious death of kprop when running incremental 
> propagation
> From: Jeremy Hunt <jeremyh at optimation.com.au>
> To: William Clark <majorgearhead at gmail.com>
> Cc: <kerberos at mit.edu>
> Date: Thursday, 03/04/2014  1:43 PM
>
>
>
> Hi William,
>
> Apologies for not responding sooner.
>
> You have 9 kprop processes pinging your server. They talk to the kadmind
> service, which propagates individual entries back to them and
> occasionally does a full propagation. During a full propagation it does
> a dump with kdb5_util itself. I am not sure how it handles 9 concurrent
> requests for a full propagation. On top of this, you say you do a
> kdb5_util dump every 10 minutes.
>
> This probably does strain the locking mechanism, especially as it is
> all asynchronous: you have no control over when these run. If it is a
> locking problem, it might be as much due to the kprop processes
> tripping over each other as to the kdb5_util dump process; in fact they
> may be the more likely cause.
>
> Greg is correct that the propagation code has improved considerably
> with the later versions.
>
> I have two suggestions.
>
> 1. You do not have to use the CentOS Kerberos packages. You can
> download and build the latest MIT Kerberos release and use that; just
> disable or uninstall the CentOS packages (rough build steps below). The
> trade-off is that you then have to watch for security alerts and patch
> and rebuild the code yourself. That is not too bad, as Kerberos is
> pretty robust and such alerts are rare. You would also need to test
> that this actually solves your problem: do you have a test facility
> with 9 kprop processes from different machines?
>
> 2. Go back to the old cron job doing full propagations in a controlled
> manner, so they do not trip over each other. If it takes 20 seconds to
> dump the database, it probably does not take much longer to propagate
> the full database; time a propagation and code things accordingly. So:
> dump, do the 9 propagations, and every 10 minutes save your dump (a
> sketch follows below). Be careful, though: if it takes about 20 seconds
> to propagate the database each time, then with 9 propagations your
> updates fall back to roughly 3-minute turnarounds (9 x 20 seconds).
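>
> As a sketch, something along these lines on the master would do it
> (untested; host names are placeholders, the dump path is the CentOS
> default, and the slaves are pushed to one at a time):
>
>     #!/bin/sh
>     # propagate.sh -- full dump, then push it to every slave in turn
>     DUMPFILE=/var/kerberos/krb5kdc/slave_datatrans
>     /usr/sbin/kdb5_util dump $DUMPFILE
>     for host in kdc1 kdc2 kdc3 kdc4 kdc5 kdc6 kdc7 kdc8 kdc9; do
>         /usr/sbin/kprop -f $DUMPFILE $host
>     done
>
> run from a crontab entry such as '*/10 * * * * /usr/local/sbin/propagate.sh',
> which also gives you your 10-minute dump as a side effect.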
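>
> As for building your own (suggestion 1), the build itself is
> straightforward. Roughly, with the version and install prefix only
> illustrative -- use whichever release is current on the MIT site:
>
>     tar xzf krb5-1.12.1.tar.gz
>     cd krb5-1.12.1/src
>     ./configure --prefix=/usr/local/krb5   # keep it clear of the CentOS files
>     make
>     make install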
>
> Good Luck,
>
> Jeremy
>
>
>>
>>
>> --- Original message ---
>> Subject: Re: The mysterious death of kprop when running incremental
>> propagation
>> From: William Clark <majorgearhead at gmail.com>
>> To: Greg Hudson <ghudson at mit.edu>
>> Cc: <kerberos at mit.edu>
>> Date: Thursday, 03/04/2014  9:07 AM
>>
>> I am between a rock and a hard place.  I must use CentOS upstream
>> packages, but their latest upstream version is 1.10.3.  I see one of
>> the bugs fixed later was an issue where a full propagation doesn't
>> complete all the way but kprop thinks it's fine.  I think this may be
>> what I am hitting.  I am wondering if there is any tuning I could do to
>> mitigate this while I wait for later packages.  My only other option is
>> to go back to traditional propagation.
>>
>> Right now my slaves have this config:
>> iprop_master_ulogsize = 1000
>> iprop_slave_poll = 2m
>>
>> Additionally, as I shared before, I am running the following every 10
>> minutes: '/usr/sbin/kdb5_util dump'
>>
>> I wonder if upping the ulog size would allow more time before a full
>> prop is triggered during the times my server is ultra busy.  My
>> thinking is that the failure may be happening during a full prop, which
>> occurs because the server was busy for a period of time.
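>>
>> For reference, that would mean bumping the value in the realm stanza of
>> kdc.conf on the master, something like this (realm name and value are
>> only placeholders):
>>
>>     [realms]
>>         EXAMPLE.COM = {
>>             iprop_enable = true
>>             iprop_master_ulogsize = 2500
>>         }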
>>
>> Any thoughts would be helpful.
>>
>>
>> William Clark
>>
>>
>>
>> On Mar 31, 2014, at 8:34 PM, Greg Hudson <ghudson at MIT.EDU> wrote:
>>
>>>
>>>
>>> On 03/31/2014 05:44 PM, William Clark wrote:
>>>>
>>>>
>>>> Running the following from CentOS upstream:
>>>> krb5-server-1.10.3-10.el6_4.6.x86_64
>>>>
>>>> I am not averse to going with the latest stable MIT version if it
>>>> will help with this.
>>>
>>> I think testing 1.12.1 would be worthwhile.  I don't know of any
>>> specific bugs in 1.10 which could lead to a SIGABRT, but there are
>>> numerous iprop and locking improvements which went into 1.11 and 1.12
>>> but were too invasive to backport to 1.10.
>>
>


