The mysterious death of kprop when running incremental propagation
Jeremy Hunt
jeremyh at optimation.com.au
Wed Apr 2 19:12:32 EDT 2014
Hi William,
Apologies for not responding sooner.
You have nine kprop processes pinging your server. They talk to the
kadmind service, which propagates individual entries back to them and
occasionally does a full propagation. During a full propagation it
runs a dump with kdb5_util itself. I am uncertain how it handles nine
simultaneous requests for a full propagation. On top of this, you say
you run a kdb5_util dump every 10 minutes.
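For reference, the iprop pieces described above are normally configured in kdc.conf roughly like this (the realm name and values here are illustrative, not taken from this thread; note that iprop_slave_poll lives in each slave's kdc.conf, not the master's):

```ini
[realms]
    EXAMPLE.COM = {
        # Enable incremental propagation on the master KDC
        iprop_enable = true
        # Update-log entries kept before a slave must fall back to a full resync
        iprop_master_ulogsize = 1000
        # On each slave: how often kpropd polls kadmind for updates
        iprop_slave_poll = 2m
    }
```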
This probably strains the locking mechanism, especially since it is
all asynchronous and you have no control over when these run. If it is
a locking problem, it may be due as much to the kprop processes
tripping over each other as to the kdb5_util dump process; that may
even be the more likely cause.
Greg is correct that the propagation code has improved considerably in
later versions.
I have two suggestions.
1. You do not have to use the CentOS Kerberos package. You can
download and build the latest MIT Kerberos release and use that; just
disable or uninstall the CentOS packages. You do then have to watch
for security alerts yourself and patch and rebuild the code yourself.
This is not too bad, as Kerberos is pretty robust and such alerts are
rare. You would also need to test that this actually solves your
problem: do you have a test facility with nine kprop processes from
different machines?
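A build along the lines of suggestion 1 might look like the sketch below. The version number, download URL, and install prefix are my assumptions, not values from this thread; with DRY_RUN=1 (the default here) it only prints the steps instead of running them:

```shell
#!/bin/sh
# Hypothetical outline for building MIT krb5 from source on CentOS.
# Version, URL, and prefix are illustrative assumptions.
set -e
VER=1.12.1
SRC_URL="https://web.mit.edu/kerberos/dist/krb5/1.12/krb5-$VER-signed.tar"
PREFIX="/usr/local/krb5-$VER"

# DRY_RUN=1 just prints each step; set DRY_RUN=0 to really build.
DRY_RUN=${DRY_RUN:-1}
step() { if [ "$DRY_RUN" = 1 ]; then echo "step: $*"; else "$@"; fi; }

step curl -LO "$SRC_URL"
step tar xf "krb5-$VER-signed.tar"   # unpacks krb5-$VER.tar.gz plus its signature
step tar xzf "krb5-$VER.tar.gz"
step cd "krb5-$VER/src"
step ./configure --prefix="$PREFIX"
step make
step make install                    # then point the KDC init scripts at $PREFIX
```

Installing under its own prefix keeps the hand-built tree separate from anything the CentOS packages own, which makes rolling back easy.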
2. Go back to the old cron job doing full propagation in a controlled
manner, so the propagations don't trip over each other. If it takes 20
seconds to dump the database, a full propagation probably doesn't take
much longer; time the propagation and code things accordingly. So:
dump, do the nine propagations, and every 10 minutes save your dump.
Be careful, though: if it takes about 20 seconds to propagate the
database each time, then with nine propagations your update turnaround
stretches to about 3 minutes.
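Suggestion 2 could be sketched as a cron-driven script like the one below. The dump path and slave host names are assumptions; DRY_RUN=1 (the default here) makes it print the commands instead of running them against real KDCs:

```shell
#!/bin/sh
# Hypothetical serialized full-propagation script, run from cron on the
# master. Dump path and slave names are illustrative assumptions.
DUMPFILE=/var/kerberos/krb5kdc/slave_datatrans
SLAVES="kdc1.example.com kdc2.example.com"   # ...list all nine slaves here

# DRY_RUN=1 prints each command; set DRY_RUN=0 on the real master.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

# One dump, then propagate to each slave in turn, so the
# propagations can never trip over each other.
run /usr/sbin/kdb5_util dump "$DUMPFILE"
for kdc in $SLAVES; do
    run /usr/sbin/kprop -f "$DUMPFILE" "$kdc"
done
```

Because the kprop calls run sequentially, the dump file is only ever read after kdb5_util has finished writing it, which is the controlled ordering described above.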
Good Luck,
Jeremy
>
> --- Original message ---
> Subject: Re: The mysterious death of kprop when running incremental
> propagation
> From: William Clark <majorgearhead at gmail.com>
> To: Greg Hudson <ghudson at mit.edu>
> Cc: <kerberos at mit.edu>
> Date: Thursday, 03/04/2014 9:07 AM
>
> I am between a rock and a hard place. I must use CentOS upstream
> packages, but their latest upstream is 1.10.3. I see one of the bugs
> fixed was an issue where a full propagation doesn't complete all the
> way but kprop thinks it's fine. I think this may be what I am hitting.
> Wondering if there is any tuning I could do to mitigate this while I
> wait for later packages. My only other option is to go back to
> traditional propagation.
>
> Right now my slaves have this config:
> iprop_master_ulogsize = 1000
> iprop_slave_poll = 2m
>
> Additionally, as I shared before, I am running the following every 10
> minutes: '/usr/sbin/kdb5_util dump'
>
> I wonder if upping the ulog size would allow more time before a full
> prop is called for during those times my server is ultra busy. My
> thinking is this may be happening during a full prop, which happens
> because the server was busy for a period of time.
>
> Any thoughts would be helpful.
>
>
> William Clark
>
>
>
> On Mar 31, 2014, at 8:34 PM, Greg Hudson <ghudson at MIT.EDU> wrote:
>
>>
>> On 03/31/2014 05:44 PM, William Clark wrote:
>>>
>>> Running the following from CentOS upstream:
>>> krb5-server-1.10.3-10.el6_4.6.x86_64
>>>
>>> I am not adverse to going with the latest stable MIT version if it
>>> will
>>> help in this.
>>
>> I think testing 1.12.1 would be worthwhile. I don't know of any
>> specific bugs in 1.10 which could lead to a SIGABRT, but there are
>> numerous iprop and locking improvements which went into 1.11 and 1.12
>> but were too invasive to backport to 1.10.
>
> ________________________________________________
> Kerberos mailing list Kerberos at mit.edu
> https://mailman.mit.edu/mailman/listinfo/kerberos