The mysterious death of kprop when running incremental propagation

William Clark majorgearhead at gmail.com
Thu Apr 3 06:43:47 EDT 2014


Very good info. I am testing a few things. My issue is that there is no way for me to simulate the load against the Kerberos systems in test, so I can't verify whether the changes in 1.12 fix my issue; I am kind of stuck doing a replace-in-prod rollout one host at a time. These are audited systems at the company I work for, and the auditors like the fact that updates come as simple RPMs from upstream. Going to the compile-from-source method would change that. Additionally, it becomes kind of a pain because CentOS/RHEL puts things in different locations than the core MIT distribution does. This is why I am trying to find alternatives.

Some things I have thought about: reducing the number of KDCs. Currently I have them in VIPs based on geographic location, with a GSLB above them to carry out the geo-location scheme; the VIPs are doing L3DSR. I could probably take three KDCs out and would definitely still be fine, as they are all taking load in a very even pattern. I have also toyed with setting the kiprop poll interval higher and increasing the size of the ulog, and on top of that, pushing the every-ten-minutes dump out to 30 minutes (a rough sketch of those changes is below). Not sure if any of these things will help, but my choices are limited and incremental propagation is desired.
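
If I go the tuning route, the changes would look roughly like this. Just a sketch: the numbers are guesses, the ulog size belongs in the master's kdc.conf and the poll interval in the slaves', and the cron line reuses my existing dump command as-is:

    iprop_master_ulogsize = 2500
    iprop_slave_poll = 4m

    # hypothetical /etc/cron.d entry: dump every 30 minutes instead of every 10
    */30 * * * * root /usr/sbin/kdb5_util dump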

Again thank you both for taking the time to explore this with me!

William Clark

> On Apr 3, 2014, at 12:54 AM, Jeremy Hunt <jeremyh at optimation.com.au> wrote:
> 
> Hi William,
> 
> Of course, for option 2 the reads of the dumped database probably need no locks, so you could probably do all of the propagations to the slaves in parallel, though you might want to do them in blocks (two or three at a time) rather than all at once, depending on the propagation time.
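> 
> A rough sketch of propagating to the slaves in blocks of two at a time (the host names and dump file path below are placeholders only):
> 
>     DUMPFILE=/var/kerberos/krb5kdc/slave_datatrans
>     for pair in "kdc1 kdc2" "kdc3 kdc4" "kdc5 kdc6"; do
>         for slave in $pair; do
>             kprop -f "$DUMPFILE" "$slave" &
>         done
>         wait    # finish this block before starting the next one
>     done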
> 
> Jeremy
>  
>> --- Original message --- 
>> Subject: Re: The mysterious death of kprop when running incremental propagation 
>> From: Jeremy Hunt <jeremyh at optimation.com.au> 
>> To: William Clark <majorgearhead at gmail.com> 
>> Cc: <kerberos at mit.edu> 
>> Date: Thursday, 03/04/2014 1:43 PM
>> 
>> 
>> 
>> Hi William,
>> 
>> Apologies for not responding sooner.
>> 
>> You have nine kprop processes pinging your server. They talk to the kadmind 
>> service, which will propagate individual entries back to them and 
>> occasionally do a full propagation. During a full propagation it 
>> will do a dump with kdb5_util itself. I am uncertain how it handles nine 
>> requests for a full propagation. On top of this, you say you do a 
>> kdb5_util dump every 10 minutes.
>> 
>> This probably does strain the locking mechanism, especially as it is 
>> all asynchronous: you have no control over when these run. If it is a 
>> locking problem, then it might be as much due to the kprop processes 
>> tripping over each other as to the kdb5_util dump process; that may 
>> even be the more likely cause.
>> 
>> Greg is correct that the propagation code has improved considerably 
>> with the later versions.
>> 
>> I have two suggestions.
>> 
>> 1. You do not have to use the CentOS Kerberos package. You can 
>> download and build the latest MIT Kerberos release and use that; just turn 
>> off or uninstall the CentOS packages. You do have to watch for 
>> security alerts yourself and patch and rebuild the code yourself. This 
>> is not too bad, as Kerberos is pretty robust and there are not a lot 
>> of these. You would also need to test that this solves your problem: 
>> do you have a test facility with nine kprop processes from different 
>> machines?
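>> 
>> If you went that way, the build itself is the standard configure/make 
>> from the MIT source tarball, roughly like this (a sketch only: the 
>> install prefix is a placeholder, and 1.12.1 is just the release Greg 
>> mentioned; check web.mit.edu/kerberos/dist/ for the current one):
>> 
>>     # unpack the tarball, then build from the src/ subdirectory
>>     tar xzf krb5-1.12.1.tar.gz
>>     cd krb5-1.12.1/src
>>     ./configure --prefix=/usr/local/krb5
>>     make
>>     make check        # optional self-tests
>>     make install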
>> 
>> 2. Go back to the old cron job doing full propagation in a controlled 
>> manner, so the propagations don't trip over each other. If it takes 20 
>> seconds to dump the database, it probably doesn't take too much longer 
>> to propagate the full database; time the propagation and script things 
>> accordingly. So: dump, do nine propagations, and every 10 minutes save 
>> your dump. Be careful, though: if it takes about 20 seconds to 
>> propagate the database each time, then with nine propagations your 
>> updates fall back to roughly three-minute turnarounds (9 x 20 s = 180 s).
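>> 
>> A minimal sketch of that cron job (the slave names and dump path are 
>> placeholders; substitute your own):
>> 
>>     #!/bin/sh
>>     # dump once, then push the same dump to each slave in turn
>>     DUMPFILE=/var/kerberos/krb5kdc/slave_datatrans
>>     /usr/sbin/kdb5_util dump "$DUMPFILE"
>>     for slave in kdc1.example.com kdc2.example.com kdc3.example.com; do
>>         /usr/sbin/kprop -f "$DUMPFILE" "$slave" || echo "kprop to $slave failed" >&2
>>     done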
>> 
>> Good Luck,
>> 
>> Jeremy
>> 
>> 
>>> 
>>> --- Original message ---
>>> Subject: Re: The mysterious death of kprop when running incremental 
>>> propagation
>>> From: William Clark <majorgearhead at gmail.com>
>>> To: Greg Hudson <ghudson at mit.edu>
>>> Cc: <kerberos at mit.edu>
>>> Date: Thursday, 03/04/2014 9:07 AM
>>> 
>>> I am between a rock and a hard place. I must use CentOS upstream packages, 
>>> but their upstream latest is 1.10.3. I see one of the bugs fixed 
>>> was an issue where a full propagation doesn’t complete all the way 
>>> but kprop thinks it's fine. I think this may be what I am hitting. 
>>> I'm wondering if there is any tuning I could do to mitigate this while I 
>>> wait for later packages. My only other option is to go back to 
>>> traditional propagation.
>>> 
>>> Right now my slaves have this config:
>>> iprop_master_ulogsize = 1000
>>> iprop_slave_poll = 2m
>>> 
>>> Additionally, like I shared before, I am running the following every 10 
>>> minutes: '/usr/sbin/kdb5_util dump'
>>> 
>>> I wonder if upping the ulog size would allow more time before a full 
>>> prop is called for during those times my server is ultra busy. My 
>>> thinking is that this may be happening during a full prop, which in 
>>> turn happens because the server was busy for a period of time.
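>>> 
>>> For example (illustrative value only), something like
>>> 
>>>     iprop_master_ulogsize = 2500
>>> 
>>> would mean a slave has to fall more than 2500 updates behind between 
>>> polls, instead of 1000, before it can no longer catch up from the ulog 
>>> and a full resync gets triggered.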
>>> 
>>> Any thoughts would be helpful.
>>> 
>>> 
>>> William Clark
>>> 
>>> 
>>> 
>>>> On Mar 31, 2014, at 8:34 PM, Greg Hudson <ghudson at MIT.EDU> wrote:
>>>> 
>>>> 
>>>>> On 03/31/2014 05:44 PM, William Clark wrote:
>>>>> 
>>>>> Running the following from CentOS upstream:
>>>>> krb5-server-1.10.3-10.el6_4.6.x86_64
>>>>> 
>>>>> I am not averse to going with the latest stable MIT version if it will
>>>>> help with this.
>>>> 
>>>> I think testing 1.12.1 would be worthwhile. I don't know of any
>>>> specific bugs in 1.10 which could lead to a SIGABRT, but there are
>>>> numerous iprop and locking improvements which went into 1.11 and 1.12
>>>> but were too invasive to backport to 1.10.
>>> 
>> 
> 

