kpropd on replica core dumping at start! (was Re: iprop_replica_poll=2m default...)

Tareq Alrashid tareq at qerat.com
Fri Feb 21 09:45:35 EST 2020


Is there a protocol you all follow to stay safe with incremental propagation? Like doing a full sync every now and then, maybe once a day or once a week?
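
What I have in mind is something like a weekly forced full resync from cron on the master; this is purely a sketch, with an arbitrary schedule and the stock RHEL binary path:

    # /etc/cron.d/krb5-full-resync (hypothetical file name)
    # kproplog -R resets the ulog, so replicas request a full dump on their next poll
    0 3 * * 0  root  /usr/sbin/kproplog -R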

Can the ulog or, God forbid, the database get clobbered or corrupted and then get replicated everywhere? That would be a huge problem, but I can't imagine the risk is that high.

We have daily backups of the database in place, I believe. Maybe we should be making hourly backups, just in case.
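
If we do go hourly, I picture something like a cron job around kdb5_util dump; the backup directory and schedule below are just illustrative:

    # /etc/cron.d/krb5-db-backup (hypothetical) -- hourly dump of the KDC database to a dated file
    0 * * * *  root  /usr/sbin/kdb5_util dump /var/kerberos/krb5kdc/backup/dump-$(date +\%Y\%m\%d\%H)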

Pondering away… 
Tareq

> On Feb 20, 2020, at 2:04 PM, Tareq Alrashid <tareq at qerat.com> wrote:
> 
> Well! We had a theory, based on the facts, that the last update from the master KDC blew all the other replicas out of the water!
> So I ran "kproplog -R" to reset the update log and performed a full sync. I restarted kpropd, and what do you know, the incrementals started working and kpropd never crashed again!
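> 
> For reference, the recovery sequence was roughly this (from memory, so treat it as a sketch):
> 
>     master#   kproplog -R                       # reset the update log
>     replica#  systemctl restart kprop.service   # kpropd then pulls a full resync
>     replica#  kproplog -h                       # sanity-check the ulog header afterwards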
> 
> Now we wait on RHEL to see if the core dump provides some insight into what lightning-strike of a mysterious event took place with that last password change by one insomniac at 02:11am, and whether this crash could ever happen again.
> 
> Could it be that the 3-second poll interval is too frequent? Etc. We have an RCA to come up with.
> 
> Sharing this in case others have to deal with it, or in case someone has experienced this and can shed some light.
> 
> As always, any and all are welcome to chime in.
> 
> Thank you,
> Tareq
> 
>> On Feb 20, 2020, at 10:19 AM, Tareq Alrashid <tareq at qerat.com> wrote:
>> 
>> RHEL 7.7 
>> 
>> At 2am, all of a sudden, the kpropd process on all my replicas crashed with a core dump. Nothing had changed!
>> 
>> systemctl status kprop.service 
>> 
>> kprop.service: main process exited, code=dumped, status=6/ABRT
>> Feb 20 09:35:13 kerb-replica systemd[1]: Unit kprop.service entered failed state.
>> Feb 20 09:35:13 kerb-replica systemd[1]: kprop.service failed.
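>> 
>> We're still trying to get a useful backtrace out of the core, roughly like this
>> (the core path is just wherever abrt/systemd saved it on this box):
>> 
>>     gdb /usr/sbin/kpropd /path/to/coredump
>>     (gdb) bt full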
>> 
>> 
>> kdc.conf on replica
>>         ...
>>         iprop_port       = 754
>>         iprop_slave_poll = 3s
>>     }
>> 
>> Any ideas would be most appreciated, as we are in the weeds now looking at what happened.
>> 
>> Thank you in advance
>> 
>> Tareq
>> 
>>> On Jan 12, 2020, at 9:45 PM, TAREQ ALRASHID <tareq at qerat.com> wrote:
>>> 
>>> Since my last message, I came across the part of the documentation where it mentions iprop_slave_poll.
>>> We’re running Kerberos 5 release 1.15.1! I will make the proper change, and then the test results should make more sense!
>>> 
>>> Thanks Greg.
>>> 
>>> 
>>> 
>>>> On Jan 12, 2020, at 5:54 PM, Greg Hudson <ghudson at mit.edu> wrote:
>>>> 
>>>> On 1/10/20 8:22 PM, Tareq Alrashid wrote:
>>>>> Maybe I am missing something but changing the kdc.conf to any value...
>>>>> 
>>>>> iprop_replica_poll=1s 
>>>>> or even...
>>>>> iprop_replica_poll   = 0.016666666666667m
>>>>> (for 1s = 1/60 min!)
>>>>> 
>>>>> Based on tailing the kadmind.log, it is showing the replica polling
>>>>> every 2m!?
>>>> 
>>>> If you are running a release prior to 1.17, you need to use the old name
>>>> iprop_slave_poll.  (The old name still works in 1.17 as well.)
>>>> 
>>>> Also make sure to set the value on the machine running kpropd (not the
>>>> master KDC where kadmind is run), and to restart kpropd.
>>>> 
>>>> I don't think the delta time format supports floating point values, but
>>>> "1s" or just "1" should work.
>>>> 
>>>>> Final question: is there any negative impact from having replicas poll as often as once a second, or is it best to use a higher number of seconds?
>>>> Polling every second will cause a little bit of work on the replica and
>>>> the master KDC each second, and use a little bit of network traffic.
>>>> With today's computers and networks it's probably not going to have much impact.
>>> 
>> 
> 


