Strange behavior with kadmind and incremental propagation in 1.8.3

Nico Williams nico at cryptonector.com
Mon Feb 11 21:43:10 EST 2013


On Mon, Feb 11, 2013 at 6:14 PM, John Devitofranceschi <foonon at gmail.com> wrote:
> I was hoping to get an opinion on this.  Maybe it's something that's been seen before.
>
> I see this in the kadmind log:
>
> Feb  7 17:00:10 server.realm.com kadmind[17670]: [ID 702911 local0.notice] Request: kadm5_create_principal, barney at REALM.COM, Cannot lock database, client=fred at REALM.COM, service=kadmin/server.realm.com at REALM.COM, addr=1.2.3.4

"Cannot lock database" should not happen with 1.11 or master.  It does
happen with older releases and is a serious bug that has been
discussed here.

See these commits
b2e7deb7cb2d9c37f00599778f4a102feaf6149d~4..b2e7deb7cb2d9c37f00599778f4a102feaf6149d
for more details.

(I.e., start here:
https://github.com/krb5/krb5/commit/b2e7deb7cb2d9c37f00599778f4a102feaf6149d
and look at the preceding three commits as well.)

> The create_principal failed...we even check in our provisioning client:
>
> Feb  7 17:00:39 server.realm.com kadmind[17670]: [ID 702911 local0.notice] Request: kadm5_get_principal, barney at REALM.COM, Principal does not exist, client=fred at REALM.COM, service=kadmin/server.realm.com at REALM.COM, addr=1.2.3.4

Right: the DB couldn't be locked when you attempted to create barney...

> BUT the propagation log says this:
>
> Update Entry
>        Update serial # : 150028
>        Update operation : Add
>        Update principal : barney at REALM.COM
>        Update size : 524
>        Update committed : True
>        Update time stamp : Thu Feb  7 17:00:00 2013
>        Attributes changed : 12
>                Attribute flags
>                Maximum ticket life
>                Maximum renewable life
>                Principal expiration
>                Password expiration
>                Principal
>                Key data
>                Password last changed
>                Modifying principal
>                Modification time
>                TL data
>                Length
>
> Note the timestamps...the update entry appeared 10s earlier than the kadmind log error!  There was no previous create_principal request 10s earlier.

A few things are worth noting:

a) there's an iprop bug (fixed in master) where slaves can't get
updates within ten seconds of the last transaction on the master;

b) a create of barny at REALM.COM must have succeeded at some point :)

> The fun really starts when the incremental propagation kicks in.

Oh?  That wasn't fun enough?  Bummer!

> We have 7 secondary kdc's and in the case that I am investigating; 4 of them got Update 150028 right away, on its own,  with no problems.

So far no real fun :)

> The other 3 had Update 150028 bundled with one or more additional updates and the kpropd on these three servers started core dumping over and over again until we forced a full resync.

Do you have stack traces from the cores?

> Please let me know if this is a known bug.  If it is not, maybe the code changes from 1.8 to 1.11 have made this behavior stop. It would be nice to be able to reliably reproduce this so we can then test post-1.8.3 versions of the code. So far, this has his us twice in the last couple of years.

Oh, you should *definitely* try 1.11, or even master.  1.8's iprop has
bugs for sure.  Just trawl through the git log of master.

Nico
--



More information about the Kerberos mailing list