LMDB KDB module design notes

Robbie Harwood rharwood at redhat.com
Tue Apr 10 12:47:40 EDT 2018

Greg Hudson <ghudson at mit.edu> writes:

> Here is my general design framework, taking the above into
> consideration:
> * We use two MDB environments, setting the MDB_NOSUBDIR flag so that
> each environment is a pair of files instead of a subdirectory:
>   - A primary environment (suffix ".mdb") containing a "policy"
>   database holding policy entries and a "principal" database holding
>   principal entries minus lockout fields.
>   - A secondary environment (suffix ".lockout.mdb") containing a
>   "lockout" database holding principal lockout fields.
>   The KDC only needs to write to the lockout environment, and can open
>   the primary environment read-only.
>   The lockout environment is never emptied, never iterated over, and
>   uses only short-lived transactions, so the KDC is never blocked more
>   than briefly.

Overall this design seems good to me.
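
To make the file layout concrete, here is a rough sketch of the names the
two environments would end up with on disk.  It relies on the documented
MDB_NOSUBDIR behavior (the path is the data file, and the lock file is
that path with "-lock" appended).  The struct, the helper name and the
base path are hypothetical and not part of any proposed patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: the pair of files backing one LMDB
 * environment opened with MDB_NOSUBDIR. */
struct env_files {
    char data[256];   /* main data file: base name plus suffix */
    char lock[256];   /* lock file: data file name plus "-lock" */
};

/* Fill in the file names for one environment.  Returns 0 on success,
 * -1 if a name would not fit in the fixed-size buffers. */
static int env_files_init(struct env_files *f, const char *base,
                          const char *suffix)
{
    int n;

    assert(f != NULL && base != NULL && suffix != NULL);
    n = snprintf(f->data, sizeof(f->data), "%s%s", base, suffix);
    if (n < 0 || (size_t)n >= sizeof(f->data))
        return -1;
    n = snprintf(f->lock, sizeof(f->lock), "%s-lock", f->data);
    if (n < 0 || (size_t)n >= sizeof(f->lock))
        return -1;
    return 0;
}
```

Calling it once with ".mdb" and once with ".lockout.mdb" for the same
base name gives the four files the design implies.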

It's hard to tell from the docs - is there a disadvantage to
MDB_NOSUBDIR?  It seems weird to have it as an option but not the

> * For creations with the "temporary" DB option, instead of creating a
> side database, we open or create the usual environment files, begin a
> write transaction on the primary environment for the lifetime of the
> database context, and open and drop the principal and policy databases
> within that transaction.  put_principal and put_policy operations use
> the database context write transaction instead of creating short-lived
> ones.  When the database is promoted, we commit the write transaction
> and the load becomes visible.
>   To maintain the low-contention nature of the lockout environment, we
>   compromise on the transactionality of load operations for the
>   lockout fields.  We do not empty the lockout database on a load and
>   we write entries to it as put_principal operations occur during the
>   load.  Therefore:
>   - updates to the lockout fields become visible immediately (for
>   existing principal entries), instead of at the end of the load.
>   - updates to the lockout fields remain visible (for existing
>   principal entries) if the load operation is aborted.
>   - since we don't empty the lockout database, we leave garbage
>   entries behind for old principals which have disappeared from the
>   dump file we loaded.
>   I don't anticipate any of those behaviors being noticeable in
>   practice.  We could provide a tool to remove the garbage entries in
>   the lockout database if it becomes an issue for anyone.

The size of the lockout database is capped at the number of principals
that have ever existed, right?  If so, I'm not worried about it either.
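
For anyone following along, the three consequences listed in the quoted
text can be shown with a toy in-memory model.  This is not LMDB code: the
tables are plain arrays standing in for the databases, and every name
here is made up for illustration.  It only assumes what the quoted text
says, namely that principal writes are staged until promotion while
lockout writes land immediately and are never emptied.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 8

/* Toy stand-in for one database: principal names, each with one integer
 * value (only the lockout table uses the value, as a failure count). */
struct table {
    const char *name[MAX_ENTRIES];
    int value[MAX_ENTRIES];
    int count;
};

/* Return the index of name in t, or -1 if it is absent. */
static int table_find(const struct table *t, const char *name)
{
    for (int i = 0; i < t->count; i++) {
        if (strcmp(t->name[i], name) == 0)
            return i;
    }
    return -1;
}

/* Insert name or overwrite its value. */
static void table_put(struct table *t, const char *name, int value)
{
    int i = table_find(t, name);

    if (i < 0) {
        assert(t->count < MAX_ENTRIES);
        i = t->count++;
        t->name[i] = name;
    }
    t->value[i] = value;
}

struct kdb_model {
    struct table principals;  /* committed primary contents */
    struct table staged;      /* writes inside the load transaction */
    struct table lockout;     /* separate environment, never emptied */
};

/* Start a load: the staged view begins empty, like a dropped database. */
static void load_begin(struct kdb_model *db)
{
    db->staged.count = 0;
}

static void load_put_principal(struct kdb_model *db, const char *name,
                               int fail_count)
{
    table_put(&db->staged, name, 0);            /* hidden until promotion */
    table_put(&db->lockout, name, fail_count);  /* visible immediately */
}

/* Promotion commits the staged principals; lockout is left alone. */
static void load_promote(struct kdb_model *db)
{
    db->principals = db->staged;
}

/* Abort discards the staged principals; lockout writes are not undone. */
static void load_abort(struct kdb_model *db)
{
    db->staged.count = 0;
}
```

In this model an aborted load leaves the old principals intact but keeps
the new lockout values, and a promoted load that omits a principal leaves
that principal's lockout entry behind as garbage.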

> * The existing in-tree KDB modules allow simultaneous access to the same
>   DB context by multiple threads, even though the KDC and kadmind are
>   single-threaded and we don't allow krb5_context objects to be used by
>   multiple threads simultaneously.  For the LMDB module, we will need to
>   either synchronize the use of transaction handles, or document that it
>   isn't thread-safe and will need mutexes added if it needs to be
>   thread-safe in the future.

> * LMDB files are capped at the memory map size, which is 10MB by
> default.  Heimdal exposes this as a configuration option and we should
> probably do the same; we might also want a larger default like 128MB.
> We will have to consider how to apply any default map size to the
> lockout environment as well as the primary environment.

What will the failure modes look like on this?  Does LMDB return useful
information around the caps?

> * LMDB also has a configurable maximum number of readers.  The default
> of 126 is probably adequate for most deployments, but we again
> probably want a configuration option in case it needs to be raised.


> * By default LMDB calls fsync() or fdatasync() for each committed
> write transaction.  This probably overshadows the performance benefits
> of LMDB versus DB2, in exchange for improved durability.  I think we
> will want to always set the MDB_NOSYNC flag for the lockout
> environment, and might need to add an option to set it for the primary
> environment.

Agreed.  An option for the primary environment will be needed, even if
only for testing.
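
Pulling the tunables from this thread together (map size, reader count,
sync behavior), one possible way to resolve per-environment settings is
sketched below.  The option and function names are hypothetical, the
128MB default is only the value floated above, and giving the lockout
environment the same map size as the primary is an assumption, since
that question is still open.

```c
#include <assert.h>
#include <stddef.h>

#define DEFAULT_MAPSIZE_MB 128    /* proposed above; LMDB's own default is 10MB */
#define DEFAULT_MAX_READERS 126   /* LMDB's built-in default */

/* Hypothetical module options; 0 means "not configured". */
struct lmdb_opts {
    size_t mapsize_mb;         /* when the map fills, writes fail with
                                * MDB_MAP_FULL */
    unsigned int max_readers;  /* when reader slots run out, mdb_txn_begin()
                                * fails with MDB_READERS_FULL */
    int nosync;                /* request MDB_NOSYNC for the primary */
};

/* Values one would hand to mdb_env_set_mapsize(), mdb_env_set_maxreaders()
 * and, as the MDB_NOSYNC flag, mdb_env_open() for one environment. */
struct env_settings {
    size_t mapsize_bytes;
    unsigned int max_readers;
    int nosync;
};

static struct env_settings resolve_settings(const struct lmdb_opts *opts,
                                            int is_lockout)
{
    struct env_settings s;
    size_t mb;

    assert(opts != NULL);
    /* Assumption: both environments share one configured map size. */
    mb = opts->mapsize_mb ? opts->mapsize_mb : DEFAULT_MAPSIZE_MB;
    s.mapsize_bytes = mb * 1024 * 1024;
    s.max_readers = opts->max_readers ? opts->max_readers
                                      : DEFAULT_MAX_READERS;
    /* The lockout environment always skips the sync; the primary only
     * does so when the option is set. */
    s.nosync = is_lockout ? 1 : (opts->nosync != 0);
    return s;
}
```

Keeping the resolution in one place would make it easy to change the
lockout policy later without touching the code that opens environments.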
