LMDB KDB module design notes

Nathaniel McCallum npmccallum at redhat.com
Mon Apr 9 16:40:43 EDT 2018

This seems reasonable. I'm glad to see MIT considering LMDB (my
experiences with it are positive).

On Mon, Apr 9, 2018 at 10:45 AM, Greg Hudson <ghudson at mit.edu> wrote:
> I have been considering how MIT krb5 might implement an LMDB KDB
> module.
>
> LMDB operations take place within read or write transactions.  Read
> transactions do not block write transactions; instead, read transactions
> delay the reclamation of pages obsoleted by write transactions.  This is
> attractive for a KDB, as it means "kdb5_util dump" can take a snapshot
> of the database without blocking password changes or administrative
> operations.  (The DB2 module allows this with the "unlockiter" DB
> option, but that option carries a noticeable performance penalty, causes
> kdb5_util dump to write something which isn't exactly a snapshot, and is
> probably open to rare edge cases where an admin deletes a principal
> entry right as it's being iterated through.)
>
> "kdb5_util load" is our one transactional write operation.  It calls
> krb5_db_create() with the "temporary" DB option, puts principal and
> policy entries, and then calls krb5_db_promote() to make the new KDB
> visible.  The DB2 module handles this by creating side databases and
> lockfiles with a "~" extension, and then renaming them into place.  For
> this to work, each kdb_db2 operation needs to close and reopen the
> database.
>
> The three lockout fields of principal entries (last_success,
> last_failed, and fail_auth_count) add additional complexity.  These
> fields are updated by the KDC by default, and are not replicated in an
> iprop setup.  iprop loads include the "merge_nra" DB option when
> creating the side database, indicating that existing principal entries
> should retain their current lockout attribute values.
>
> Here is my general design framework, taking the above into
> consideration:
>
> * We use two MDB environments, setting the MDB_NOSUBDIR flag so that
>   each environment is a pair of files instead of a subdirectory:
>   - A primary environment (suffix ".mdb") containing a "policy" database
>     holding policy entries and a "principal" database holding principal
>     entries minus lockout fields.
>   - A secondary environment (suffix ".lockout.mdb") containing a
>     "lockout" database holding principal lockout fields.
>   The KDC only needs to write to the lockout environment, and can open
>   the primary environment read-only.
>   The lockout environment is never emptied, never iterated over, and
>   uses only short-lived transactions, so the KDC is never blocked more
>   than briefly.
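
To make the on-disk layout concrete, here is a minimal sketch of the file
names this implies. The base path, the struct, and the helper names are
hypothetical (none of them exist in krb5). It relies on the documented
MDB_NOSUBDIR behavior: the path given to mdb_env_open() is the data file,
and the lock file is that path with "-lock" appended.

```c
#include <stdio.h>

#define KDB_PATH_MAX 256

/* Hypothetical container for the four files of the two environments. */
struct kdb_lmdb_paths {
    char primary[KDB_PATH_MAX];       /* principal and policy databases */
    char primary_lock[KDB_PATH_MAX];  /* LMDB lock file for the above */
    char lockout[KDB_PATH_MAX];       /* lockout database */
    char lockout_lock[KDB_PATH_MAX];  /* LMDB lock file for the above */
};

/* Write base followed by suffix into out; return -1 if it does not fit. */
static int join_path(char *out, const char *base, const char *suffix)
{
    int n = snprintf(out, KDB_PATH_MAX, "%s%s", base, suffix);

    return (n < 0 || n >= KDB_PATH_MAX) ? -1 : 0;
}

/* Derive all four file names from the configured base path. */
static int kdb_lmdb_build_paths(const char *base, struct kdb_lmdb_paths *p)
{
    if (join_path(p->primary, base, ".mdb") != 0 ||
        join_path(p->primary_lock, base, ".mdb-lock") != 0 ||
        join_path(p->lockout, base, ".lockout.mdb") != 0 ||
        join_path(p->lockout_lock, base, ".lockout.mdb-lock") != 0)
        return -1;
    return 0;
}
```

Only the two data-file paths would be passed to LMDB; the lock-file names
are listed because backup or cleanup tooling would need to know about them.
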
> * For creations with the "temporary" DB option, instead of creating a
>   side database, we open or create the usual environment files, begin a
>   write transaction on the primary environment for the lifetime of the
>   database context, and open and drop the principal and policy databases
>   within that transaction.  put_principal and put_policy operations use
>   the database context write transaction instead of creating short-lived
>   ones.  When the database is promoted, we commit the write transaction
>   and the load becomes visible.
>   To maintain the low-contention nature of the lockout environment, we
>   compromise on the transactionality of load operations for the lockout
>   fields.  We do not empty the lockout database on a load and we write
>   entries to it as put_principal operations occur during the load.
>   Therefore:
>   - updates to the lockout fields become visible immediately (for
>     existing principal entries), instead of at the end of the load.
>   - updates to the lockout fields remain visible (for existing principal
>     entries) if the load operation is aborted.
>   - since we don't empty the lockout database, we leave garbage entries
>     behind for old principals which have disappeared from the dump file
>     we loaded.
>   I don't anticipate any of those behaviors being noticeable in
>   practice.  We could provide a tool to remove the garbage entries in
>   the lockout database if it becomes an issue for anyone.
> * For iprop loads, we set a context flag if we see the "merge_nra" DB
>   option at creation time.  If the context flag is set, put_principal
>   operations check for existing entries in the lockout database before
>   writing, and do nothing if an entry is already there.
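
The merge_nra rule can be sketched with a small in-memory stand-in for the
lockout database. Everything here (the struct layout, the fixed-size table,
the function names) is an assumption for illustration, not the proposed
module's code. In a real LMDB implementation I would expect the same
"keep the existing entry" behavior to be available in one call by passing
MDB_NOOVERWRITE to mdb_put(), which returns MDB_KEYEXIST when the key is
already present.

```c
#include <string.h>
#include <time.h>

#define MAX_ENTRIES 16
#define MAX_NAME 64

/* The three lockout fields kept out of the primary environment. */
struct lockout {
    time_t last_success;
    time_t last_failed;
    unsigned int fail_auth_count;
};

/* Hypothetical in-memory stand-in for the "lockout" database. */
struct lockout_entry {
    char name[MAX_NAME];
    struct lockout val;
    int used;
};
struct lockout_db {
    struct lockout_entry e[MAX_ENTRIES];
};

/* Return the entry for name, or NULL if there is none. */
static struct lockout_entry *lockout_find(struct lockout_db *db,
                                          const char *name)
{
    int i;

    for (i = 0; i < MAX_ENTRIES; i++) {
        if (db->e[i].used && strcmp(db->e[i].name, name) == 0)
            return &db->e[i];
    }
    return NULL;
}

/*
 * Store lockout fields for name.  With merge_nra set, an existing entry is
 * left untouched.  Returns 1 if written, 0 if the existing entry was kept,
 * -1 if the name is too long or the table is full.
 */
static int lockout_put(struct lockout_db *db, const char *name,
                       const struct lockout *val, int merge_nra)
{
    struct lockout_entry *ent = lockout_find(db, name);
    int i;

    if (ent != NULL) {
        if (merge_nra)
            return 0;           /* iprop load: keep local lockout state */
        ent->val = *val;
        return 1;
    }
    if (strlen(name) >= MAX_NAME)
        return -1;
    for (i = 0; i < MAX_ENTRIES; i++) {
        if (!db->e[i].used) {
            strcpy(db->e[i].name, name);
            db->e[i].val = *val;
            db->e[i].used = 1;
            return 1;
        }
    }
    return -1;
}
```

Note that a principal with no existing lockout entry still gets one written
under merge_nra; only pre-existing values are preserved.
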
> * To iterate over principals or policies, we create a read transaction
>   in the primary MDB environment for the lifetime of the cursor.  By
>   default, LMDB only allows one transaction per environment per thread.
>   This would break "kdb5_util update_princ_encryption", which does
>   put_principal operations during iteration.  Therefore, we must specify
>   the MDB_NOTLS flag in the primary environment.
>   The MDB_NOTLS flag carries a performance penalty for the creation of
>   read transactions.  To mitigate this penalty, we can save a read
>   transaction handle in the DB context for get operations, using
>   mdb_txn_reset() and mdb_txn_renew() between operations.
> * The existing in-tree KDB modules allow simultaneous access to the same
>   DB context by multiple threads, even though the KDC and kadmind are
>   single-threaded and we don't allow krb5_context objects to be used by
>   multiple threads simultaneously.  For the LMDB module, we will need to
>   either synchronize the use of transaction handles, or document that it
>   isn't thread-safe and will need mutexes added if it needs to be
>   thread-safe in the future.
> * LMDB files are capped at the memory map size, which is 10MB by
>   default.  Heimdal exposes this as a configuration option and we should
>   probably do the same; we might also want a larger default like 128MB.
>   We will have to consider how to apply any default map size to the
>   lockout environment as well as the primary environment.
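
For the map size option, a minimal sketch of turning a configured value
into the byte count LMDB expects. The unit (megabytes), the "0 means use
the default" convention, and the function name are all assumptions; the
128MB default is the figure floated above. The result would be handed to
mdb_env_set_mapsize() before mdb_env_open(). LMDB asks for a multiple of
the OS page size, which a whole number of mebibytes satisfies on common
page sizes.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed default, taken from the 128MB suggestion in the proposal. */
#define DEFAULT_MAPSIZE_MB 128UL

/*
 * Convert a map size in megabytes to bytes.  A value of 0 selects the
 * default.  Returns 0 on success, -1 if the result would overflow size_t.
 */
static int mapsize_bytes(unsigned long mb, size_t *out)
{
    const size_t mib = (size_t)1024 * 1024;

    if (mb == 0)
        mb = DEFAULT_MAPSIZE_MB;
    if (mb > SIZE_MAX / mib)
        return -1;
    *out = (size_t)mb * mib;
    return 0;
}
```

Whether the lockout environment should share this value or get its own
smaller setting is left open here, as in the proposal.
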
> * LMDB also has a configurable maximum number of readers.  The default
>   of 126 is probably adequate for most deployments, but we again
>   probably want a configuration option in case it needs to be raised.
> * By default LMDB calls fsync() or fdatasync() for each committed write
>   transaction.  This probably overshadows the performance benefits of
>   LMDB versus DB2, in exchange for improved durability.  I think we will
>   want to always set the MDB_NOSYNC flag for the lockout environment,
>   and might need to add an option to set it for the primary environment.
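
Pulling the flag choices from the points above into one place, a sketch of
how the two environments' mdb_env_open() flags might be assembled. The
function names and the readonly/nosync parameters are hypothetical. The
fallback #defines only exist so the sketch compiles without the library;
the values are as I remember them from lmdb.h and the real header should
be used in practice.

```c
/* Stand-ins for lmdb.h so this compiles on its own; use the real header. */
#ifndef MDB_NOSUBDIR
#define MDB_NOSUBDIR 0x4000
#define MDB_NOSYNC   0x10000
#define MDB_RDONLY   0x20000
#define MDB_NOTLS    0x200000
#endif

/*
 * Flags for the primary environment.  MDB_NOTLS is always set so that
 * put_principal can run while an iteration's read transaction is open.
 * readonly is for the KDC; nosync is the possible admin-facing option.
 */
static unsigned int primary_env_flags(int readonly, int nosync)
{
    unsigned int flags = MDB_NOSUBDIR | MDB_NOTLS;

    if (readonly)
        flags |= MDB_RDONLY;
    if (nosync)
        flags |= MDB_NOSYNC;
    return flags;
}

/* Flags for the lockout environment: always skip the per-commit sync. */
static unsigned int lockout_env_flags(void)
{
    return MDB_NOSUBDIR | MDB_NOSYNC;
}
```
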
