LMDB KDB module design notes
ghudson at mit.edu
Sun Apr 15 11:58:05 EDT 2018
I have prototype code for this design which passes the test suite
(temporarily modified to create LMDB KDBs for Python and tests/dejagnu,
and with a few BDB-specific tests skipped). I'm working on polishing it
and adding documentation and proper tests.
On 04/10/2018 12:47 PM, Robbie Harwood wrote:
> It's hard to tell from the docs - is there a disadvantage to
> MDB_NOSUBDIR? It seems weird to have it as an option but not the
LMDB uses two files per database. By default, they have the suffixes
"/data.mdb" and "/lock.mdb"; with MDB_NOSUBDIR, they have the suffixes
"" and "-lock". The default has the advantage that it uses exactly the
directory entry given to it and no others, though it is up to the
consumer to create the directory.
From our perspective, the main drawback of MDB_NOSUBDIR is that our
destroy method has to know the MDB_NOSUBDIR suffixes in order to clean
up the files. If we used the default, we could nuke the directory
(annoyingly hard to do in C) with no special knowledge.
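
To make the suffix knowledge concrete, here is a minimal sketch of how
the two file names follow from the environment path in each mode. The
helper name lmdb_file_names() and its signature are hypothetical; it is
not part of LMDB or of the prototype, and only the four suffixes come
from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not an LMDB or krb5 function): compute the data
 * and lock file names for an environment opened at `path`.  `nosubdir`
 * is nonzero when MDB_NOSUBDIR would be set.  Each output buffer must
 * hold `len` bytes.  Returns 0 on success, -1 if a name does not fit.
 */
static int
lmdb_file_names(const char *path, int nosubdir,
                char *data, char *lock, size_t len)
{
    const char *data_suffix = nosubdir ? "" : "/data.mdb";
    const char *lock_suffix = nosubdir ? "-lock" : "/lock.mdb";

    assert(path != NULL && data != NULL && lock != NULL);

    /* Leave room for the terminating NUL in both buffers. */
    if (strlen(path) + strlen(data_suffix) >= len ||
        strlen(path) + strlen(lock_suffix) >= len)
        return -1;

    snprintf(data, len, "%s%s", path, data_suffix);
    snprintf(lock, len, "%s%s", path, lock_suffix);
    return 0;
}
```

A destroy method for the MDB_NOSUBDIR layout could call this with
nosubdir set and unlink() both results. With the default layout it
would also have to remove the directory itself afterwards.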
>> * LMDB files are capped at the memory map size, which is 10MB by
>> default. Heimdal exposes this as a configuration option and we should
>> probably do the same; we might also want a larger default like 128MB.
>> We will have to consider how to apply any default map size to the
>> lockout environment as well as the primary environment.
> What will the failure modes look like on this? Does LMDB return useful
> information around the caps?
With my prototype code, an admin would see something like:

    add_principal: LMDB write failure (path: /me/krb5/build/testdir/db.mdb):
    MDB_MAP_FULL: Environment mapsize limit reached while creating
    "testprx2324@KRBTEST.COM".

where the "MDB_MAP_FULL...reached" part comes from mdb_strerror(). We
could intercept MDB_MAP_FULL and say something else there.
I measured that each principal entry takes about 430 bytes in the main
environment (with the default of AES-128 and AES-256 keys, and a name
length of about 22 bytes) and about 100 bytes in the lockout
environment. With these lengths, a 128MB map size for the main
environment would accommodate around 300K principal entries. The LMDB
default of 10MB would accommodate around 25K entries.
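
The arithmetic behind those estimates can be sketched as below. This
assumes the measured per-entry figure already includes LMDB's page and
tree overhead, and that "MB" means 2^20 bytes. Both are assumptions,
and the function name estimate_entries() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: rough number of entries that fit in a map of
 * `map_bytes` bytes if each entry costs `bytes_per_entry` bytes in
 * total.  This is a plain integer division; it does not model LMDB's
 * actual page allocation.
 */
static size_t
estimate_entries(size_t map_bytes, size_t bytes_per_entry)
{
    assert(bytes_per_entry > 0);
    return map_bytes / bytes_per_entry;
}
```

With 430 bytes per entry, estimate_entries((size_t)128 * 1024 * 1024, 430)
gives 312134 and estimate_entries((size_t)10 * 1024 * 1024, 430) gives
24385, which is where the rounded figures of 300K and 25K come from.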
>> * By default LMDB calls fsync() or fdatasync() for each committed
>> write transaction. This probably overshadows the performance benefits
>> of LMDB versus DB2, in exchange for improved durability. I think we
>> will want to always set the MDB_NOSYNC flag for the lockout
>> environment, and might need to add an option to set it for the primary
> Agreed. Primary will be needed, even if only for testing.
I haven't added a nosync option to my prototype code yet, and the test
suite didn't seem painfully slow using LMDB. But I will likely add it
and use it for testing anyway.
Without adding another message to the thread, I will address Andrew
Bartlett's concern about locking here:
> I just lurk here, but I have to agree with Simo here from Samba
> experience. Be very careful about lock ordering between multiple
In this design, transactions on the lockout environment are all
ephemeral, consisting of at most one get and one put. There is no
iteration over it and no need to consult the primary environment during
a lockout transaction. So I don't think deadlock is a concern.
If we ever supply a tool to collect garbage entries in the lockout
database, that tool would hold open a read transaction to iterate over
the lockout DB and do gets (to test existence) on the primary
environment as it went. But since read transactions don't block other
transactions in LMDB, there is still no deadlock risk.