[kerberos-discuss] MIT krb db2 hash test failures on Solaris

Fri Jan 23 12:31:39 EST 2009

On Thu, Jan 22, 2009 at 10:57:18PM -0500, Jeffrey Hutzelman wrote:
> --On Thursday, January 22, 2009 09:44:28 PM -0600 Nicolas Williams 
> <Nicolas.Williams at sun.com> wrote:
> 
> >On Thu, Jan 22, 2009 at 10:06:02PM -0500, Jeffrey Hutzelman wrote:
> >>--On Thursday, January 22, 2009 08:42:26 PM -0600 Will Fiveash
> >><William.Fiveash at Sun.COM> wrote:
> >>> So either the above code is making some bad assumptions or stat() is
> >>> buggy.  Thoughts?

Looking at this again, my guess is that there's a hard maximum on the
bdb2 page size, and that the default ZFS record size (128KB) is larger
than that maximum.

Something like that is true of SQLite3 too, for example, where the max
page size due to internal pointer sizes is 32KB.

For now try setting your dataset's recordsize to increasingly smaller
powers of two until you find one that works.  That will likely turn out
to be the largest page size for bdb2, and then you can fix bdb2 to
enforce it so that this never happens to someone else.
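
The halving search can be sketched in C as below.  This is only an
illustration of the procedure: largest_working_size(), the probe
callback, and example_probe() are hypothetical names, and the 64KB
cutoff in example_probe() is a made-up stand-in.  In practice each
"probe" means setting the dataset's recordsize, recreating the database,
and rerunning the hash tests by hand.

```c
#include <stddef.h>

/* Hypothetical callback: returns nonzero if the db2 hash tests pass
 * when the filesystem reports this block size. */
typedef int (*size_probe_fn)(size_t size);

/*
 * Try successively smaller powers of two, from `start` down to
 * `min_size`, and return the first size the probe accepts.
 * Returns 0 if nothing in that range works.
 * Assumes `start` and `min_size` are powers of two.
 */
size_t largest_working_size(size_t start, size_t min_size, size_probe_fn works)
{
    size_t size;

    for (size = start; size >= min_size && size > 0; size /= 2) {
        if (works(size))
            return size;
    }
    return 0;
}

/* Stand-in probe for illustration only: pretends that anything up to
 * 64KB works.  The real limit is what the experiment should reveal. */
int example_probe(size_t size)
{
    return size <= 65536;
}
```

With the stand-in probe, starting at 128KB returns 64KB on the second
attempt; with real tests the answer is whatever bdb2 actually tolerates.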

> >>Ugh ugh ugh.
> >>If I understand the code you've quoted correctly (without going back to
> >>read the rest of it), the bucket size, and therefore the file format,
> >>varies depending on the result from st_blksize.  This is hideous; UNIX
> >>files are flat byte streams and their format and interpretation should
> >>not be dependent on the properties of the filesystem on which they
> >>happen to be located.
> >
> >Actually, sizing database pages so they match the natural block size of
> >a filesystem is very important for performance.
> 
> Sure, it's one thing to choose parameters based on the filesystem, and 
> quite another for them to be defined in terms of the filesystem, such 
> that...
> 
> >then you cannot safely move
> >the database from one filesystem to another!

Well, looking at the bdb2 code in question I see that hash->hdr.bsize
is, lo and behold, in a header written to the database on initialization
and read back on open.  So technically the code is doing the Right
Thing...
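
To make the "written on initialization, read back on open" pattern
concrete, here is a minimal sketch.  It is not the bdb2 code: struct
example_hdr, EXAMPLE_MAGIC, and both functions are hypothetical,
simplified stand-ins for the real header handling.

```c
#include <stdint.h>

/* Hypothetical magic number, not the one bdb2 uses. */
#define EXAMPLE_MAGIC 0x45584d50u

/* Hypothetical, simplified stand-in for the on-disk hash header. */
struct example_hdr {
    uint32_t magic;
    uint32_t bsize;   /* bucket (page) size chosen at creation */
};

/* At creation: take the bucket size from the filesystem's preferred
 * I/O size (what stat() reports in st_blksize) and record it. */
void example_hdr_init(struct example_hdr *hdr, uint32_t fs_blksize)
{
    hdr->magic = EXAMPLE_MAGIC;
    hdr->bsize = fs_blksize;
}

/* At open: use the stored value, not the current filesystem's block
 * size, so the file stays readable after being moved elsewhere.
 * Returns 0 if the header does not look valid. */
uint32_t example_hdr_bsize_on_open(const struct example_hdr *hdr,
                                   uint32_t fs_blksize)
{
    (void)fs_blksize;  /* deliberately ignored for an existing database */
    if (hdr->magic != EXAMPLE_MAGIC)
        return 0;
    return hdr->bsize;
}
```

A database created with an 8KB bucket size keeps that size when later
opened from a filesystem that reports 128KB.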

...but I suspect there's an unstated max page size that is smaller than
128KB, and, being unstated, it's not checked for.  I.e., a variant of
the old 640KB problem (who'd want block sizes larger than <pick a small
number>??).
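
If that suspicion is right, the fix is to state the limit and enforce it
when the bucket size is first chosen.  A rough sketch follows.  The 64KB
ceiling and 512-byte floor are assumptions for illustration only (the
real maximum still has to be found), and example_clamp_bsize() is a
hypothetical name, not an existing bdb2 function.

```c
#include <stddef.h>

/* ASSUMPTION: illustrative bounds only.  The real maximum must be
 * determined from the bdb2 page format or by experiment. */
#define EXAMPLE_MIN_BSIZE ((size_t)512)
#define EXAMPLE_MAX_BSIZE ((size_t)65536)

/*
 * Pick a bucket size from the filesystem's reported block size:
 * the largest power of two that is <= fs_blksize, forced into the
 * range [EXAMPLE_MIN_BSIZE, EXAMPLE_MAX_BSIZE].
 */
size_t example_clamp_bsize(size_t fs_blksize)
{
    size_t bsize = EXAMPLE_MIN_BSIZE;

    while (bsize * 2 <= fs_blksize && bsize * 2 <= EXAMPLE_MAX_BSIZE)
        bsize *= 2;
    return bsize;
}
```

With these assumed bounds, a 128KB st_blksize from ZFS would be cut down
to 64KB instead of being used as-is, and ordinary 8KB filesystems would
be unaffected.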

Nico
--