Error while compiling krb 1.5

Sun Jul 9 15:32:48 EDT 2006

On Sat, Jul 08, 2006 at 10:36:13PM -0400, Marcus Watts wrote:
> The e2fsprogs compile_et won't produce tables that work with
> openafs, because the error_message that's built into libafsrpc.so
> (so not easily removable) doesn't export _et_list.
> Sigh.

And the problem is that older _et libraries don't export the
add_error_table() interface which was originally added by the MIT
Kerberos library (which to be fair is really needed in order to
support mutexes cleanly).  Sigh.  

> Heimdal's error_message() handles "non-standard" bases.  That is,
> a table for which et->table->base is not 0 mod 256.
> I don't know if this was intentional on their part,
> but it's certainly useful and causes no harm.

Can you send me a sample .et file and what the resulting .c and .h
files should look like (for a regression test suite)?  I assume it is
just

	base <number>

and it must be after the "error_table" declaration, but before any
"error_code" lines (or else the behaviour is undefined)?  One of
the problems is that not all of the extensions have been explicitly
defined, with backwards compatibility statements agreed on and
coordinated amongst all users.

> Another AFS complication is that one of those ranges starts at 101.
> That's a problem for deciding when to use strerror(), so I prefer
> to search the error table list first, then default to using strerror().
> I'm not sure it's safe to assume system errors are in table 0 anyways,
> or that I want system errors to override explicitly provided tables.

That makes sense, and sounds reasonable.
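
A rough sketch of that lookup order, for illustration only.  The
struct and function names are made up and are not the real com_err
data structures; the range check is just one way to cope with a base
such as 101 that is not 0 mod 256:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified table layout; not the real com_err structures. */
struct sketch_table {
    const char *const *msgs;    /* message strings */
    int n_msgs;                 /* number of messages in msgs */
    long base;                  /* first error code; need not be 0 mod 256 */
    struct sketch_table *next;  /* next registered table */
};

/* Search the registered tables first; use strerror() only as the default. */
static const char *sketch_error_message(const struct sketch_table *list,
                                        long code)
{
    const struct sketch_table *t;

    for (t = list; t != NULL; t = t->next) {
        long off = code - t->base;

        if (off >= 0 && off < t->n_msgs)
            return t->msgs[off];
    }
    /* No table claimed the code, so treat it as a system errno value. */
    return strerror((int)code);
}
```

With a two-entry table registered at base 101, codes 101 and 102 come
from the table, and anything else falls through to strerror().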

> > This is *not* the case with the MIT Kerberos implementation, which has
> > more lax compatibility requirements, at least at one point, with some
> > resulting entertaining problems that were reported with Debian, that
> > tries to make this whole thing work together.
> 
> The debian "testing" system I have seems to have exactly one libcom_err,
> and kerberos links against that.  I actually like that.

Yep, and debian testing uses the com_err library from e2fsprogs.  One
of the things I had to fix was compatibility with compile_et programs
from other programs, and I also cared about backwards compatibility
with older Debian releases.  That's one of the reasons why compile_et
generates initialize_XXX_error_table() functions which manipulate
_et_list directly.  If I were to change it to use the
add_error_table() interface, it would break backwards compatibility
with older com_err shared libraries.  I don't feel it's right to do
that without forcing a major version number bump on the com_err shared
library.  (I don't know if MIT Kerberos bothered to bump the shared
version number, but if we are going to do this right, it would be nice
to coordinate major version numbers and carefully define what
interfaces are guaranteed to exist as of that shared library major
version.)
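
To make the trade-off concrete, here is a simplified sketch of the
kind of initialize_XXX_error_table() function described above, one
that links a static entry straight onto the list.  The types and names
are self-contained stand-ins so the fragment compiles by itself; they
are not copied from any real compile_et output:

```c
#include <stddef.h>

/* Simplified stand-ins for the com_err types; names are illustrative. */
struct sketch_error_table {
    const char *const *msgs;
    long base;
    int n_msgs;
};

struct sketch_et_list {
    struct sketch_et_list *next;
    const struct sketch_error_table *table;
};

/* In a real library the list head would live inside the shared library. */
static struct sketch_et_list *sketch_et_list_head = NULL;

static const char *const xxx_text[] = {
    "Example error one",
    "Example error two",
};
static const struct sketch_error_table et_xxx_error_table = {
    xxx_text, 256L, 2
};

/* Statically allocated link: no malloc() and no library call is needed. */
static struct sketch_et_list xxx_link = { NULL, NULL };

void sketch_initialize_xxx_error_table(void)
{
    if (xxx_link.table == NULL) {   /* link at most once */
        xxx_link.next = sketch_et_list_head;
        xxx_link.table = &et_xxx_error_table;
        sketch_et_list_head = &xxx_link;
    }
}
```

Code like this only needs the list head to be exported, which is why
it keeps working against older shared libraries.  It also takes no
lock, which is the gap an add_error_table() style call can close.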

In any case, I took great care in adding as much backwards
compatibility as I could, *because* I wanted Debian to be able to use
a single shared library that was compatible across multiple versions
of MIT and Heimdal Kerberos, as well as the various other users and
(in some cases, providers) of either the com_err library or the
compile_et program.

>(Is it really possible to mount e2fs under solaris today?)

I thought I had seen a port of ext2fs for Solaris, but I could be
mistaken.  But given OpenSolaris and Sun's business strategy of
abandoning the high-margin sparc architecture for the low-margin AMD
architecture, if ext2/3 support doesn't exist today, it will soon.
Given the GPL incompatible license that Solaris is under, I assume
that the port would probably come from the *BSD kernel code.

The original reason why I had done the Solaris port was so I could
take advantage of purify to find memory leak problems in e2fsprogs
(this was before the days of valgrind).  Then later, Solaris folks
were depending on e2fsprogs because GNOME was using the uuid library.
So Solaris was the oldest platform that e2fsprogs had been ported to,
as it turns out.

> There is one other reason you might *like* thread-aware behavior,
> and that is dealing with the "unknown error" case.  Using "static char
> buffer[256];" creates the possibility that another thread might dive
> in and replace the text before some previous thread succeeds in
> using the error text.  More bizarre interactions are possible with
> printf() which might behave oddly when printing strings in fixed-width
> fields if the string changes length.

I hesitate to add yet another interface, as it would provide
Yet Another Possible Incompatibility Problem, but perhaps we should
consider adding an error_message_r() interface where the userspace
program can provide its own buffer for the "unknown error" case?
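
Something along these lines, as a sketch only.  The function name,
signature and message text are hypothetical; no such interface exists
yet, and a real error_message_r() would search the registered tables
before reaching this fallback:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical fallback for the "unknown error" case: the text is
 * formatted into a buffer owned by the caller, so two threads never
 * share one static buffer.
 */
static const char *sketch_unknown_error_r(long code, char *buf, size_t buflen)
{
    if (buf == NULL || buflen == 0)
        return "Unknown code";      /* constant string, safe to share */

    /* snprintf() truncates if needed and always NUL-terminates here. */
    snprintf(buf, buflen, "Unknown code %ld", code);
    return buf;
}
```

A caller would pass a buffer from its own stack, so the string cannot
change underneath it while printf() is still using it.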

> I believe all of your named os targets support pthreads, but I don't know
> how many support weak references or otherwise try to make it possible
> for one library to be used both by pthreads and non-pthreads
> applications.  A more portable solution would probably be to provide a
> separate "-lcom_err_r" that explicitly provides pthread semantics.

Yup, that's the problem.  One of the OS's which I haven't mentioned,
and which is a major headache, is AIX, since its shared library
implementation is different from just about all other Unix systems.
(Sigh....  "AIX
--- it *reminds* you of Unix".  But since I work for IBM these days, I
guess I really do have to pay attention to it.)

If we allow into the solution space requiring changes to threaded
programs that want to use com_err, there are a number of things we
could do beyond creating a separate com_err_r library.  For example,
we could simply require that threaded programs call a function,
com_err_enable_pthread().  On a static library, it would drag in
the .o file which contains the calls to the pthread library, and in
both cases, it would fill in the function pointers that _et_lock() and
_et_unlock() would call to provide the appropriate locking
functionality.
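
A minimal sketch of that idea, assuming POSIX threads.  All of the
sketch_* names are made up for illustration, and everything is shown
in one file only to keep the fragment self-contained:

```c
#include <pthread.h>
#include <stddef.h>

/*
 * The hooks are NULL by default, so a non-threaded program pays no
 * locking cost and never calls into the pthread library.
 */
static void (*sketch_lock_hook)(void) = NULL;
static void (*sketch_unlock_hook)(void) = NULL;

/* Stand-ins for the _et_lock() and _et_unlock() wrappers. */
static void sketch_et_lock(void)
{
    if (sketch_lock_hook != NULL)
        sketch_lock_hook();
}

static void sketch_et_unlock(void)
{
    if (sketch_unlock_hook != NULL)
        sketch_unlock_hook();
}

/*
 * In a static library, everything below would sit in its own .o file,
 * so only callers of the enable function pull in the pthread calls.
 */
static pthread_mutex_t sketch_et_mutex = PTHREAD_MUTEX_INITIALIZER;

static void sketch_pthread_lock(void)
{
    pthread_mutex_lock(&sketch_et_mutex);
}

static void sketch_pthread_unlock(void)
{
    pthread_mutex_unlock(&sketch_et_mutex);
}

/* Threaded programs call this once, before starting other threads. */
void sketch_com_err_enable_pthread(void)
{
    sketch_lock_hook = sketch_pthread_lock;
    sketch_unlock_hook = sketch_pthread_unlock;
}
```

The library's list manipulation would then be bracketed by
sketch_et_lock() and sketch_et_unlock(), which do nothing until the
program opts in.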

Personally I think that's probably the better approach, but then again
I try to avoid threaded programs like the plague.  :-)

> I agree, automake & libtool aren't at all pretty.  I'm currently
> experimenting with using a configure script written using "perl" in
> place of autoconf & friends.  I'm not sure this is really the right way
> to go, but I'm convinced it sucks less than automake+libtool.  Perl
> *is* available on nearly all modern platforms, is often part of the
> core distribution, and as it happens, most perls do know how to make
> shared objects for the local environment.

I don't mind autoconf that much, but as far as I'm concerned automake
(all four incompatible versions which Debian has to supply) and
libtool are abominations....  My main complaints about libtool are that
a separate shell script has to be run for every single .c file that
you want to compile (this is a performance problem that could be fixed)
and that libtool's interface isn't documented besides "use automake".

My one hope is that when the FSF switches all of their programs to
GPLv3, which will make them just as incompatible with all of the GPLv2
programs as Solaris's CDDL is, people will abandon libtool and
automake, but that seems to be an unrealistic hope.....

						- Ted