Segfaults in MIT libkrb5

Thu Dec 14 18:32:38 EST 2006

On Thu, 2006-12-14 at 17:36 -0500, Ken Raeburn wrote:
> On Dec 14, 2006, at 14:25, Fredrik Tolf wrote:
> > great fandango all over libkrb5's core, they all occur in the Kerberos
> > library, in incidents seemingly related to the error tables. The usual
> > backtrace looks like this:
> > [...]
> The error code -1429577725 is PROF_NO_RELATION; in  
> krb5int_locate_server this just means it couldn't find something  
> (probably realms -> $realmname -> kdc) in the config file.  That's  
> okay.  But crashing while trying to look it up probably means a  
> corrupted error table list.

Indeed, I forgot to write that. The line in libcom_err that causes the
crash looks like (context given, line 58 is the one):

57          for (et = _et_list; et; et = et->next) {
58              if (et->table->base == table_num) {
59                  /* This is the right table */
60                  if (et->table->n_msgs <= offset)
61                      goto oops;
62                  return(et->table->msgs[offset]);

And indeed, et->table points to an unmapped address. I don't know
exactly what is corrupted, though -- _et_list has three entries at this
point. The first one works. The second is the one that causes the
crash. The third seems to be a valid list entry (it is readable and its
next pointer is NULL), but its table pointer is bad as well: it, too,
points at an unmapped address.
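
For reference, the lookup done by lines 57-62 can be sketched in
isolation. This is only a simplified model: the struct layouts are
assumed for illustration rather than copied from the com_err 1.39
headers, the 8-bit offset range is an assumption, and the helper names
(code_offset, code_base, lookup_message) and the demo table with its
placeholder messages are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of the com_err structures (assumed layout). */
struct error_table {
    const char *const *msgs;
    long base;
    int n_msgs;
};

struct et_list {
    struct et_list *next;
    const struct error_table *table;
};

#define ERRCODE_RANGE 8 /* assumption: low 8 bits index into the table */

/* Hypothetical demo table using the same base as the code in this thread. */
static const char *const demo_msgs[] = { "msg 0", "msg 1", "msg 2", "msg 3" };
static const struct error_table demo_table = { demo_msgs, -1429577728L, 4 };

/* Index of the message within its table. */
static int code_offset(long code)
{
    return (int)((uint32_t)code & ((1u << ERRCODE_RANGE) - 1u));
}

/* Base number identifying the table the code belongs to. */
static long code_base(long code)
{
    return code - code_offset(code);
}

/* Same walk as the excerpt; returns NULL instead of jumping to "oops". */
static const char *lookup_message(const struct et_list *list, long code)
{
    int offset = code_offset(code);
    long table_num = code_base(code);
    const struct et_list *et;

    assert(offset >= 0 && offset < (1 << ERRCODE_RANGE));
    for (et = list; et; et = et->next) {
        /* This dereference is where a dangling et->table would fault. */
        if (et->table->base == table_num) {
            if (et->table->n_msgs <= offset)
                return NULL;
            return et->table->msgs[offset];
        }
    }
    return NULL;
}
```

With these definitions, -1429577725 splits into base -1429577728 and
offset 3, so the walk only has to compare each entry's base against
-1429577728. That comparison reads through et->table, which is why a
single stale table pointer anywhere before the matching entry is
enough to crash the lookup.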

> This may mean the com_err library you're using isn't thread-safe.

The thing is that the ucontext threads are cooperative, so they should
never switch in an inconsistent state, AFAIK.
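
To show what I mean by cooperative, here is a minimal sketch using
the ucontext calls from <ucontext.h>. It is not the code from my
program; the names worker, run_demo, record and trace are made up for
the example. Control only moves at the explicit swapcontext calls (or
when the worker function returns), never in the middle of a statement.

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];
static int trace[4];
static int trace_len;

/* Record the order in which the two contexts run. */
static void record(int step)
{
    assert(trace_len < 4);
    trace[trace_len++] = step;
}

static void worker(void)
{
    record(1);
    /* Explicit yield: the only place this context gives up control. */
    swapcontext(&worker_ctx, &main_ctx);
    record(3);
    /* Returning resumes main_ctx through uc_link. */
}

/* Runs the worker in two slices and returns the number of recorded steps. */
static int run_demo(void)
{
    trace_len = 0;
    if (getcontext(&worker_ctx) != 0)
        return -1;
    worker_ctx.uc_stack.ss_sp = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link = &main_ctx;
    makecontext(&worker_ctx, worker, 0);

    if (swapcontext(&main_ctx, &worker_ctx) != 0) /* run until the yield */
        return -1;
    record(2);
    if (swapcontext(&main_ctx, &worker_ctx) != 0) /* resume until return */
        return -1;
    record(4);
    return trace_len;
}
```

Since every switch point is visible in the source, a list update that
contains no yield cannot be interleaved with another context. That
rules out a classic data race, but it says nothing about memory that
gets unmapped behind the list's back.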

>  If  
> the PAM library does dlopen and dlclose on loaded modules, there may  
> also be some kind of problem in that area.

Indeed, that may be the closest I've come to the truth so far, since it
would agree with the fact that _et_list appears to contain only valid
list entries, but with unmapped tables. I'll have to look at that more
closely.
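
To make that hypothesis concrete, here is a self-contained model of
how it could happen. It is a sketch under assumptions, not the PAM or
com_err code: all names (registry_add, registry_remove, module_load,
module_unload, registry_count_stale) are hypothetical, and the
unmapping done by dlclose is only modelled by a flag so that the
example is safe to run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* A table that lives in a loadable module's data segment (model). */
struct table {
    long base;
    bool mapped; /* models whether the module's memory is still mapped */
};

/* List node owned by the error-table library, not by the module. */
struct node {
    struct node *next;
    const struct table *table;
};

static struct node *registry;

static int registry_add(const struct table *t)
{
    struct node *n;

    assert(t != NULL);
    n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->table = t;
    n->next = registry;
    registry = n;
    return 0;
}

static int registry_remove(const struct table *t)
{
    struct node **pp;

    for (pp = &registry; *pp; pp = &(*pp)->next) {
        if ((*pp)->table == t) {
            struct node *dead = *pp;
            *pp = dead->next;
            free(dead);
            return 0;
        }
    }
    return -1;
}

/* Models dlopen plus the module's init code registering its table. */
static int module_load(struct table *t)
{
    t->mapped = true;
    return registry_add(t);
}

/* Models dlclose; a well-behaved module unregisters first. */
static void module_unload(struct table *t, bool unregister)
{
    if (unregister)
        registry_remove(t);
    t->mapped = false;
}

/* Counts nodes that are still linked but whose table is gone. */
static int registry_count_stale(void)
{
    const struct node *n;
    int stale = 0;

    for (n = registry; n; n = n->next)
        if (!n->table->mapped)
            stale++;
    return stale;
}
```

Unloading a module with unregister set to false leaves exactly the
picture I see in the debugger: the node is still linked and readable,
its next pointer is fine, but the table it refers to no longer exists.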

> > Today, however, I got another segfault, but which also seems
> > related to the error tables:
> > #0  0xb7a2ff63 in krb_realmofhost () from /usr/lib/libkrb4.so.2
> > #1  0xb7a2ffd0 in initialize_krb_error_table () from /usr/lib/libkrb4.so.2
> > #2  0xb7ba21c1 in _pam_krb5_init_ctx (ctx=0x8443e88, argc=2, argv=0x8422530) at init.c:80
> > ...
> 
> But initialize_krb_error_table won't call krb_realmofhost, so clearly  
> this stack trace is misleading.  (Perhaps there's a static function  
> located after initialize_krb_error_table in the shared library.)

I also thought that was really weird! I'm re-emerging mit-krb5 right now
with debugging turned on to check it out.

> > I'm using MIT Kerberos V 1.4.3 and a system-supplied com_err library,
> > version 1.39. The system is Gentoo Linux.
> 
> I'm not familiar with that specific version of the com_err library.   
> It may not be thread-safe, or at least the interfaces we depend on  
> may not be.

I'd have liked to test without it, but the Kerberos ebuild that Gentoo
ships doesn't allow that, so it would take quite some effort to run
that test. I'll do it if I can't find anything else, though.

Thanks for your reply!

Fredrik Tolf