kinit fails on AIX 7.1 - res_nsearch returns -1

Vipul Mehta vipulmehta.1989 at gmail.com
Tue Feb 16 11:11:43 EST 2021


I was finally able to fix the assertion issue.
To debug further, I enabled -DSHOW_INITFINI_FUNCS in CPPFLAGS to display the
library initializer and finalizer calls, and compared the output with Linux.

AIX:
-bash-5.0$ ./kinit testuser
com_err_initialize
krb5int_thread_support_init
krb5int_lib_init
profile_library_initializer
Password for testuser@PLATFORMKRB.COM:
krb5int_thread_support_fini
com_err_terminate
The assert subroutine failed: r == 0, file ../../include/k5-thread.h, line 384
IOT/Abort trap (core dumped)

LINUX:
[devbld@psrlb14 bin]$ ./kinit testuser
com_err_initialize
krb5int_thread_support_init
krb5int_lib_init
profile_library_initializer
Password for testuser@PLATFORMKRB.COM:
gssint_mechglue_fini: skipping
krb5int_lib_fini
profile_library_finalizer
com_err_terminate
krb5int_thread_support_fini
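
For context, traces like the two above can be produced by having each
library initializer and finalizer print its own name. The following is only
a minimal sketch of that idea in C. TRACE_INITFINI, trace_log, demo_lib_init
and demo_lib_fini are hypothetical names used for illustration; they are not
the actual krb5 macros behind SHOW_INITFINI_FUNCS, which may work differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory record of the call order, so the order can be
 * inspected programmatically as well as read from stderr. */
#define MAX_TRACE 16
const char *trace_log[MAX_TRACE];
int trace_count = 0;

/* Hypothetical tracing macro: records and prints the name of the function
 * that invokes it. A real build would typically guard this with a
 * compile-time flag so that it costs nothing in normal builds. */
#define TRACE_INITFINI()                              \
    do {                                              \
        if (trace_count < MAX_TRACE)                  \
            trace_log[trace_count++] = __func__;      \
        fprintf(stderr, "%s\n", __func__);            \
    } while (0)

/* Stand-ins for a library initializer and finalizer. */
void demo_lib_init(void) { TRACE_INITFINI(); }
void demo_lib_fini(void) { TRACE_INITFINI(); }
```

Calling demo_lib_init() followed by demo_lib_fini() leaves the two function
names in trace_log in call order, which is the kind of sequence compared
between AIX and Linux here.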

On AIX, com_err_terminate is called after krb5int_thread_support_fini, which
is incorrect because the com_err library depends on krb5support. It looks like
the order of finalizer calls on AIX is independent of the library dependency
order:
https://gcc.gnu.org/legacy-ml/gcc/2012-08/msg00062.html
"AIX invokes init and fini functions for multiple, dependent shared objects
breadth first"
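
To illustrate why this ordering matters, here is a small model in C. It is a
sketch under assumptions, not MIT krb5 code: support_init, support_fini,
key_delete, dependent_init, dependent_fini and run_lifecycle are hypothetical
stand-ins, and the -1 return stands in for the failing assert(r == 0) in
k5-thread.h. It assumes the failure occurs because the support library's
finalizer has already torn down state that the dependent library's finalizer
still needs, which is consistent with the backtrace quoted below but not
verified against the source here.

```c
#include <assert.h>
#include <stdbool.h>

/* Models the thread-support state owned by the support library. */
static bool support_ready = false;
static int keys_registered = 0;

static void support_init(void) { support_ready = true; }
static void support_fini(void) { support_ready = false; }

/* Models a key-deletion routine that needs the support library to be
 * initialized. Returning -1 stands in for the assertion failure. */
static int key_delete(void)
{
    if (!support_ready)
        return -1;
    keys_registered--;
    return 0;
}

/* Models the dependent library: registers a key on init and deletes it
 * in its finalizer. */
static void dependent_init(void) { keys_registered++; }
static int dependent_fini(void) { return key_delete(); }

typedef enum { FINI_DEPENDENT_FIRST, FINI_SUPPORT_FIRST } fini_order;

/* Initializes both libraries in dependency order, then finalizes them in
 * the requested order. Returns the dependent finalizer's result. */
int run_lifecycle(fini_order order)
{
    int r;

    support_init();
    dependent_init();
    if (order == FINI_DEPENDENT_FIRST) {
        r = dependent_fini();   /* order seen in the Linux trace */
        support_fini();
    } else {
        support_fini();         /* order seen in the AIX trace */
        r = dependent_fini();
    }
    return r;
}
```

In this model, run_lifecycle(FINI_DEPENDENT_FIRST) returns 0, while
run_lifecycle(FINI_SUPPORT_FIRST) returns -1 because the dependent finalizer
runs after the state it relies on is gone.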

To fix the issue, I disabled the finalizer on AIX by modifying
/src/config/shlib.conf, line 492:
MAKE_SHLIB_COMMAND="${LDCOMBINE}

This will cause a slight memory leak if the library is loaded and unloaded
multiple times, but that is not the case in our code.

There are a few other fixes that I made. I will upload the modified code with
AIX build instructions to GitHub so that it helps others.

On Fri, Feb 12, 2021 at 9:38 AM Vipul Mehta <vipulmehta.1989 at gmail.com>
wrote:

> I compared the behaviour on AIX and Linux.
> res_nsearch also returns -1 on Linux, but there it does not corrupt the
> call stack. On AIX, it corrupts the stack.
>
> Replacing res_nsearch with the res_search API fixes the stack corruption
> on AIX, but kinit later fails with an assertion error:
> pthread_kill(??, ??) at 0x900000000589014
> _p_raise(??) at 0x900000000588864
> raise.raise(??) at 0x900000000039a68
> abort() at 0x900000000056464
> __assert_c99(??, ??, ??, ??) at 0x9000000000e00c0
> threads.k5_mutex_lock(m = 0x09001000a2806a00), line 391 in "k5-thread.h"
> krb5int_key_delete(keynum = K5_KEY_COM_ERR), line 379 in "threads.c"
> com_err_terminate(), line 65 in "error_message.c"
> mod_fini1(??, ??) at 0x9fffffff000af9c
> usl_fini_mods(??, ??, ??, ??) at 0x9fffffff000bf44
> usl_exit_fini(??, ??, ??) at 0x9fffffff000aea8
> usl_exit_fini_mods(??) at 0x9fffffff000bde8
> __modfini64() at 0x900000000001110
> exit(??) at 0x900000000057050
>
> On Thu, 11 Feb, 2021, 9:58 pm Vipul Mehta, <vipulmehta.1989 at gmail.com>
> wrote:
>
>> Hi,
>>
>> When I run kinit on AIX 7.1, it fails with "Illegal instruction" as
>> output. The core dump that was generated was corrupted.
>>
>> While debugging, I found that the following call in dnsglue.c ->
>> krb5int_dns_init() (line 152) returns -1:
>> len = SEARCH(h, host, ds->nclass, ds->ntype, ds->ansp, ds->ansmax);
>>
>> SEARCH expands to res_nsearch() on AIX.
>> After krb5int_dns_init() returns, the call stack appears to be
>> corrupted.
>>
>> To find out why res_nsearch() failed, I enabled the following debug flag
>> for it and recompiled:
>> h.options |= RES_DEBUG;
>>
>> This was the output:
>> ;; res_nquerydomain(, informatica.com, 1, 33)
>> ;; res_query(.informatica.com, 1, 33)
>> ;; res_nmkquery(QUERY, .informatica.com, IN, SRV)
>> ;; res_query: mkquery failed
>> ;; res_nquerydomain(, <Nil>, 1, 33)
>> ;; res_query(, 1, 33)
>> ;; res_nmkquery(QUERY, , IN, SRV)
>> ;; res_send()
>> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10721
>> ;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>> ;;      ., type = SRV, class = IN
>> ;; Querying server (# 1) address = 10.23.32.61
>> server rejected query:
>> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 10721
>> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>> ;;      ., type = SRV, class = IN
>> ;; Querying server (# 2) address = 10.23.32.62
>> server rejected query:
>> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 10721
>> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>> ;;      ., type = SRV, class = IN
>> ;; Querying server (# 3) address = 10.1.32.114
>> server rejected query:
>> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 10721
>> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>> ;;      ., type = SRV, class = IN
>> ;; res_query: send error
>>
>> errno after this was 78, which translates to the string "A remote
>> host did not respond within the timeout period". That looks misleading,
>> because the following standalone AIX command works fine:
>> RES_OPTIONS=debug host informatica.com
>>
>> I understand that MIT Kerberos is not supported on AIX, but any pointers
>> towards solving this issue would be of great help.
>>
>> --
>> Regards,
>> Vipul
>>
>

-- 
Regards,
Vipul

