kinit fails on AIX 7.1 - res_nsearch returns -1

Vipul Mehta vipulmehta.1989 at gmail.com
Wed Feb 17 05:54:45 EST 2021


Sharing the steps I followed to build MIT Kerberos 1.14 on AIX.
Compiler: xlc version 16.1.0.0

Make sure 'make' points to GNU make and not AIX make (or use gmake directly).

Set the following environment variables:
OBJECT_MODE="64"
CC="xlC_r"
CPPFLAGS="-q64"
LDFLAGS="-brtl"

Apply the following AIX-specific fixes:
1) Fix the AIX version pattern in shlib.conf and disable the library
finalizers on AIX, as they are not executed in reverse library dependency
order, which causes an assertion failure.
diff -r E:\krb5-1.14/src/config/shlib.conf
E:\krb5-1.14_AIX/src/config/shlib.conf
473c473
< *-*-aix5*)
---
> *-*-aix[567]*)
492c492
< MAKE_SHLIB_COMMAND="${INIT_FINI_PREP} && ${LDCOMBINE}"
---
> MAKE_SHLIB_COMMAND="${LDCOMBINE}"

2) Use res_search instead of res_nsearch, as res_nsearch/res_nclose cause
stack corruption on AIX.
diff -r E:\krb5-1.14/src/lib/krb5/os/dnsglue.c
E:\krb5-1.14_AIX/src/lib/krb5/os/dnsglue.c
82c82
< #elif HAVE_RES_NINIT && HAVE_RES_NSEARCH
---
> #elif HAVE_RES_NINIT && HAVE_RES_NSEARCH && !defined(_AIX)

3) Remove the unnecessary structure tag from "struct token" as it conflicts
with <net/if_arp.h> on AIX
diff -r E:\krb5-1.14/src/lib/krb5/os/expand_path.c
E:\krb5-1.14_AIX/src/lib/krb5/os/expand_path.c
354c354
< static const struct token {
---
> static const struct {

4) Link with the db2 library built in MIT Kerberos (src/plugins/kdb/db2)
instead of the system library on AIX.
diff -r E:\krb5-1.14/src/plugins/kdb/db2/Makefile.in
E:\krb5-1.14_AIX/src/plugins/kdb/db2/Makefile.in
3a4
> UNAME=$(shell uname)
27a29
> LDBLIBDIR=./libdb2
34c36,41
< SHLIB_EXPLIBS= $(GSSRPC_LIBS) -lkrb5 -lcom_err -lk5crypto $(KDB5_DB_LIB) $(KADMSRV_LIBS) $(SUPPORT_LIB) $(LIBS) @DB_EXTRA_LIBS@
---
>
> ifeq ($(UNAME), AIX)
>         SHLIB_EXPLIBS= $(GSSRPC_LIBS) -lkrb5 -lcom_err -lk5crypto $(KDB5_DB_LIB) $(KADMSRV_LIBS) $(SUPPORT_LIB) $(LIBS) -L$(LDBLIBDIR) @DB_EXTRA_LIBS@
> else
>         SHLIB_EXPLIBS= $(GSSRPC_LIBS) -lkrb5 -lcom_err -lk5crypto $(KDB5_DB_LIB) $(KADMSRV_LIBS) $(SUPPORT_LIB) $(LIBS) @DB_EXTRA_LIBS@
> endif

5) Fix thread support library linking on AIX.
diff -r E:\krb5-1.14/src/plugins/tls/k5tls/Makefile.in
E:\krb5-1.14_AIX/src/plugins/tls/k5tls/Makefile.in
4a5,6
> UNAME=$(shell uname)
> EXTRA_PTHREAD_LIB=-lpthreads
11c13,18
< SHLIB_EXPLIBS= $(KRB5_LIB) $(SUPPORT_LIB) $(TLS_IMPL_LIBS)
---
>
> ifeq ($(UNAME), AIX)
> SHLIB_EXPLIBS= $(KRB5_LIB) $(SUPPORT_LIB) $(TLS_IMPL_LIBS) $(EXTRA_PTHREAD_LIB)
> else
> SHLIB_EXPLIBS= $(KRB5_LIB) $(SUPPORT_LIB) $(TLS_IMPL_LIBS)
> endif

On Tue, Feb 16, 2021 at 9:41 PM Vipul Mehta <vipulmehta.1989 at gmail.com>
wrote:

> I was finally able to fix the assertion issue.
> To debug further, I enabled -DSHOW_INITFINI_FUNCS in CPPFLAGS to display
> the library initializer and finalizer calls and compared them with Linux.
>
> AIX:
> -bash-5.0$ ./kinit testuser
> com_err_initialize
> krb5int_thread_support_init
> krb5int_lib_init
> profile_library_initializer
> Password for testuser@PLATFORMKRB.COM:
> krb5int_thread_support_fini
> com_err_terminate
> The assert subroutine failed: r == 0, file ../../include/k5-thread.h, line 384
> IOT/Abort trap (core dumped)
>
> LINUX:
> [devbld at psrlb14 bin]$ ./kinit testuser
> com_err_initialize
> krb5int_thread_support_init
> krb5int_lib_init
> profile_library_initializer
> Password for testuser@PLATFORMKRB.COM:
> gssint_mechglue_fini: skipping
> krb5int_lib_fini
> profile_library_finalizer
> com_err_terminate
> krb5int_thread_support_fini
>
> On AIX, com_err_terminate is called after krb5int_thread_support_fini,
> which is not correct, as the com_err library depends on krb5support. It
> looks like the order of finalizer calls is independent of the library
> dependency order on AIX:
> https://gcc.gnu.org/legacy-ml/gcc/2012-08/msg00062.html
> "AIX invokes init and fini functions for multiple, dependent shared
> objects breadth first"
>
> To fix the issue, I disabled the finalizers on AIX by modifying
> src/config/shlib.conf, line 492:
> MAKE_SHLIB_COMMAND="${LDCOMBINE}"
>
> This will cause a slight memory leak if a library is loaded and unloaded
> multiple times, but that is not the case in our code.
>
> There are a few other fixes that I made. I will upload the modified code
> with AIX build instructions to GitHub so that it helps others.
>
> On Fri, Feb 12, 2021 at 9:38 AM Vipul Mehta <vipulmehta.1989 at gmail.com>
> wrote:
>
>> I compared the behaviour on AIX and Linux.
>> res_nsearch also returned -1 on Linux, but there it does not corrupt the
>> call stack. On AIX, it corrupts the stack.
>>
>> Replacing res_nsearch with the res_search API fixes the stack corruption
>> on AIX, but kinit later fails with an assertion error:
>> pthread_kill(??, ??) at 0x900000000589014
>> _p_raise(??) at 0x900000000588864
>> raise.raise(??) at 0x900000000039a68
>> abort() at 0x900000000056464
>> __assert_c99(??, ??, ??, ??) at 0x9000000000e00c0
>> threads.k5_mutex_lock(m = 0x09001000a2806a00), line 391 in "k5-thread.h"
>> krb5int_key_delete(keynum = K5_KEY_COM_ERR), line 379 in "threads.c"
>> com_err_terminate(), line 65 in "error_message.c"
>> mod_fini1(??, ??) at 0x9fffffff000af9c
>> usl_fini_mods(??, ??, ??, ??) at 0x9fffffff000bf44
>> usl_exit_fini(??, ??, ??) at 0x9fffffff000aea8
>> usl_exit_fini_mods(??) at 0x9fffffff000bde8
>> __modfini64() at 0x900000000001110
>> exit(??) at 0x900000000057050
>>
>> On Thu, 11 Feb, 2021, 9:58 pm Vipul Mehta, <vipulmehta.1989 at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> When I run kinit on AIX 7.1, it fails with "Illegal instruction" as
>>> the only output. The core dump it generated was corrupted.
>>>
>>> While debugging, I found that the following call in dnsglue.c ->
>>> krb5int_dns_init(), line 152, returns -1:
>>> len = SEARCH(h, host, ds->nclass, ds->ntype, ds->ansp, ds->ansmax);
>>>
>>> SEARCH expands to res_nsearch() on AIX.
>>> After the return statement in krb5int_dns_init(), the call stack appears
>>> to be corrupted.
>>>
>>> To find out why res_nsearch() failed, I enabled the following debug flag
>>> for it and recompiled:
>>> h.options |= RES_DEBUG;
>>>
>>> The output was:
>>> ;; res_nquerydomain(, informatica.com, 1, 33)
>>> ;; res_query(.informatica.com, 1, 33)
>>> ;; res_nmkquery(QUERY, .informatica.com, IN, SRV)
>>> ;; res_query: mkquery failed
>>> ;; res_nquerydomain(, <Nil>, 1, 33)
>>> ;; res_query(, 1, 33)
>>> ;; res_nmkquery(QUERY, , IN, SRV)
>>> ;; res_send()
>>> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10721
>>> ;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>>> ;;      ., type = SRV, class = IN
>>> ;; Querying server (# 1) address = 10.23.32.61
>>> server rejected query:
>>> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 10721
>>> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>>> ;;      ., type = SRV, class = IN
>>> ;; Querying server (# 2) address = 10.23.32.62
>>> server rejected query:
>>> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 10721
>>> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>>> ;;      ., type = SRV, class = IN
>>> ;; Querying server (# 3) address = 10.1.32.114
>>> server rejected query:
>>> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 10721
>>> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
>>> ;;      ., type = SRV, class = IN
>>> ;; res_query: send error
>>>
>>> errno after this was 78, which translates to "A remote host did not
>>> respond within the timeout period". That looks misleading, because the
>>> following standalone AIX command works fine:
>>> RES_OPTIONS=debug host informatica.com
>>>
>>> I understand that MIT Kerberos is not supported on AIX but any pointer
>>> towards solving this issue would be of great help.
>>>
>>> --
>>> Regards,
>>> Vipul
>>>
>>
>
> --
> Regards,
> Vipul
>


-- 
Regards,
Vipul

