[krbdev.mit.edu #9037] Race condition in krb5_set_password()

Sumit Bose via RT rt-comment at kerborg-prod-app-1.mit.edu
Thu Nov 11 13:32:41 EST 2021


Thu Nov 11 13:32:41 2021: Request 9037 was acted upon.
 Transaction: Ticket created by sbose at redhat.com
       Queue: krb5
     Subject: Race condition in krb5_set_password()
       Owner: Nobody
  Requestors: sbose at redhat.com
      Status: new
 Ticket <URL: http://kerborg-prod-app-1.mit.edu/rt/Ticket/Display.html?id=9037 >


*Problem statement:*

A local program calls krb5_set_password() to change a machine account
password.  The password change succeeds in AD, but due to a race
condition in krb5_set_password(), the function returns an error (status
code KRB5_KPASSWD_AUTHERROR).

Because the function returns an error, the local program does not update
the local store with the new password.  Thus, AD has the new password for
the machine account (with incremented KVNO) and the local store has the old
password (with the original KVNO).

This occurs a small percentage of the time (whenever the race condition hits).

*Analysis:*

There is a race during password changes if a client retransmits the request
while the server is still working on the first request, i.e. the server was not
able to process the change before the client's hardcoded 1s timeout expired. If
a client in an Active Directory domain tries to automatically update its machine
account password and runs into this race condition, it will typically lose
access to the domain: the client receives an error and must assume that the
password change failed, while on the server (DC) side the password was updated.
As a result the client can no longer authenticate itself in the Active Directory
domain and the machine account password must be reset manually.
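
To make the window explicit, here is a minimal model of the race. It is only an
illustration: the struct, the function name and the millisecond parameters are
hypothetical, none of this is libkrb5 code, and network latency is ignored. It
assumes that the first request always completes on the server and that a
retransmit is answered with a replay error before the reply to the first
request arrives.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the race; not libkrb5 code. */
struct outcome {
    bool retransmitted;      /* client sent the request a second time */
    bool changed_on_server;  /* server completed the password change */
    bool client_sees_error;  /* client got the replay error first */
};

struct outcome model_password_change(long server_processing_ms,
                                     long client_timeout_ms)
{
    struct outcome o;

    assert(server_processing_ms >= 0 && client_timeout_ms > 0);

    /* The client retransmits only if no reply arrived within its timeout. */
    o.retransmitted = server_processing_ms > client_timeout_ms;
    /* Assumption: the first request always completes on the server. */
    o.changed_on_server = true;
    /* Assumption: the retransmit is answered with a replay error before
     * the reply to the first request reaches the client. */
    o.client_sees_error = o.retransmitted;
    return o;
}
```

With a 1000 ms timeout, a server that needs 1500 ms yields an outcome where the
password is changed on the server but the client sees an error, which is the
inconsistent state described above.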

https://krbdev.mit.edu/rt/Ticket/Display.html?id=7905 attempted to fix this by
changing the default from UDP to TCP. But there is still a fallback to UDP, and
retransmits still happen even when TCP is used:

[3504323] 1634978406.103204: Creating authenticator for HOST$@EXAMPLE.COM -> kadmin/changepw at EXAMPLE.COM, seqnum 0, subkey aes256-cts/3997, session key aes256-cts/D6BF
[3504323] 1634978406.103206: Resolving hostname my_ad_dc.example.com
[3504323] 1634978406.103207: Initiating TCP connection to stream 10.10.10.11:464
[3504323] 1634978407.618306: Sending initial UDP request to dgram 10.10.10.11:464
[3504323] 1634978407.618307: Sending TCP request to stream 10.10.10.11:464
[3504323] 1634978407.618308: Received answer (99 bytes) from stream 10.10.10.11:464
[3504323] 1634978407.618309: Terminating TCP connection to stream 10.10.10.11:464


This can also be seen in the network traffic (10.10.10.11 is the KDC and
10.10.20.22 is the client):

   83   0.836616 10.10.20.22 → 10.10.10.11  TCP 76 36556 → 464 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2699120574 TSecr=0 WS=128
   84   - unrelated -
   85   - unrelated -
   86   1.837870 10.10.20.22 → 10.10.10.11  IPv4 1516 Fragmented IP protocol (proto=UDP 17, off=0, ID=4ec8)
   87   1.867158 10.10.20.22 → 10.10.10.11  TCP 76 [TCP Retransmission] 36556 → 464 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=2699121605 TSecr=0 WS=128
   88   1.868315  10.10.10.11 → 10.10.20.22 TCP 76 464 → 36556 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1 TSval=1786871194 TSecr=2699121605
   89   1.868354 10.10.20.22 → 10.10.10.11  TCP 68 36556 → 464 [ACK] Seq=1 Ack=1 Win=29312 Len=0 TSval=2699121606 TSecr=1786871194
   90   1.868484 10.10.20.22 → 10.10.10.11  TCP 1516 [TCP segment of a reassembled PDU]
   91   1.868491 10.10.20.22 → 10.10.10.11  KPASSWD 183 Reply
   92   1.869813  10.10.10.11 → 10.10.20.22 TCP 68 464 → 36556 [ACK] Seq=1 Ack=1564 Win=263424 Len=0 TSval=1786871196 TSecr=2699121606
   93   1.870088  10.10.10.11 → 10.10.20.22 KPASSWD 171 KRB Error: KRB5KRB_AP_ERR_REPEAT
   94   1.870098 10.10.20.22 → 10.10.10.11  TCP 68 36556 → 464 [ACK] Seq=1564 Ack=104 Win=29312 Len=0 TSval=2699121608 TSecr=1786871196
   95   1.870190 10.10.20.22 → 10.10.10.11  TCP 68 36556 → 464 [FIN, ACK] Seq=1564 Ack=104 Win=29312 Len=0 TSval=2699121608 TSecr=1786871196
   96   1.870945  10.10.10.11 → 10.10.20.22 TCP 68 464 → 36556 [ACK] Seq=104 Ack=1565 Win=263424 Len=0 TSval=1786871197 TSecr=2699121608
   97   1.870999  10.10.10.11 → 10.10.20.22 TCP 62 464 → 36556 [RST, ACK] Seq=104 Ack=1565 Win=0 Len=0
   98   1.910398  10.10.10.11 → 10.10.20.22 IP 217 Bogus IP version (0)

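Packet 86 is the UDP request and packet 98 is the corresponding reply; I am not
sure why Wireshark has issues decoding them. Packet 86 is sent about 1s after
the initial SYN in packet 83 (1.837870 vs. 0.836616), which matches the client's
1s timeout.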

The call to krb5_set_password() in the client returns an error. The error is
not KRB5KRB_AP_ERR_REPEAT but KRB5_KPASSWD_AUTHERROR, because the payload in
packet 93:

Kerberos
    krb-error
        pvno: 5
        msg-type: krb-error (30)
        stime: 2021-10-23 08:40:07 (UTC)
        susec: 446539
        error-code: eRR-REPEAT (34)
        realm: EXAMPLE.COM
        sname
            name-type: kRB5-NT-SRV-INST (2)
            sname-string: 2 items
                SNameString: kadmin
                SNameString: changepw
        e-data: 0003

has "e-data: 0003", which get_error_edata() retrieves and which is returned to
the caller as the error code.
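
As an illustration of why "0003" ends up as KRB5_KPASSWD_AUTHERROR: in the
password change protocol (RFC 3244) the result code is a 16-bit value in network
byte order, and AUTHERROR has the value 3. The sketch below is not the code of
get_error_edata(); the function name and the locally redefined constants are
only there so the example compiles without krb5.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result codes of the password change protocol (RFC 3244).  The names
 * mirror the KRB5_KPASSWD_* constants but are redefined here so the
 * example does not need krb5.h. */
enum {
    KPASSWD_SUCCESS   = 0,
    KPASSWD_MALFORMED = 1,
    KPASSWD_HARDERROR = 2,
    KPASSWD_AUTHERROR = 3,
    KPASSWD_SOFTERROR = 4
};

/* Hypothetical helper: read the 16-bit big-endian result code at the
 * start of e-data.  Returns -1 if fewer than two bytes are present. */
int edata_result_code(const uint8_t *edata, size_t len)
{
    assert(edata != NULL || len == 0);
    if (len < 2)
        return -1;
    return (edata[0] << 8) | edata[1];
}
```

Called with the two bytes 0x00 0x03 from packet 93, this returns 3, i.e. the
AUTHERROR value, even though the KRB-ERROR itself carries error-code 34.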

The client now has an unclear error code triggered by the replay and has to
assume the password change failed, while the server might still be processing
the initial request and go on to change the password.

The client might now try to get a TGT with the new key, but this might fail as
well because it is not clear how long the server needs to change the
password.

I can think of multiple ways to solve this:
 - do not retry in libkrb5 at all
 - ignore KRB5KRB_AP_ERR_REPEAT during krb5_set_password() and wait for other
   replies from the server
 - close the initial TCP connection if no data was sent before trying to send
   the request with UDP
 - longer or configurable timeouts
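
To illustrate the second option, here is a rough sketch of the reply selection
only. It is not a patch against libkrb5: struct reply and first_non_replay()
are hypothetical names, and the real send/receive loop is left out. The value
34 is the eRR-REPEAT error-code seen in packet 93.

```c
#include <assert.h>
#include <stddef.h>

/* Protocol error code for KRB5KRB_AP_ERR_REPEAT, as shown in packet 93. */
#define ERR_REPEAT 34

/* Hypothetical, simplified view of one reply from the server. */
struct reply {
    int is_krb_error;   /* nonzero if the reply is a KRB-ERROR */
    int error_code;     /* KRB-ERROR error-code, valid if is_krb_error */
};

/* Return the first reply that is not a replay error, or NULL if every
 * reply received so far is one (the caller would keep waiting). */
const struct reply *first_non_replay(const struct reply *replies, size_t n)
{
    size_t i;

    assert(replies != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (replies[i].is_krb_error && replies[i].error_code == ERR_REPEAT)
            continue;   /* answer to our own retransmit: skip it */
        return &replies[i];
    }
    return NULL;
}
```

With this selection, the replay error from packet 93 would be skipped and the
client would continue to wait for the reply to its first request until the
overall timeout is reached.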

Please let me know if more details are needed.

I might be able to help fix the issue if you let me know which would be the
preferred way to solve it.

bye,
Sumit



