Design for Multithreading MIT KDC

Rahul S. [EMail: srahul@novell.com]
31 March 2005

1. Need for a multithreaded KDC
-------------------------------

The MIT KDC server is single threaded; the requests it receives are processed sequentially. This is not a problem as long as the principal information is stored in a local database. When the KDC is integrated with a directory backend (eDirectory), the Kerberos principals will be stored in the directory, and for every request the KDC will have to look up the directory to find them. Directory lookups are more time consuming than local database lookups, so a single-threaded KDC would spend much of its time in principal lookups and its performance would suffer. To improve performance, the KDC has to process requests concurrently.

This document describes a design for multithreading the KDC so that it can service KDC requests concurrently. The design applies to version 1.3.5 of the MIT Kerberos distribution. Since the KDC part of the code has not changed much between versions 1.3.5 and 1.4, the design should also apply to version 1.4 with minor modifications.

2. Issues that need to be addressed
-----------------------------------

The following data structures have to be made thread-safe.

2.1. The active realm

The 'kdc_active_realm' variable points to the realm-specific data of the service principal for the request currently being processed. It is set just before an AS / TGS request is processed, and the members of the structure it points to are accessed in many places in the KDC code while the packet is processed. In a multithreaded scenario, each thread processing a request may need to refer to a different realm.

2.2. 'sstate'

The 'sstate' structure contains the set of file descriptors (FDs) that the KDC has to monitor for activity (availability of incoming / outgoing data).
FDs are added to this set when a new (TCP) request is to be processed and removed when a TCP request has been processed. TCP requests to the KDC may arrive in multiple TCP segments. In a multithreaded scenario, the following three cases have to be handled.

1. When a new TCP connection is established, the KDC must immediately start listening on that FD.
2. When a TCP connection is closed, the KDC must immediately stop listening on that FD.
3. When a thread is servicing a TCP FD, no other thread must start servicing that FD when the next TCP segment is received on it.

2.3. 'connections'

'connections' is a set of structures holding information about each active FD. In a multithreaded scenario, access to this set has to be controlled.

2.4. Replay cache

'kdc_rcache' is the replay cache, which stores the Kerberos authenticators of processed requests. When a new request arrives with a cached authenticator, the KDC has to reject it and return the KRB_AP_ERR_REPEAT error [1]. In a multithreaded scenario, the replay cache could be updated concurrently by two separate threads. This gives rise to the following two requirements.

1. The functions modifying the replay cache data structures have to be thread safe.
2. If two requests with the same authenticator are being processed concurrently by two separate threads, at most one of them should succeed.

2.5. Lookaside cache

'root_ptr' is the head of the list for the replay lookaside cache. This cache stores recently received (successfully authenticated) requests and their corresponding replies. It is looked up before a request is processed and updated afterwards; duplicates are not checked for while updating. In a multithreaded scenario, this cache has to be protected in the following two ways.

1. Access to the cache by multiple threads has to be controlled.
2. Two threads processing the same request concurrently must not both update the cache.

2.6.
Global variables

All global variables which are concurrently updated by multiple threads have to be protected using locks. The variables (excluding the ones mentioned above) in the KDC are

1. last_usec, last_os_random
2. kdc_current_rcname
3. hits, calls, max_hits_per_entry, num_entries
4. tcp_data_counter, max_tcp_data_connections

The variables in the libraries are protected as in version 1.4.

3. The design
-------------

3.1. Thread synchronization functions

The implementations of the following functions are backported from version 1.4.

1. Initialization / destruction of mutexes
2. Locking / unlocking of mutexes
3. Creation / destruction of thread-specific data
4. Access to thread-specific data

These functions are used in the libraries as in version 1.4. In the KDC code, the following change is made: a single mutex is used for making the entire KDC code thread-safe, so the mutex is redefined to be recursive. The threading library's API is used for this if it supports recursive mutexes; otherwise, a recursive mutex is implemented using a normal mutex and a counter.

3.2. Per-thread information

A new structure, 'thread_info', is defined to store per-thread information, and a new thread-specific data key, 'K5_KEY_KDC', is defined. This key is registered during startup and used by each service thread for accessing data from its thread-specific data area. The function 'k5_getspecific' (internally 'pthread_getspecific' on Linux), when called inside a thread, accepts the key as an argument and returns a pointer to the thread's thread-specific data (of type 'thread_info'). The definition of 'thread_info' is as follows.

struct thread_info {
    struct connection *current_conn;
    kdc_realm_t *current_realm;
};

The member 'current_conn' points to the connection structure over which the current request was received.
The member 'current_realm' points to the active realm; it is intended to replace 'kdc_active_realm'. Every thread can use its thread_info structure at any point in its execution to locate the connection it is handling and the corresponding active realm. The same could be accomplished by passing the connection and realm information as arguments to all functions that use them, but this would require changing the definitions of all such functions.

3.3. kdc_active_realm

'kdc_active_realm' is changed whenever a request is received and is referred to while the request is processed. If multiple threads are processing requests concurrently, each should be able to refer to one of the members of the 'kdc_realmlist[]' array as its active realm. The active realm corresponding to the request being serviced by a thread is pointed at by the 'current_realm' member of the thread's 'thread_info' structure. The global variable 'kdc_active_realm' is removed and a new function, 'get_active_realm()', is added. This function looks up the calling thread's thread-specific data and returns a pointer to the active realm.

Macros like 'kdc_context' refer to members of 'kdc_active_realm' and are used in many places in the code. In every function referring to these macros, a local variable called 'kdc_active_realm' is declared and initialized by calling 'get_active_realm()'.

3.4. Signal handlers

The signal handlers should be executed exactly once per signal received, in the context of the main thread; they must not be executed in the context of each thread separately. To ensure this, all signals that are not used by the multithreading library are blocked in all threads except the main thread.

3.5. Protecting variables

Access to non-thread-safe global variables in the KDC code is protected by a single recursive mutex (an alternative is to have a separate mutex for each variable).

3.6.
Thread creation and request handling

Currently, the function 'listen_and_process()' waits for activity on all incoming ports (using the 'select' system call). Whenever activity is detected on any incoming port, it calls service_conn() to service that port.

The 'connection' structure will be modified to include the following members.

1. u.udp.addr_s : Source address of a UDP request
2. u.udp.bufsiz : Size of the buffer holding a UDP packet
3. u.udp.buffer : Pointer to a UDP packet
4. u.udp.msglen : Length of the UDP packet

The use of these new members is explained below.

'listen_and_process()' will be modified to spawn n threads as soon as it is called. The entry point of each thread will be the 'service_thread()' function. A global queue of instances of the following structure will be created, with 'req_port_head' as the front of the queue.

struct req_port {
    struct req_port *next;
    struct connection *conns;
    char *prog;
    int sflags;
};

A monitor 'm' will control access to this queue. Whenever a new port is to be served, a new 'req_port' structure is created and added to the end of the queue after locking 'm'. The queue has a maximum length. If a request is to be added when the queue is full, the main thread (the one executing 'listen_and_process()') will block till another thread removes a request from the queue. If the queue is empty, all threads ready to process requests will wait till the main thread adds a request. This is implemented using 'm'.

If the KDC is built with the multithreading option, the main thread will implement the following algorithm.

1. Q = {Set of KDC requests awaiting service}
2. S := {Set of ports to poll for KDC requests. Initially, 88 (TCP and UDP)}
3. Create n service threads.
4. Poll S for activity using select()
5. If select() was interrupted by a signal, go to step 4
6. Detect activity on port p
7. If p is a UDP port, go to step 11
8. S := S - {p}
9. Create a new req_port structure (req) for port p.
10.
    Go to step 14
11. Read the UDP packet, say 'pkt'
12. Create a new 'connection' structure 'c'.
    c.u.udp.addr_s := Source address of UDP packet
    c.u.udp.buffer := pkt
    Also modify c.u.udp.msglen, c.u.udp.bufsiz
13. Create a new req_port structure (req) for port p using 'c' in place of the UDP connection structure.
14. Lock m
15. Q := Q + {req}
16. Unlock m
17. Go to step 4

The service thread :
------------------

The service_thread() function will implement the following algorithm.

1. Q = {Set of KDC requests awaiting service}
2. S = {Set of ports being polled for KDC requests by the main thread}
3. Lock m (m is a monitor which can be locked only when the queue is non-empty)
4. Remove a request from Q
5. Unlock m
6. If the request is over UDP, go to step 12
7. Read the request from the corresponding TCP port (say p)
8. Service the request (accept_tcp_connection() / process_tcp_connection())
9. S := S + {p}
10. Send a signal to the main thread (to interrupt the 'select' system call)
11. Go to step 3
12. Service the request (process_packet())
13. Go to step 3

The 'process_packet()' function is modified to process the request in the buffer pointed at by a member of the 'connection' structure passed to it, rather than read the packet from the UDP port.

In the above algorithm, the main thread reads UDP packets from the UDP FD but not TCP packets. This enables the main thread to go back to listening on the UDP socket. In the case of TCP requests, two threads must not end up processing the same TCP FD for two different TCP segments of the same request, so the FD is not 'select'ed by the main thread while it is being processed.

The reasons for making the main thread receive the requests over UDP are as follows.

1. If a service thread received the request, it would have to notify the main thread after receiving it, and the main thread would then include the UDP port in its list of 'select'ed FDs. This would cause an extra context switch.
2.
    The main thread could queue up more requests than there are threads when there is a burst of requests.

3.7. The replay cache

The replay cache is used for recognizing and discarding replayed KDC requests with the same authenticators. The replay cache functions are made thread safe by protecting the sensitive data structures with locks, as in version 1.4.

3.8. The lookaside cache

The existing lookaside cache does not prevent the same request from being processed concurrently by two threads (due to replay) if the second thread looks up the cache before the first one updates it. This problem is solved as follows.

1. Lock a mutex 'm'
2. Check the lookaside cache
3. If the request is not found in the cache, go to step 8
4. Unlock 'm'
5. If the 'reply_packet' member of the cache entry is NULL, return without any further processing (a concurrent thread is processing the request).
6. Use the 'reply_packet' member as the reply and do not re-process the request.
7. Return.
8. Create a 'krb5_kdc_replay_ent' with proper values set for all members of the structure except 'reply_packet'. Set reply_packet := NULL
9. Unlock 'm'
10. Process the request.
11. Lock 'm'
12. Set the 'reply_packet' member of the cache entry to the proper KDC reply.
13. Unlock 'm'
14. Return

3.9. gethostbyname()

The gethostbyname() function returns a pointer to static data structures that are overwritten by subsequent calls. Calls to this function have to be protected in one of the following two ways.

1. Protect the values returned by gethostbyname() using mutexes while they are being used in the calling function.
2. If gethostbyname_r() is available, use it instead of gethostbyname(). Though the GET_HOST_BY_NAME() macro, which refers to gethostbyname_r(), is available, the calling function has to ensure that the stack does not get corrupted by nested function calls between the invocation of GET_HOST_BY_NAME() and the access of the value returned by it.
4. Alternate Design Approaches Considered
-----------------------------------------

4.1 Alternate Approach 1

In this approach, at most one thread of the KDC server will be running at any point of time. When a thread makes a DB lookup for principal information, another thread can begin processing a new request. Before a new thread starts executing, the entire context of the previous thread (in terms of the values of global variables) is saved. The context is restored when the thread making the DB lookup begins to execute again after the lookup has completed.

This approach allows a slow DB backend like a directory to be used with the KDC. But the maximum rate at which requests can be processed by this approach is

    1 / ((time for processing request) - (DB lookup time))

Hosting the KDC server on multiprocessor systems would not improve the performance.

4.2 Alternate Approach 2

In this approach, a new process will be forked to service every new request; the new process will exit after processing its request. But forking a new process for each request is a costly operation compared to having an existing thread service the request.

4.3 Alternate Approach 3

In this approach, n threads will be forked when the KDC server starts. Each thread will listen for requests using the select system call. When a request arrives, one of the free threads receives and processes it. Synchronization of the threads here is more difficult than in the proposed approach.

5. References
-------------

[1] J. Kohl and C. Neuman, "The Kerberos Network Authentication Service (V5)", RFC 1510, September 1993