Multithreading the KDC server - a design

Wed Apr 6 00:32:39 EDT 2005

Hi Ken,
        Thanks for your comments.

> First, you're talking a lot about 1.3.5.

The reason I am talking about version 1.3.5 is that at Novell we are
working on the 1.3.5 codebase to integrate it with eDirectory. The
development work started before the release of version 1.4. So, we have
stuck to version 1.3.5. But, our goal is to port the changes to version 
1.4 before submitting the patch.

> A bunch of changes went into
> library code in 1.4 to make it thread-safe, so 1.4 would be a much
> better starting point for the work.  Otherwise you wind up having to
> pull in a lot of the changes between 1.3.5 and 1.4 anyways, or redo a
> bunch of the work.

I have implemented a prototype of the proposed design. I have pulled in 
multi-threading code from version 1.4 into 1.3.5. The changes to the 
library code have also been back-ported.

> You don't indicate whether the thread interface you use would be an
> extension of what I've done so far (which may be a bit complicated to
> sort through at first glance, I admit), or directly using the POSIX
> interfaces.  The former would make it easier for somebody to someday
> port the KDC to Windows, I guess, but extensions would need to be made
> (e.g., thread creation and monitors are outside of its current scope),
> and the POSIX approach is likely to get done a lot faster.

The thread interface used in the KDC would be an extension of the 
interface available in version 1.4. The following are the additional APIs 
that would be used.
1. Thread creation - pthread_create()
2. Recursive mutex - code would be added to KDC for implementing this. (Or
    would it be better to add to the k5-threads.h ?)
3. The monitor used for managing the request queue can be implemented
    using mutexes and condition variables. The thread interface in version
    1.4 does not include condition variable operations. This would be added
    to the KDC code (Or should the thread interface be extended).

> 2.4, 2.5: It's also possible in 1.4 for the replay cache and lookaside
> cache to be turned off.

This would require some additional work while porting the changes from 
1.3.5 to 1.4.

> 3.1: Again, which thread interface?  The shim layer I wrote does not
> support recursive mutexes.

The following is how recursive mutexes would be implemented.
1. If PTHREAD_MUTEX_RECURSIVE_NP is defined (it is defined in pthread.h on
    linux), the pthread_mutex_t member of the k5_mutex_t would be
    re-initialized to be a recursive mutex. Since pthread.h is present, I
    am assuming that mutex.os.p exists and is of type pthread_mutex_t.
2. If PTHREAD_MUTEX_RECURSIVE_NP is not defined, a recursive mutex is
    implemented using a mutex 'm' and a thread-specific counter 'c' as
    follows.

    initial state:
 	m is unlocked
 	c == 0 for all threads

    LOCK:
    1. If (current thread's) c != 0, go to 3
    2. lock(m)
    3. c = c + 1

    UNLOCK:
    1. If (current thread's) c > 1, go to 3
    2. unlock (m)
    3. c = c - 1

> 3.4: The variables set by the signal handlers need to be changed in a
> thread-safe way, and the code checking it should probably delay
> operations (at least the re-opening of the log file) until other
> threads are idle, or at least can't be logging anything.

As stated in section (3.4), all signals (including INT, HUP, TERM) are
blocked in the 'service threads'. So, the signal handlers would always be
executed in the context of the main thread.

   I had overlooked the issue of re-opening of the log file. This needs to 
be addressed.

> (If the
> database access is doing something that would be bad to exit in the
> middle of, then perhaps exiting should be delayed too.  On the other
> hand, we don't want a process to hang around indefinitely, ignoring
> signals telling it to exit, just because some database query is
> hanging.)

In the prototype that I have implemented, the main thread waits for all 
service threads to exit before exiting itself. So, the problem arising out 
of a database query hanging does exist. It has to be addressed.

> 3.8: Where is gethostbyname called in the KDC?

'gethostbyname()' is used in the function krb5_os_localaddr() (in
lib/krb5/os/localaddr.c). Though it is not called by the KDC, the symbol 
krb5_os_localaddr() is present in 'krb5kdc'. I guess this could be 
ignored.

>> 3) Section 3.1 -- why is the single mutex to be recursive? The fact
>> that there is proposed to be one single mutex does not lead, in my
>> mind, to the conclusion that this mutex be recursive. Forgive me for
>> saying so, but often the perceived need for a recursive mutex is a
>> byproduct of an incomplete design. There are times when they are a
>> great choice, and this may be one of them -- but the document does not
>> explain why. I think the choice of a recursive mutex needs
>> explanation.
>
> I'm inclined to agree here.  Not so much from experience, as from
> principles; there are probably cases where a recursive mutex makes
> sense, but as it can also encourage sloppiness or at least make missing
> unlock calls harder to spot, I think it needs justification.

The reason for using recursive mutexes is as follows. Consider the 
following three functions

f1(){
   ...
   b = 2;
   ...
}

f2(){
   a = 1;
   f1();
}

f3(){
   f1();
}

The updating of (a, b) needs to be atomic. In order to ensure this 
atomicity, locking can be done in two ways.
1. f1(){
      ...
      b = 2;
      ...
    }

    f2(){
      LOCK;
      a = 1;
      f1();
      UNLOCK;
    }

    f3(){
      UNLOCK;
      f1();
      UNLOCK;
    }

2. f1(){
      ...
      LOCK; //recursive
      b = 2;
      UNLOCK; //recursive
      ...
    }

    f2(){
      LOCK; //recursive
      a = 1;
      f1();
      UNLOCK; //recursive
    }

    f3(){
      f1();
    }

The advantages of the second type of locking are.
1. The critical-section size is smaller (note the extra lines around
    'b=2' and the function call itself)
2. Ensuring that a variable is threadsafe is easier. We don't have to
    look at the functions calling f1() to find out if 'b' is threadsafe.

In the KDC code, such atomic operations on global variables would be 
required in a few places if the server is made multithreaded. For example, 
in accept_tcp_connection(), when tcp_data_counter > 
max_tcp_data_connections, the following two actions have to be performed 
atomically.
1. looking up 'connections' for the oldest connection (inside
    accept_tcp_connection())
2. deleting it from 'connections' (inside kill_tcp_connection())
Otherwise, if multiple threads choose the same connection for deletion, 
one of them could end up with an invalid handler when the other deletes 
it.

   It is possible to do the same using non-recursive mutexes, but it would 
need rewriting some of the code to group multiple accesses to global 
variables into a single function. But I feel that for an initial 
implementation, adding lock and unlock calls to existing code would be 
simpler.

   Also, missing unlock calls would not be hard to spot if a simple rule is 
followed - "locks are released in the function in which they are 
acquired".

>> 5) Section 3.5 mentions that an alternative is to have separate
>> mutexes for each global variable. In general this is very attractive
>> to me! More locks (i.e., fine-grained locks) usually yields higher
>> concurrency and higher throughput.
>>
>> In the extreme, however, one would probably NOT want separate locks
>> for EACH global variable. If two or more global variables' values need
>> to be coordinated, then a common lock for that set of related globals
>> would be appropriate. E.g., queue A and counter A could be governed by
>> lock A, while queue B and counter B could be governed by lock B
>> (assuming that the two queues did not need to be synchronized).
>
> Some of them appear to be recording statistics from the replay cache
> which are never reported.  So, they should be considered protected by
> the same mutex as the replay cache itself.  (Though perhaps we should
> either eliminate them or find a way to report them.)  It appears that
> kdc_current_rcname is not settable, so it could probably be dropped.
> Et cetera...
>
> (Minor aside: For cases where a variable is to be deleted, getting that
> as a small, separate patch would be nice.  But whatever...)

The main reason for using a single (recursive) lock instead of multiple 
locks is that deadlocks would not be an issue, though it does restrict 
the concurrency. As Sam pointed out, additional locks could be added in 
the future if required.

>> 8) Still in Section 3.6, in the algorithm for the default thread --
>> does step 16 (unlock m) trigger any sort of notification for waiting
>> service threads? Default thread step 15 added work to "Q", so I
>> presume that step 16's unlock of m is intended to pro-actively notify
>> any service threads that were waiting for something to appear in Q.
>> Without active notification (e.g., use a pthread condition variable),
>> the service threads may not wake up at all, or at least in a timely
>> manner.
>
> I assumed that was implied by describing 'm' as a monitor, but maybe
> not...

Yes, notification of waiting threads would be part of unlocking of 'm'.

-Regards,

-Rahul S.