KDC worker processes project

Thu Sep 16 15:56:03 EDT 2010

On Thu, Sep 16, 2010 at 03:44:39PM -0400, ghudson at MIT.EDU wrote:
> * The supervisor is very simplistic.  It only knows how to propagate
>   HUP signals to children (all we do on a HUP right now is re-open the
>   KDC logfile), and how to terminate worker processes if the
>   supervisor receives a termination signal or any worker process
>   exits.
> 
>   (Why terminate all workers when one exits, instead of restarting the
>   worker?  So that the KDC has similar behavior on a crash with or
>   without worker processes.  The goal of this feature is to provide
>   scalability, not to increase KDC uptime in the face of bugs.)

+1 (though on Solaris we wouldn't need the supervisor to do this -- an
SMF service can be set to restart the whole service if any of its
processes dumps core)
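
For illustration, the supervisor behavior described in the quoted text
(relay HUP to the children, and shut everything down on a termination
signal or when any worker exits) could be sketched roughly as follows.
This is a Python sketch for a POSIX system, not the KDC's actual C code;
supervise() and worker_main() are hypothetical names, and it assumes at
least one worker and that it runs in the main thread.

```python
import os
import signal


def supervise(num_workers, worker_main):
    """Fork workers, relay signals, and stop all once any worker exits.

    worker_main(index) is a hypothetical callback standing in for the
    per-process service loop. Returns the pid of the first worker to exit.
    """
    pids = set()
    for index in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: start from default signal dispositions. A real worker
            # would install its own SIGHUP handler inside worker_main.
            signal.signal(signal.SIGHUP, signal.SIG_DFL)
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
            status = 1
            try:
                worker_main(index)
                status = 0
            finally:
                os._exit(status)  # never fall back into the parent's code
        pids.add(pid)

    def relay(signum, _frame):
        # Pass the signal on to every worker that has not been reaped yet.
        for pid in list(pids):
            try:
                os.kill(pid, signum)
            except ProcessLookupError:
                pass

    def wait_for_worker():
        # Reap children until one of our own workers turns up.
        while True:
            pid, _status = os.wait()
            if pid in pids:
                pids.discard(pid)
                return pid

    old_hup = signal.signal(signal.SIGHUP, relay)
    old_term = signal.signal(signal.SIGTERM, relay)
    try:
        first = wait_for_worker()
        # One exit ends the whole group; no worker is restarted.
        relay(signal.SIGTERM, None)
        while pids:
            wait_for_worker()
    finally:
        # Put back whatever handlers were in place before.
        signal.signal(signal.SIGHUP,
                      old_hup if old_hup is not None else signal.SIG_DFL)
        signal.signal(signal.SIGTERM,
                      old_term if old_term is not None else signal.SIG_DFL)
    return first
```

A SIGTERM sent to the supervisor is relayed to the workers, whose exit
then unblocks the wait, so both shutdown paths end in the same cleanup
loop.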

> * If you use worker processes, you don't get network reconfigs.  If
>   your platform supports IPv4 pktinfo (Linux or modern Solaris), this
>   is not a change in behavior.  If your platform does not support
>   pktinfo, then the KDC will have to be restarted in order to
>   recognize newly added network interfaces.

I'm not clear on something.  If you have pktinfo, then the KDC copes
with changes in network interfaces implicitly, whereas if you don't
have pktinfo, then changes in network interfaces have to be detected
and the KDC has to open/close some sockets.  Correct?  If so, +1.

> * KDB modules will be closed in the supervisor just before forking and
>   then opened in each child by reinvoking initialize_realms(), which
>   will re-parse the argument list.  I don't think it's a terribly
>   clean design that we combine argument parsing and realm
>   initialization, but we have such a tangle of realm options that it's
>   not trivial to separate them out.

Oh well.

> * There won't be automated tests; it's just too hard.  There will be
>   manual tests which will involve some temporary code modifications.

Why can't you test by starting the KDC and then confirming that all the
expected processes exist, with the right parent/child hierarchy?
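
A rough sketch of what such a check might look like, in Python.  It
assumes the KDC binary shows up in ps as "krb5kdc" and that the
supervisor is the one krb5kdc process whose parent is not itself a
krb5kdc; the function names and the sample listing are made up for
illustration.

```python
import os
import subprocess


def snapshot():
    """Return pid, ppid, and command name for every process, one per line."""
    # Separate -o options with empty headers are the portable POSIX form.
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "comm="],
        capture_output=True, text=True, check=True)
    return result.stdout


def parse_ps(text):
    """Turn ps output into a list of (pid, ppid, command) tuples."""
    procs = []
    for line in text.splitlines():
        fields = line.split(None, 2)
        if len(fields) == 3:
            procs.append((int(fields[0]), int(fields[1]), fields[2].strip()))
    return procs


def check_workers(procs, expected_workers, name="krb5kdc"):
    """True if exactly one supervisor has exactly expected_workers children."""
    kdcs = {pid: ppid for pid, ppid, comm in procs
            if os.path.basename(comm) == name}
    # The supervisor is the KDC process whose parent is not a KDC process.
    supervisors = [pid for pid, ppid in kdcs.items() if ppid not in kdcs]
    if len(supervisors) != 1:
        return False
    workers = [pid for pid, ppid in kdcs.items() if ppid == supervisors[0]]
    # Every other KDC process must be a direct child of the supervisor.
    return (len(workers) == expected_workers
            and len(kdcs) == expected_workers + 1)


SAMPLE = """\
    1     0 init
  100     1 krb5kdc
  101   100 krb5kdc
  102   100 krb5kdc
  200     1 sshd
"""
```

Against a live KDC the call would be check_workers(parse_ps(snapshot()), N)
for N configured workers; check_workers(parse_ps(SAMPLE), 2) exercises
the same logic without running ps.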

> Because we only added this to the 1.9 slate a few months ago when
> things had already gotten busy, implementation time is very limited.
> So while it's fair to advocate for a more complicated design, also
> consider whether the current design is better or worse than the status
> quo.

Your proposal is fine.

How will the number of processes be configured?  What's the default?
(Twice the number of CPUs seems like a reasonable default to me.)
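
To make the suggested default concrete, a small Python sketch; this
only illustrates the "twice the number of CPUs" idea, and the function
and its "configured" parameter are hypothetical, not an existing KDC
option.

```python
import os


def default_worker_count(configured=None):
    """Return the configured worker count, or twice the CPU count if unset."""
    if configured is not None:
        return configured
    # os.cpu_count() can return None when the count is unknown; assume 1.
    return 2 * (os.cpu_count() or 1)
```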

Nico
--