Project review: Parallel KDC

Sun Mar 14 00:31:39 EST 2010

On Mar 13, 2010, at 22:23, Greg Hudson wrote:
> On Sat, 2010-03-13 at 20:16 -0500, Ken Raeburn wrote:
>> How does this interact with the code to reconfigure the network
>> sockets when the machine's network configuration is changed?  That
>> code works by shutting down all the network connections and listening
>> sockets, and re-doing the network configuration from scratch.
> 
> I don't see any way to make that work in combination with -w.

Some ideas had occurred to me....

The parent process could detect that reconfiguration is needed, signal the children to go away, and after they do, close down and start up again.  The signaling could be done with kill() or with EOF on a pipe.  It might be nice to let the children finish up whatever packet processing they're doing at the time, but they shouldn't stick around long (e.g., waiting on TCP traffic) if the parent is waiting for them before reconfiguring the network.
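
To make the EOF-on-a-pipe variant concrete, here is a rough sketch in Python rather than the KDC's C.  The names spawn_workers and stop_workers are made up for illustration and are not from the KDC source; the worker loop is a stand-in for the real packet processing.

```python
import os
import select


def spawn_workers(n):
    """Fork n workers that exit once the parent's end of the pipe is closed."""
    rfd, wfd = os.pipe()
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            try:
                os.close(wfd)  # child keeps only the read end
                while True:
                    # A real worker would also select on its network sockets.
                    ready, _, _ = select.select([rfd], [], [], 1.0)
                    if rfd in ready and os.read(rfd, 1) == b"":
                        os._exit(0)  # EOF: the parent wants us gone
            finally:
                os._exit(1)  # never fall back into the parent's code
        pids.append(pid)
    os.close(rfd)  # parent keeps only the write end
    return wfd, pids


def stop_workers(wfd, pids):
    """Close the write end so every child sees EOF, then reap them all."""
    os.close(wfd)
    return [os.waitpid(pid, 0)[1] for pid in pids]
```

The parent never writes to the pipe; closing the only remaining write end is the whole signal, so a child that is busy with a packet notices it the next time it returns to select().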

It would be a little bit more complex, but children with active TCP connections could close down the sockets listening for new connections or UDP traffic, signal the parent (EOF on a second pipe?), and continue handling the TCP connections they've got open, exiting only when those connections go away or time out.

More complex still, on some systems the parent can use something like sendmsg() to pass a file descriptor to another process; it could send messages to the children saying "stop listening on #4 and #7, and start listening on this one".  The benefit of this is that *if* the reconfiguration code gets fixed up to touch only the file descriptors associated with new or deleted addresses, the child processes don't have to be shut down and restarted just for the sake of adding or removing one file descriptor.  Right now, probably not worth the effort...
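
For reference, the descriptor passing itself is small.  The sketch below uses Python's socket.sendmsg() with SCM_RIGHTS over a Unix-domain socketpair; the helper names (send_fd, recv_fd, demo) and the "listen" message are invented for the example, and nothing here reflects how the KDC would actually frame such messages.

```python
import array
import os
import socket


def send_fd(chan, fd, msg=b"listen"):
    """Send one descriptor plus a short command over a Unix-domain socket."""
    fds = array.array("i", [fd])
    chan.sendmsg([msg], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(chan):
    """Receive a command and, if one was attached, a descriptor."""
    fds = array.array("i")
    msg, ancdata, _flags, _addr = chan.recvmsg(64, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
    return msg, (fds[0] if fds else None)


def demo():
    """Parent hands a freshly bound UDP socket to an already running child."""
    parent_chan, child_chan = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid = os.fork()
    if pid == 0:
        try:
            parent_chan.close()
            msg, fd = recv_fd(child_chan)
            new_sock = socket.socket(fileno=fd)  # wrap the received descriptor
            port = new_sock.getsockname()[1]
            child_chan.sendall(b"%s %d" % (msg, port))  # report what we got
        finally:
            os._exit(0)
    child_chan.close()
    listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    listener.bind(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    send_fd(parent_chan, listener.fileno())
    reply = parent_chan.recv(64)
    os.waitpid(pid, 0)
    listener.close()
    parent_chan.close()
    return reply, port
```

The child ends up with its own descriptor for the same socket, so the parent can close its copy afterwards without affecting the child.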

For now, with the current state of the code, I'd suggest the first -- have the parent send SIGTERM to all the children, then wait for them to go away, reconfigure the network, and fork off a new set of child processes.
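
In outline, that sequence might look like the following Python sketch.  It is only an illustration under assumptions: fork_children, restart_children, and the reconfigure callback are hypothetical names, and the child body just sleeps where a real worker would service requests.

```python
import os
import signal


def fork_children(n):
    """Fork n placeholder workers that sit idle until they are signaled."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            try:
                # Default disposition: SIGTERM terminates the child.  A real
                # worker would install a handler to finish the current packet.
                signal.signal(signal.SIGTERM, signal.SIG_DFL)
                while True:
                    signal.pause()
            finally:
                os._exit(1)  # never fall back into the parent's code
        pids.append(pid)
    return pids


def restart_children(pids, n, reconfigure):
    """SIGTERM every child, wait for all of them, reconfigure, then refork."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    for pid in pids:
        os.waitpid(pid, 0)  # block until this child is really gone
    reconfigure()  # sockets are rebuilt while no child holds the old ones
    return fork_children(n)
```

The ordering is the point: the network is only reconfigured after every waitpid() has returned, so no child is still holding the old listening sockets.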

> Also, I wasn't around when that support was added, but it feels like a
> remarkable amount of effort for a regular network daemon to go to,
> suggesting that our design made a wrong turn somewhere along the line.
> (I'm not proposing to rip it out at this time, though.)

Perhaps.  We did get requests for it to continue to DTRT if network interfaces were brought up or shut down; we need to send UDP replies from the same local address that the client sent its request to; and on some systems we don't get IP(V6)_PKTINFO to help us do that with a single listening socket.  With many "regular" network daemons, only TCP is used, or the client may not care about the source address of the returned packet.  Or manual restarting may be required.  For a case like ours, I'd look to something like named or ntpd; on NetBSD, at least, ntpd appears to open a routing socket.

Detecting the situation automatically and dealing with it may have been more work than was warranted.  We did receive a patch that would cause it to reconfigure the network on receipt of a signal, instead, but that would still have the same issues in combination with your proposal.  Before that, the only option was to kill and restart the KDC.  I kind of prefer the automatic handling, though it appears there are cases on NetBSD at least where it gets triggered needlessly; I'm still working on filtering out those cases.

Ken