kadmin incremental propagation full resync multiple processes spawned

Thu Nov 3 15:58:26 EDT 2011

On Wed, Nov 2, 2011 at 5:11 PM, Paul B. Henson <henson at acm.org> wrote:
> That process gets a strange error (which I'm not sure is relevant):
>
> Nov  2 03:50:06 halfy kadmind[20238]: iprop_full_resync_1: pclose(popen)
> failed: Success

It helped find the likely culprit:

    case 0: /* child */
        DPRINT(("%s: run `%s' ...\n", whoami, ubuf));
        (void) signal(SIGCHLD, SIG_DFL);
        /* run kdb5_util(1M) dump for IProp */
        /* XXX popen can return NULL; is pclose(NULL) okay?  */
        pret = pclose(popen(ubuf, "w"));
        DPRINT(("%s: pclose=%d\n", whoami, pret));
        if (pret != 0) {
            /* XXX popen/pclose may not set errno
               properly, and the error could be from the
               subprocess anyways.  */
            if (nofork) {
                perror(whoami);
            }
            krb5_klog_syslog(LOG_ERR,
                             _("%s: pclose(popen) failed: %s"),
                             whoami,
                             error_message(errno));
            goto out;
        }

"pclose(popen(...))" -- really?  Ugly.  In any case, it's an error to refer
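to errno when pret > 0: a positive return from pclose() is the
command's wait status, not a failure of pclose() itself, so errno
carries no information.  That must be the case here (errno was still 0,
whence the "Success" string).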

Anyways, this code needs some rewriting.
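
Something along these lines, as a rough sketch that has not been tried
inside kadmind.  run_dump_command() is a name made up for illustration,
and it uses plain strerror() and a caller-supplied buffer where kadmind
would use krb5_klog_syslog() and error_message().  The point is only to
keep three cases apart: popen() failing, pclose() failing (errno is
valid), and the command itself failing (errno is not valid, decode the
wait status instead).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/*
 * Hypothetical helper, not part of kadmind: run cmd through
 * popen()/pclose() and describe any failure in msg.
 * Returns 0 if the command exited with status 0, -1 otherwise.
 */
static int run_dump_command(const char *cmd, char *msg, size_t msglen)
{
    FILE *fp;
    int status;

    assert(cmd != NULL && msg != NULL && msglen > 0);

    fp = popen(cmd, "w");
    if (fp == NULL) {
        /* popen() itself failed; never hand NULL to pclose(). */
        snprintf(msg, msglen, "popen failed: %s", strerror(errno));
        return -1;
    }

    status = pclose(fp);
    if (status == -1) {
        /* Only in this case does errno describe the failure. */
        snprintf(msg, msglen, "pclose failed: %s", strerror(errno));
        return -1;
    }

    /* Otherwise status is a wait status, as returned by waitpid(). */
    if (WIFEXITED(status)) {
        if (WEXITSTATUS(status) == 0) {
            msg[0] = '\0';
            return 0;
        }
        snprintf(msg, msglen, "command exited with status %d",
                 WEXITSTATUS(status));
        return -1;
    }
    if (WIFSIGNALED(status)) {
        snprintf(msg, msglen, "command killed by signal %d",
                 WTERMSIG(status));
        return -1;
    }
    snprintf(msg, msglen, "unexpected wait status %d", status);
    return -1;
}
```

With this shape, a dump that exits non-zero is reported with its exit
status instead of the misleading "Success".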

Looking at the log messages, it seems likely that the child should have
exited after returning the reply but instead landed back in the master
event loop, where it competes with the master kadmind for new requests.
Looks like krb5_iprop_prog_1() should know to exit rather than return on
this error path (normally we don't exit() here, we just exec() kprop).
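
The general pattern, sketched outside of kadmind: whatever the forked
child does, every path through it ends in _exit(), so it can never fall
back into the parent's dispatch loop.  run_in_child() and the two
work_*() functions are hypothetical names for illustration only.  The
sketch also waits for the child so that the result can be observed; a
real server would probably reap the child asynchronously instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for the child's job (dump + propagate). */
static int work_ok(void)   { return 0; }
static int work_fail(void) { return -1; }

/*
 * Fork, run work() in the child, and make sure the child terminates
 * no matter what work() returns.  Returns the child's exit status
 * (0 or 1), or -1 if fork/waitpid failed or the child died abnormally.
 */
static int run_in_child(int (*work)(void))
{
    pid_t pid;
    int status;

    assert(work != NULL);

    pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: success or failure, leave via _exit() and never
           return into the caller's event loop. */
        _exit(work() == 0 ? 0 : 1);
    }

    /* Parent: wait for the child, retrying if interrupted. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Applied to the excerpt above, that would mean the "goto out" error path
in the child ends in an exit rather than a return to the caller.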

In any case, the kdb5_util dump *failed* and fixing that might fix
your problem in the short term.

I see Greg has roughly the same analysis, sorry for the noise.

Nico
--