[StarCluster] load balanced nodes accepting jobs before ready

Stewart, Andrew StewartA at si.edu
Mon Apr 14 14:02:20 EDT 2014


pkginstaller was called during add_node, but the node was added to the host list and its queue was enabled before pkginstaller had finished installing dependencies, so it looks like a race condition. I did move pkginstaller to the front of the plugins line (ahead of IPCluster), but I haven't yet tested whether that helps. The most reliable way to handle it would be to keep the node's queue disabled until provisioning is complete.
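
A minimal, untested sketch of the disable-until-provisioned idea follows. It assumes the node's queue instance is named all.q@<alias> (all.q being SGE's default queue) and that `execute` is some callable that runs a shell command on the master, for example the master's SSH execute method inside a plugin's add-node hook. The helper names are hypothetical. Because the node is already in the host list by the time a plugin runs, a small window may remain between the queue being enabled and the `qmod -d` call.

```python
def queue_instance(node_alias, queue="all.q"):
    """Build the SGE queue instance name for a node, e.g. all.q@node001."""
    return "%s@%s" % (queue, node_alias)


def provision_with_queue_disabled(execute, node_alias, provision, queue="all.q"):
    """Disable the node's queue instance, provision the node, then re-enable.

    execute:   callable that runs a shell command on the master (assumption)
    provision: callable that installs the node's dependencies (assumption)
    """
    instance = queue_instance(node_alias, queue)
    # qmod -d disables the queue instance so no new jobs are dispatched to it.
    execute("qmod -d %s" % instance)
    # If provision() raises, the queue instance is left disabled on purpose,
    # so a half-provisioned node never accepts jobs.
    provision()
    # qmod -e re-enables the queue instance once provisioning has succeeded.
    execute("qmod -e %s" % instance)


if __name__ == "__main__":
    # Dry run: record the commands instead of executing them.
    log = []
    provision_with_queue_disabled(
        log.append, "node001", lambda: log.append("install packages"))
    print("\n".join(log))
```

Leaving the queue disabled on failure is a deliberate choice here: a node that could not be provisioned is better left idle than allowed to fail jobs.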

I actually think the simpler solution would be to bypass pkginstaller and just share managed packages with compute nodes via NFS.  Why reinstall the same package N times?
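
A rough sketch of the install-once idea, assuming a directory that is already NFS-shared with the compute nodes. The marker file name and the `install` callable are hypothetical, standing in for whatever actually installs the packages into the shared directory.

```python
import os


def install_once(shared_dir, install, marker_name=".installed"):
    """Run install(shared_dir) only if no earlier run left a marker file.

    Returns True if the install ran, False if it was skipped.
    """
    marker = os.path.join(shared_dir, marker_name)
    if os.path.exists(marker):
        # Packages are already present on the shared filesystem.
        return False
    if not os.path.isdir(shared_dir):
        os.makedirs(shared_dir)
    install(shared_dir)
    # Write the marker only after a successful install.
    open(marker, "w").close()
    return True
```

With this approach only the first caller pays the install cost; nodes added later see the marker over NFS and skip straight to accepting jobs.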


--
Andrew Stewart
Office of Research Information Services (ORIS),
Office of the Chief Information Officer (OCIO),
Smithsonian Institution
202-505-3633

From: Rajat Banerjee <rajatb at post.harvard.edu>
Date: Monday, April 14, 2014 at 10:49 AM
To: Andrew Stewart <stewarta at si.edu>
Cc: "starcluster at mit.edu" <starcluster at mit.edu>
Subject: Re: [StarCluster] load balanced nodes accepting jobs before ready

Hi,
Does that mean the pkginstaller plugin doesn't get called during add_node, before the host is added to the SGE host list?
Raj