[StarCluster] load balanced nodes accepting jobs before ready
Stewart, Andrew
StewartA at si.edu
Mon Apr 14 14:02:20 EDT 2014
pkginstaller was called during add_node, but the node was added to the host list and its queue was enabled before pkginstaller had finished installing dependencies, so it looks like a race condition. I did move pkginstaller to the front of the plugins line (ahead of IPCluster), but I haven't yet tested whether that helps. The most reliable way to handle it would be to keep the queue disabled until provisioning is complete.
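A rough sketch of the "disable the queue until provisioning is complete" idea, not tested against a real cluster. It assumes the default SGE queue name all.q and uses the standard `qmod -d` / `qmod -e` commands to disable and re-enable a single queue instance. The names `qmod_command`, `provision_with_queue_disabled`, `run_on_master` and `provision` are hypothetical; inside a StarCluster plugin, `run_on_master` would presumably be something like master.ssh.execute and `hostname` the new node's alias, but that wiring is an assumption and is left out here.

```python
import shlex


def qmod_command(action, hostname, queue="all.q"):
    """Build the qmod command that disables ('d') or enables ('e')
    the queue instance <queue>@<hostname>."""
    if action not in ("d", "e"):
        raise ValueError("action must be 'd' (disable) or 'e' (enable)")
    instance = "%s@%s" % (queue, hostname)
    return "qmod -%s %s" % (action, shlex.quote(instance))


def provision_with_queue_disabled(run_on_master, hostname, provision,
                                  queue="all.q"):
    """Disable the node's queue instance, provision the node, then
    re-enable the queue instance.

    run_on_master: hypothetical callable that runs a shell command on
                   the SGE master.
    provision:     hypothetical callable that installs packages on the
                   new node and raises on failure.
    """
    run_on_master(qmod_command("d", hostname, queue))
    # If provisioning raises, the queue instance stays disabled, so a
    # half-provisioned node never accepts jobs.
    provision()
    run_on_master(qmod_command("e", hostname, queue))
```

Note that this only narrows the window: if the node's queue is already enabled by the time this runs, a job could still be scheduled in the gap before `qmod -d` takes effect.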
I actually think the simpler solution would be to bypass pkginstaller and just share managed packages with compute nodes via NFS. Why reinstall the same package N times?
--
Andrew Stewart
Office of Research Information Services (ORIS),
Office of the Chief Information Officer (OCIO),
Smithsonian Institution
202-505-3633
From: Rajat Banerjee <rajatb at post.harvard.edu>
Date: Monday, April 14, 2014 at 10:49 AM
To: Andrew Stewart <stewarta at si.edu>
Cc: "starcluster at mit.edu" <starcluster at mit.edu>
Subject: Re: [StarCluster] load balanced nodes accepting jobs before ready
Hi,
Does that mean that the pkginstaller plugin doesn't get called during add_node, before the host is added to the SGE host list?
Raj