[StarCluster] loadbalance error

Justin Riley jtriley at MIT.EDU
Wed Jan 11 17:49:24 EST 2012


Hi Wei,

How long did you run the load balancer before you got the error? This
is clearly a memory-leak issue, as you can see from the exception raised
(MemoryError). I'll have to look closer to be sure, but it appears that
the balancer endlessly appends to a list, which eventually causes the
load balancer process to run out of memory.
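
To illustrate the suspected failure mode, here is a minimal sketch of
one way to cap such a buffer. It is not StarCluster's code: the
BoundedLogBuffer class, the max_records parameter, and the
"bounded-demo" logger name are all hypothetical. It assumes the growing
list is an in-memory log buffer (the traceback ends in
StringIO.getvalue() joining its buflist) and swaps it for a
collections.deque with a maxlen, which drops the oldest entry once full.

```python
import collections
import logging


class BoundedLogBuffer(logging.Handler):
    """Hypothetical handler that keeps only the newest log lines."""

    def __init__(self, max_records=1000):
        logging.Handler.__init__(self)
        # A deque with maxlen discards its oldest item when it is full,
        # so memory use stays bounded however long the process runs.
        self.records = collections.deque(maxlen=max_records)

    def emit(self, record):
        # Store the formatted message instead of appending to an
        # ever-growing list.
        self.records.append(self.format(record))

    def getvalue(self):
        # Join only what is retained, mirroring StringIO.getvalue().
        return "\n".join(self.records)


log = logging.getLogger("bounded-demo")
log.setLevel(logging.DEBUG)
log.propagate = False  # keep the demo output out of the root logger
handler = BoundedLogBuffer(max_records=3)
log.addHandler(handler)

# Simulate a long-running polling loop that logs on every pass.
for i in range(10):
    log.debug("polling iteration %d", i)

print(handler.getvalue())
```

With max_records=3, only the last three messages (iterations 7, 8 and
9) survive the loop. The trade-off is that a crash report built from
this buffer would hold only recent history, not the full session.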

I've filed an issue on GitHub to keep track of this:

https://github.com/jtriley/StarCluster/issues/65

~Justin


On 01/11/2012 10:01 AM, Wei Tao wrote:
> Hi all,
> 
> I was running loadbalance. After a while, I got the following
> error. Can someone shed some light on this? This also happened
> with earlier versions of StarCluster.
> 
> >>> Loading full job history
> !!! ERROR - command 'source /etc/profile && qhost -xml' failed with status 1
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93-py2.6.egg/starcluster/cli.py", line 251, in main
>     sc.execute(args)
>   File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93-py2.6.egg/starcluster/commands/loadbalance.py", line 89, in execute
>     lb.run(cluster)
>   File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93-py2.6.egg/starcluster/balancers/sge/__init__.py", line 583, in run
>     if self.get_stats() == -1:
>   File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93-py2.6.egg/starcluster/balancers/sge/__init__.py", line 529, in get_stats
>     self.stat.parse_qhost(qhostxml)
>   File "/usr/local/lib/python2.6/dist-packages/StarCluster-0.93-py2.6.egg/starcluster/balancers/sge/__init__.py", line 49, in parse_qhost
>     doc = xml.dom.minidom.parseString(string)
>   File "/usr/lib/python2.6/xml/dom/minidom.py", line 1928, in parseString
>     return expatbuilder.parseString(string)
>   File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 940, in parseString
>     return builder.parseString(string)
>   File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 223, in parseString
>     parser.Parse(string, True)
> ExpatError: syntax error: line 1, column 0
> 
> ---------------------------------------------------------------------------
> MemoryError                               Traceback (most recent call last)
>
> /usr/local/bin/starcluster in <module>()
>       7 if __name__ == '__main__':
>       8     sys.exit(
> ----> 9         load_entry_point('StarCluster==0.93', 'console_scripts', 'starcluster')()
>      10     )
>      11
>
> /usr/local/lib/python2.6/dist-packages/StarCluster-0.93-py2.6.egg/starcluster/cli.pyc in main()
>     306     logger.configure_sc_logging()
>     307     warn_debug_file_moved()
> --> 308     StarClusterCLI().main()
>     309
>     310 if __name__ == '__main__':
>
> /usr/local/lib/python2.6/dist-packages/StarCluster-0.93-py2.6.egg/starcluster/cli.pyc in main(self)
>     283             log.debug(traceback.format_exc())
>     284             print
> --> 285             self.bug_found()
>     286
>     287
>
> /usr/local/lib/python2.6/dist-packages/StarCluster-0.93-py2.6.egg/starcluster/cli.pyc in bug_found(self)
>     150         crashfile = open(static.CRASH_FILE, 'w')
>     151         crashfile.write(header % "CRASH DETAILS")
> --> 152         crashfile.write(session.stream.getvalue())
>     153         crashfile.write(header % "SYSTEM INFO")
>     154         crashfile.write("StarCluster: %s\n" % __version__)
>
> /usr/lib/python2.6/StringIO.pyc in getvalue(self)
>     268         """
>     269         if self.buflist:
> --> 270             self.buf += ''.join(self.buflist)
>     271             self.buflist = []
>     272         return self.buf
>
> MemoryError:
> 
> 
> Thanks!
> 
> -Wei
> 
> 


