[StarCluster] Starcluster - Taking advantage of multiple cores on EC2
Bill Lennon
blennon at shopzilla.com
Wed Aug 31 14:02:31 EDT 2011
Chris, thank you.
Rayson,
1) What do you get when you run "qhost" on the EC2 cluster??
error: commlib error: got select error (Connection refused)
error: unable to send message to qmaster using port 6444 on host "localhost": got send error
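That "Connection refused" means the qhost client cannot reach sge_qmaster on port 6444 at all. A few quick checks on the master node -- a sketch, assuming the usual StarCluster layout with SGE under /opt/sge6 (that path is an assumption):

```shell
# Is the sge_qmaster daemon actually running? (the [s] trick keeps grep
# from matching its own process entry)
ps aux | grep '[s]ge_qmaster' || echo "sge_qmaster not running"

# Do the clients know where qmaster lives? act_qmaster holds the
# hostname the whole cluster expects to contact on port 6444.
echo "SGE_ROOT=${SGE_ROOT:-unset}"
cat "${SGE_ROOT:-/opt/sge6}/default/common/act_qmaster" 2>/dev/null \
    || echo "act_qmaster not readable"
```

If qmaster is up but act_qmaster names a host other than the one qhost is trying ("localhost" in the error above), that mismatch is the usual culprit.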
2) If you run your application outside of SGE on your EC2 cluster, do you get the same behavior??
If I run 'python job.py' directly I don't see those errors...if that's what you're asking?
3) Intel MKL uses OpenMP internally, did you set the env. var. OMP_NUM_THREADS on the laptop??
Nope.
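For reference, setting it is a one-line export before launching the job -- a minimal sketch of a wrapper script, assuming bash (the fallback of 4 threads is just an example value; NSLOTS is set by SGE only when the job runs inside a parallel environment):

```shell
# Cap the OpenMP thread pool that MKL/ATLAS spin up internally.
# Use the slot count SGE granted, or an illustrative default of 4.
export OMP_NUM_THREADS="${NSLOTS:-4}"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
# python job.py   # then launch the SVD/k-means job as before
```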
Hope that gives you a lead. I'm unfortunately a noob.
-----Original Message-----
From: Rayson Ho [mailto:raysonlogin at yahoo.com]
Sent: Wednesday, August 31, 2011 10:57 AM
To: Bill Lennon; starcluster at mit.edu
Subject: Re: [StarCluster] Starcluster - Taking advantage of multiple cores on EC2
Bill,
1) What do you get when you run "qhost" on the EC2 cluster??
2) If you run your application outside of SGE on your EC2 cluster, do you get the same behavior??
3) Intel MKL uses OpenMP internally, did you set the env. var. OMP_NUM_THREADS on the laptop??
Rayson
=================================
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net
--- On Wed, 8/31/11, Chris Dagdigian <dag at bioteam.net> wrote:
> Grid Engine just executes jobs and manages resources.
>
> It's up to your code to use more than one core.
>
> Maybe there is a config difference between your local scipy/numpy etc.
> install and how StarCluster deploys its version?
>
> Grid Engine assumes by default a 1:1 ratio between job and CPU core
> unless you are explicitly submitting to a parallel environment.
>
> If you are the only user on a small cluster you probably don't have to
> do much, the worst that could happen would be that SGE queues up and
> runs more than one of your threaded app jobs on the same host and they
> end up competing for CPU/memory resources to the detriment of all.
>
> One way around that would be to configure exclusive job access and
> submit your job with the "exclusive" request. That will ensure that
> your job when it runs will get an entire execution host.
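A sketch of the exclusive-access setup Chris describes, for recent Grid Engine releases where it is implemented as an EXCL-type complex (the complex name "exclusive" is conventional; "node001" is StarCluster's usual node naming and is illustrative here):

```shell
# One-time setup, as the SGE admin user.
# 1) Add an EXCL boolean complex via `qconf -mc`, with a line like:
#      exclusive   excl   BOOL   EXCL   YES   YES   0   1000
# 2) Enable it on each execution host (opens an editor; add
#    "exclusive=true" to complex_values):
qconf -me node001

# Then request the whole host at submit time:
qsub -l exclusive=true my-job-script.sh
```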
>
> Another way is to fake up a parallel environment. For your situation
> it is very common for people to build a parallel environment called
> "Threaded" or "SMP" so that they can run threaded apps without
> oversubscribing an execution host.
>
> With a threaded PE set up you'd submit your job:
>
> $ qsub -pe threaded <# CPU> my-job-script.sh
>
> ... and SGE would account for your single job using more than one CPU
> on a single host.
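Creating such a PE is a couple of qconf calls -- a sketch, assuming the default "all.q" queue that StarCluster sets up (the slot count is an example):

```shell
# Create the PE (opens $EDITOR); the fields that matter:
#   pe_name           threaded
#   slots             999
#   allocation_rule   $pe_slots   # all requested slots on ONE host
qconf -ap threaded

# Attach it to the queue: add "threaded" to the queue's pe_list.
qconf -mq all.q

# Submit, requesting e.g. 8 cores on a single host:
qsub -pe threaded 8 my-job-script.sh
```

With `allocation_rule $pe_slots`, SGE will never split the request across hosts, which is exactly what a threaded (non-MPI) app needs.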
>
>
> FYI Grid Engine has recently picked up some Linux core binding
> enhancements that make it easier to pin jobs and tasks to specific
> cores. I'm not sure if the version of GE that is built into
> StarCluster today has those features yet but it should gain them
> eventually.
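For completeness, the core-binding request looks like this in Grid Engine versions that have it (6.2u5 and later; whether the StarCluster build supports it needs checking, per Chris's caveat):

```shell
# Pin the job to 4 successive cores on the execution host;
# pairs naturally with a single-host PE request of the same size.
qsub -binding linear:4 -pe threaded 4 my-job-script.sh

# `qhost -cb` shows the host topology and current bindings, if supported.
```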
>
> Regards,
> Chris
>
>
>
>
>
>
>
> Bill Lennon wrote:
> > Dear Starcluster Gurus,
> >
> > I’ve successfully loaded the StarCluster AMI onto a single high-memory
> > quadruple extra large instance and am performing an SVD on a large
> > sparse matrix, then performing k-means on the result. However, I’m
> > only taking advantage of one core when I do this. On my laptop (using
> > scipy/numpy with Intel MKL), on a small version of this problem, all
> > cores are taken advantage of automagically. Is there an easy way to do
> > this with a single StarCluster instance with ATLAS? Or do I need to
> > explicitly write my code to multithread?
> >
> > My thanks,
> >
> > Bill
> >
> _______________________________________________
> StarCluster mailing list
> StarCluster at mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster
>