[StarCluster] Starcluster - Taking advantage of multiple cores on EC2

Rayson Ho raysonlogin at yahoo.com
Wed Aug 31 14:22:31 EDT 2011


--- On Wed, 8/31/11, Bill Lennon <blennon at shopzilla.com> wrote:
> When I run my app interactively outside of sge and look at
> htop it only uses one core :(

OK, so at least we are not dealing with SGE... Looks like an OS/app issue now :-D

From the shell, run:

% cat /proc/cpuinfo

or:

% cat /proc/cpuinfo | grep processor

This should at least tell us the number of cores/threads on the EC2 node.
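For a quick count rather than the full listing, either of the following should work on a stock Linux AMI (the grep pattern assumes the usual /proc/cpuinfo layout):

```shell
# Count "processor" entries in /proc/cpuinfo -- one per logical CPU (Linux only)
grep -c '^processor' /proc/cpuinfo

# nproc (GNU coreutils) reports the CPUs available to the current process
nproc
```

If either reports more than 1, the cores are visible to the OS and the single-core behavior is coming from the application or its BLAS build.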

Rayson

=================================
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net


> 
> In my original message I tried to explain that I launched
> the starcluster ami on a single ec2 instance, so I'm not
> working with a cluster.  But I'd still like to take
> advantage of all the cores.
> 
> -----Original Message-----
> From: Rayson Ho [mailto:raysonlogin at yahoo.com]
> 
> Sent: Wednesday, August 31, 2011 11:09 AM
> To: starcluster at mit.edu; Bill Lennon
> Subject: RE: [StarCluster] Starcluster - Taking advantage of
> multiple cores on EC2
> 
> --- On Wed, 8/31/11, Bill Lennon <blennon at shopzilla.com>
> wrote: 
> > 1) What do you get when you run "qhost" on the EC2 cluster??
> > 
> > error: commlib error: got select error (Connection refused)
> > error: unable to send message to qmaster using port 6444 on
> > host "localhost": got send error
> 
> 
> Looks like you are not able to connect to the SGE
> qmaster... did you actually submit jobs to SGE??
> 
> 
> > 2) If you run your application outside of SGE on your EC2
> > cluster, do you get the same behavior??
> > 
> > If I 'python job.py' I don't see those errors... if that's
> > what you're asking?
> 
> 
> I mean, on one of your EC2 nodes, run your application
> interactively. Then run "top" or "uptime" and see if outside
> of SGE, your application is able to use all the cores on the
> node.
> 
> Rayson
> 
> =================================
> Grid Engine / Open Grid Scheduler
> http://gridscheduler.sourceforge.net

> 
> 
> 
> > 
> > 3) Intel MKL uses OpenMP internally, did you set the env.
> > var. OMP_NUM_THREADS on the laptop??
> > 
> > Nope.
> > 
> > Hope that may give you a lead.  I'm unfortunately a noob.
> > 
> > -----Original Message-----
> > From: Rayson Ho [mailto:raysonlogin at yahoo.com]
> > 
> > Sent: Wednesday, August 31, 2011 10:57 AM
> > To: Bill Lennon; starcluster at mit.edu
> > Subject: Re: [StarCluster] Starcluster - Taking advantage of
> > multiple cores on EC2
> > 
> > Bill,
> > 
> > 1) What do you get when you run "qhost" on the EC2 cluster??
> > 
> > 2) If you run your application outside of SGE on your EC2
> > cluster, do you get the same behavior??
> > 
> > 3) Intel MKL uses OpenMP internally, did you set the env.
> > var. OMP_NUM_THREADS on the laptop??
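Setting the variable explicitly before launching the job is a quick way to test this; the thread count of 8 below is illustrative and should match the node's core count:

```shell
# Tell OpenMP-backed libraries (Intel MKL, threaded ATLAS/OpenBLAS builds)
# how many threads to use; 8 is illustrative -- use the node's core count.
export OMP_NUM_THREADS=8

# Then launch the app as usual, e.g.:
#   python job.py
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```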
> > 
> > Rayson
> > 
> > =================================
> > Grid Engine / Open Grid Scheduler
> > http://gridscheduler.sourceforge.net

> 
> > 
> > 
> > 
> > --- On Wed, 8/31/11, Chris Dagdigian <dag at bioteam.net>
> > wrote:
> > > Grid Engine just executes jobs and manages resources.
> > > 
> > > It's up to your code to use more than one core.
> > > 
> > > Maybe there is a config difference between your local
> > > scipy/numpy etc. install and how StarCluster deploys its
> > > version?
> > > 
> > > Grid Engine assumes by default a 1:1 ratio between job and
> > > CPU core unless you are explicitly submitting to a parallel
> > > environment.
> > > 
> > > If you are the only user on a small cluster you probably
> > > don't have to do much; the worst that could happen is that
> > > SGE queues up and runs more than one of your threaded app
> > > jobs on the same host, and they end up competing for
> > > CPU/memory resources to the detriment of all.
> > > 
> > > One way around that would be to configure exclusive job
> > > access and submit your job with the "exclusive" request.
> > > That will ensure that your job, when it runs, will get an
> > > entire execution host.
> > > 
> > > Another way is to fake up a parallel environment. For your
> > > situation it is very common for people to build a parallel
> > > environment called "threaded" or "SMP" so that they can run
> > > threaded apps without oversubscribing an execution host.
> > > 
> > > With a threaded PE set up you'd submit your job:
> > > 
> > >   $ qsub -pe threaded <# CPU> my-job-script.sh
> > > 
> > > ... and SGE would account for your single job using more
> > > than one CPU on a single host.
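For reference, a "threaded" PE of the kind Chris describes is usually created with qconf along these lines. The PE name, slot count, and queue name (all.q) below are illustrative, and the commands must run as the SGE admin user:

```shell
# Clone an existing PE definition as a starting point
qconf -sp make > threaded.pe

# Rename it, give it plenty of slots, and keep all slots of a job
# on one host (the $pe_slots allocation rule)
sed -i -e 's/^pe_name .*/pe_name threaded/' \
       -e 's/^slots .*/slots 999/' \
       -e 's/^allocation_rule .*/allocation_rule $pe_slots/' threaded.pe

# Register the PE and attach it to the queue
qconf -Ap threaded.pe
qconf -mattr queue pe_list threaded all.q

# Submit requesting 8 slots on a single host
qsub -pe threaded 8 my-job-script.sh
```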
> > > 
> > > 
> > > FYI Grid Engine has recently picked up some Linux core
> > > binding enhancements that make it easier to pin jobs and
> > > tasks to specific cores. I'm not sure if the version of GE
> > > that is built into StarCluster today has those features yet,
> > > but it should gain them eventually.
> > > 
> > > Regards,
> > > Chris
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > 
> > > Bill Lennon wrote:
> > > > Dear Starcluster Gurus,
> > > >
> > > > I've successfully loaded the Starcluster AMI onto a single
> > > > high-memory quadruple extra large instance and am
> > > > performing an SVD on a large sparse matrix, then k-means on
> > > > the result. However, I'm only taking advantage of one core
> > > > when I do this. On my laptop (using scipy/numpy with Intel
> > > > MKL), on a small version of this problem, all cores are
> > > > taken advantage of automagically. Is there an easy way to
> > > > do this on a single StarCluster instance with ATLAS? Or do
> > > > I need to explicitly write my code to multithread?
> > > >
> > > > My thanks,
> > > >
> > > > Bill
> > > >
> > > _______________________________________________
> > > StarCluster mailing list
> > > StarCluster at mit.edu
> > > http://mailman.mit.edu/mailman/listinfo/starcluster
