[StarCluster] Starcluster - Taking advantage of multiple cores on EC2
Rayson Ho
raysonlogin at yahoo.com
Wed Aug 31 14:22:31 EDT 2011
--- On Wed, 8/31/11, Bill Lennon <blennon at shopzilla.com> wrote:
> When I run my app interactively outside of sge and look at
> htop it only uses one core :(
OK, so at least we are not dealing with SGE... Looks like an OS/app issue now :-D
From the shell, run:
% cat /proc/cpuinfo
or:
% cat /proc/cpuinfo | grep processor
This should at least tell us the number of cores/threads on the EC2 node.
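If you just want the count, a one-liner works (a sketch; `nproc` needs coreutils 8.1 or newer, which recent Linux AMIs have):

```shell
# Count the logical processors (cores/hyperthreads) the kernel exposes
grep -c '^processor' /proc/cpuinfo

# Equivalent on systems with coreutils >= 8.1
nproc
```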
Rayson
=================================
Grid Engine / Open Grid Scheduler
http://gridscheduler.sourceforge.net
>
> In my original message I tried to explain that I launched
> the starcluster ami on a single ec2 instance, so I'm not
> working with a cluster. But I'd still like to take
> advantage of all the cores.
>
> -----Original Message-----
> From: Rayson Ho [mailto:raysonlogin at yahoo.com]
>
> Sent: Wednesday, August 31, 2011 11:09 AM
> To: starcluster at mit.edu;
> Bill Lennon
> Subject: RE: [StarCluster] Starcluster - Taking advantage
> of multiple cores on EC2
>
> --- On Wed, 8/31/11, Bill Lennon <blennon at shopzilla.com>
> wrote:
> > 1) What do you get when you run "qhost" on the EC2 cluster??
> >
> > error: commlib error: got select error (Connection refused)
> > error: unable to send message to qmaster using port 6444 on host
> > "localhost": got send error
>
>
> Looks like you are not able to connect to the SGE qmaster... did you
> actually submit jobs to SGE??
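To double-check that the qmaster daemon is actually up on the master node, something like this helps (a sketch; 6444 is only the default qmaster port, and your install may use a different one):

```shell
# Is sge_qmaster running? (the [s] trick keeps grep from matching itself)
ps -ef | grep '[s]ge_qmaster' || echo "sge_qmaster is not running"

# Is anything listening on the default qmaster port (6444)?
netstat -tln | grep ':6444 ' || echo "nothing listening on 6444"
```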
>
>
> > 2) If you run your application outside of SGE on your EC2
> > cluster, do you get the same behavior??
> >
> > If I 'python job.py' I don't see those errors... if that's what
> > you're asking?
>
>
> I mean, on one of your EC2 nodes, run your application
> interactively. Then run "top" or "uptime" and see if, outside of
> SGE, your application is able to use all the cores on the node.
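For a quick non-interactive check while the app is running, something like this works (a sketch; `top -b` is the procps batch mode):

```shell
# A 1-minute load average near the core count means all cores are busy
uptime

# One batch-mode snapshot of top, highest CPU consumers first
top -b -n 1 | head -20
```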
>
> Rayson
>
> =================================
> Grid Engine / Open Grid Scheduler
> http://gridscheduler.sourceforge.net
>
>
>
> >
> > 3) Intel MKL uses OpenMP internally, did you set the env. var.
> > OMP_NUM_THREADS on the laptop??
> >
> > Nope.
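For reference, setting it is just an environment variable export before launching the app (a sketch; MKL also honors its own MKL_NUM_THREADS, which takes precedence for MKL calls):

```shell
# Let OpenMP-based libraries (MKL's threaded BLAS, etc.) use every core
export OMP_NUM_THREADS=$(grep -c '^processor' /proc/cpuinfo)
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```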
> >
> > Hope that may give you a lead. I'm unfortunately a noob.
> >
> > -----Original Message-----
> > From: Rayson Ho [mailto:raysonlogin at yahoo.com]
> >
> > Sent: Wednesday, August 31, 2011 10:57 AM
> > To: Bill Lennon; starcluster at mit.edu
> > Subject: Re: [StarCluster] Starcluster - Taking advantage of
> > multiple cores on EC2
> >
> > Bill,
> >
> > 1) What do you get when you run "qhost" on the EC2 cluster??
> >
> > 2) If you run your application outside of SGE on your EC2
> > cluster, do you get the same behavior??
> >
> > 3) Intel MKL uses OpenMP internally, did you set the env. var.
> > OMP_NUM_THREADS on the laptop??
> >
> > Rayson
> >
> > =================================
> > Grid Engine / Open Grid Scheduler
> > http://gridscheduler.sourceforge.net
>
> >
> >
> >
> > --- On Wed, 8/31/11, Chris Dagdigian <dag at bioteam.net>
> > wrote:
> > > Grid Engine just executes jobs and manages resources.
> > >
> > > It's up to your code to use more than one core.
> > >
> > > Maybe there is a config difference between your local
> > > scipy/numpy etc. install and how StarCluster deploys its
> > > version?
> > >
> > > Grid Engine assumes by default a 1:1 ratio between job and CPU
> > > core unless you are explicitly submitting to a parallel
> > > environment.
> > >
> > > If you are the only user on a small cluster you probably don't
> > > have to do much; the worst that could happen would be that SGE
> > > queues up and runs more than one of your threaded app jobs on
> > > the same host and they end up competing for CPU/memory resources
> > > to the detriment of all.
> > >
> > > One way around that would be to configure exclusive job access
> > > and submit your job with the "exclusive" request. That will
> > > ensure that your job, when it runs, will get an entire execution
> > > host.
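For the archives, that exclusive-host setup looks roughly like this on Grid Engine 6.2u5 and later (a sketch; the host name and the complex definition line are illustrative, so check your version's docs):

```shell
# 1) Add an "exclusive" complex (qconf -mc opens an editor; add a line like):
#    exclusive   excl   BOOL   EXCL   YES   YES   0   1000
qconf -mc

# 2) Enable it per execution host (qconf -me opens an editor; set):
#    complex_values    exclusive=true
qconf -me node001

# 3) Submit with the exclusive request
qsub -l exclusive=true my-job-script.sh
```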
> > >
> > > Another way is to fake up a parallel environment. For your
> > > situation it is very common for people to build a parallel
> > > environment called "threaded" or "SMP" so that they can run
> > > threaded apps without oversubscribing an execution host.
> > >
> > > With a threaded PE set up you'd submit your job:
> > >
> > > $ qsub -pe threaded <# CPU> my-job-script.sh
> > >
> > > ... and SGE would account for your single job using more than
> > > one CPU on a single host.
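Setting such a PE up is a few admin commands on the master (a sketch; the PE name "threaded" and the queue name "all.q" are StarCluster-style defaults, so adjust as needed):

```shell
# Create the PE (qconf -ap opens an editor; the key settings are):
#    pe_name           threaded
#    slots             999
#    allocation_rule   $pe_slots    # all requested slots on one host
qconf -ap threaded

# Attach the PE to the cluster queue
qconf -mattr queue pe_list "threaded" all.q

# Request 8 slots on a single host for a threaded job
qsub -pe threaded 8 my-job-script.sh
```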
> > >
> > >
> > > FYI Grid Engine has recently picked up some Linux core binding
> > > enhancements that make it easier to pin jobs and tasks to
> > > specific cores. I'm not sure if the version of GE that is built
> > > into StarCluster today has those features yet, but it should
> > > gain them eventually.
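On versions that have it (Grid Engine 6.2u5 and later), the core binding request looks like this (a sketch; see `man qsub` for the `-binding` strategies your build actually supports):

```shell
# Pin the job to 4 consecutive cores on the execution host
qsub -binding linear:4 my-job-script.sh
```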
> > >
> > > Regards,
> > > Chris
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > Bill Lennon wrote:
> > > > Dear Starcluster Gurus,
> > > >
> > > > I’ve successfully loaded the Starcluster AMI onto a single
> > > > high-memory quadruple extra large instance and am performing
> > > > an SVD on a large sparse matrix and then performing k-means on
> > > > the result. However, I’m only taking advantage of one core
> > > > when I do this. On my laptop (using scipy/numpy, Intel MKL),
> > > > on a small version of this, all cores are taken advantage of
> > > > automagically. Is there an easy way to do this with a single
> > > > StarCluster instance with ATLAS? Or do I need to explicitly
> > > > write my code to multithread?
> > > >
> > > > My thanks,
> > > >
> > > > Bill
> > > >
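One way to tell whether the instance's numpy is linked against a threaded BLAS at all (a quick check; the output format varies by numpy version):

```shell
# Show which BLAS/LAPACK libraries numpy was built against; a plain
# (single-threaded) ATLAS here would explain the one-core behavior
python -c "import numpy; numpy.show_config()"
```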
> > > _______________________________________________
> > > StarCluster mailing list
> > > StarCluster at mit.edu
> > > http://mailman.mit.edu/mailman/listinfo/starcluster
>
> > >
> >
> >
>
>
More information about the StarCluster mailing list