[StarCluster] Running GPU jobs with Open Grid Scheduler (Was: CG1 plus StarCluster Questions)

Fri May 11 21:45:35 EDT 2012

Also, when you run qsub to submit a job, you will need to include the
"-l" flag to request a GPU device.

For example, to request 1 GPU device for the job, you would run:

    qsub -l gpu=1 name_of_my_gpu_app
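
The same request can also be embedded in a job script, so it does not have
to be typed on every submission. What follows is only a minimal sketch: it
assumes a consumable named "gpu" has already been defined on the cluster,
and the file name gpu_job.sh is made up for illustration. The application
name is the placeholder from the example above.

```shell
#!/bin/sh
# Write a minimal job script that carries the GPU request as an
# embedded "#$" directive (gpu_job.sh is a hypothetical file name).
cat > gpu_job.sh <<'EOF'
#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -l gpu=1
# Placeholder application name; replace with your own binary.
./name_of_my_gpu_app
EOF

# Submit only when qsub is actually on the PATH.
if command -v qsub >/dev/null 2>&1; then
    qsub gpu_job.sh
else
    echo "qsub not found; gpu_job.sh was written but not submitted"
fi
```

Note that qsub treats its argument as a script by default. If
name_of_my_gpu_app is a compiled binary rather than a script, it is either
wrapped in a script like this one or submitted with "-b y".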

--Chi

On Fri, May 11, 2012 at 5:45 PM, Rayson Ho <raysonlogin at yahoo.com> wrote:
> Hi Scott,
>
> You can set up a consumable resource to track usage of GPUs:
>
> http://gridscheduler.sourceforge.net/howto/consumable.html
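
As a rough sketch of what that HOWTO amounts to for GPUs: define an integer
consumable and then give each execution host a count. The resource name
"gpu", the host name node001, and the file names are assumptions for
illustration only. The count of 2 matches the 2 GPUs per instance mentioned
in Scott's question.

```shell
#!/bin/sh
# Assumed values: resource name "gpu", 2 devices per host, host "node001".
GPUS_PER_HOST=2
EXEC_HOST="${EXEC_HOST:-node001}"

# Complex definition columns:
# name shortcut type relop requestable consumable default urgency
printf '%s\n' "gpu gpu INT <= YES YES 0 0" > gpu_complex.txt

if command -v qconf >/dev/null 2>&1; then
    # Dump the current complex list, append the GPU entry if it is
    # missing, and load the combined list back (-Mc overwrites the list).
    qconf -sc > complexes.txt
    if ! grep -q '^gpu[[:space:]]' complexes.txt; then
        cat gpu_complex.txt >> complexes.txt
        qconf -Mc complexes.txt
    fi
    # Tell the scheduler how many GPUs this host offers.
    qconf -aattr exechost complex_values "gpu=$GPUS_PER_HOST" "$EXEC_HOST"
else
    echo "qconf not found; would add complex: $(cat gpu_complex.txt)"
    echo "and set complex_values gpu=$GPUS_PER_HOST on $EXEC_HOST"
fi
```

The last qconf call would need to be repeated for every execution host that
has GPUs.
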
>
> And we also have a load sensor that monitors the GPU devices:
>
> https://gridscheduler.svn.sourceforge.net/svnroot/gridscheduler/trunk/source/dist/gpu/gpu_sensor.c
>
> If you want to use the second (i.e., dynamic) method, you will need to set
> it up by following this HOWTO:
>
> http://gridscheduler.sourceforge.net/howto/loadsensor.html
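
For reference, a load sensor is just a program that talks to the execution
daemon over stdin/stdout. The sketch below is a shell illustration of that
protocol and is not a port of gpu_sensor.c. The resource name "gpu_count"
and the file name gpu_load_sensor.sh are made up, and it assumes
"nvidia-smi -L" prints one line per device.

```shell
#!/bin/sh
# Write a minimal load sensor (hypothetical file name).
cat > gpu_load_sensor.sh <<'EOF'
#!/bin/sh
host=$(hostname 2>/dev/null || uname -n)
# Each line on stdin asks for a new report; the line "quit" ends the sensor.
while read -r line; do
    [ "$line" = "quit" ] && exit 0
    if command -v nvidia-smi >/dev/null 2>&1; then
        # Count the "GPU N: ..." lines printed by nvidia-smi -L.
        count=$(nvidia-smi -L 2>/dev/null | grep -c '^GPU ')
    else
        count=0
    fi
    # A report is framed by "begin" and "end", one host:name:value per line.
    echo "begin"
    echo "$host:gpu_count:$count"
    echo "end"
done
EOF
chmod +x gpu_load_sensor.sh

# Dry run outside the scheduler: request one report, then quit.
printf '\nquit\n' | ./gpu_load_sensor.sh
```

To have the scheduler use a sensor like this, the reported resource still
has to be defined as a complex, and the script has to be registered under
load_sensor in the host or global configuration, as the HOWTO describes.
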
>
> The first method of using a consumable resource works best if you don't run
> GPU programs outside of Open Grid Scheduler/Grid Engine.
>
> Also note that in the next release of StarCluster GPU support will be
> enhanced.
>
> Rayson
>
> =================================
> Open Grid Scheduler / Grid Engine
> http://gridscheduler.sourceforge.net/
>
> Scalable Grid Engine Support Program
> http://www.scalablelogic.com/
>
>
> ________________________________
> From: Scott Le Grand <varelse2005 at gmail.com>
> To: starcluster at mit.edu
> Sent: Friday, May 11, 2012 5:25 PM
> Subject: [StarCluster] CG1 plus StarCluster Questions
>
> Hey guys, I'm really impressed with StarCluster and I've used it to create
> clusters ranging from 2 to 70 instances...
>
> I've also customized it to use CUDA 4.2 and 295.41, the latest toolkit and
> driver, because my code has GTX 680 support and I don't want to have to
> comment it out just to build it (and 4.1 had a horrendous perf regression).
>
> Anyway, 2 questions, one of which I think you already answered:
>
> 1. I'd like to set up a custom AMI that by default has 2 GPUs configured as
> a consumable resource.  I already have code in my app to utilize exclusive
> mode and choose whichever GPU isn't in use, but that all falls down because
> the queueing system is based on CPU cores rather than GPU count.  How would
> I set this up once so I can save the customized AMI and never have to do it
> again?
>
> 2. I'm also seeing the .ssh directories disappear on restart.  But I'll look
> at your solution as I've just been restarting the whole cluster up to now.
>
>
>
>
> _______________________________________________
> StarCluster mailing list
> StarCluster at mit.edu
> http://mailman.mit.edu/mailman/listinfo/starcluster