If you just want to restrict to a single job on each node, you can write a plugin that sets the slots to 1 by using a command something like:

    def run(self, nodes, master, user, user_shell, volumes):
        for node in nodes:
            cmd_strg = 'qconf -mattr exechost complex_values slots=1 %s' % node.alias
            output = master.ssh.execute(cmd_strg)
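Fleshed out a bit, a rough, untested sketch of what the whole plugin file might look like (the class name and logging are mine; the ClusterSetup base class and how to register the plugin are described in the plugin docs):

    from starcluster.clustersetup import ClusterSetup
    from starcluster.logger import log

    class SingleJobPerNode(ClusterSetup):
        """Limit every exec host to one SGE slot so only one job runs per node."""

        def run(self, nodes, master, user, user_shell, volumes):
            for node in nodes:
                log.info("Setting slots=1 on %s" % node.alias)
                # Same qconf call as above, executed on the master over SSH.
                cmd_strg = 'qconf -mattr exechost complex_values slots=1 %s' % node.alias
                master.ssh.execute(cmd_strg)

You would then point a [plugin] section of your StarCluster config at the class (setup_class = yourmodule.SingleJobPerNode, per the plugin docs) so it runs when the cluster comes up.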
You will need to look at the starcluster plugin documentation to set everything up correctly. hth.

Don

On Mon, Nov 21, 2011 at 10:29 AM, Rayson Ho <raysonlogin@yahoo.com> wrote:
Amir,

You can use qhost to list all the nodes and the resources that each node has.
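For example, a minimal illustration (run on the master, or any node with the SGE command-line tools on its PATH; mem_free is just one resource you might query):

    # Plain "qhost" prints one line per exec host, including MEMTOT, the
    # total memory SGE believes the host has.  "-F mem_free" (or any other
    # complex name) additionally prints that resource's current value.
    import subprocess

    print(subprocess.check_output(["qhost"]))
    print(subprocess.check_output(["qhost", "-F", "mem_free"]))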
I have an answer to the memory issue, but I have not had time to properly type up a response and test it.

Rayson
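For reference, one common way to make SGE actually enforce per-node memory (not necessarily the answer Rayson has in mind) is to mark the memory complex, e.g. h_vmem, as consumable via qconf -mc and give every exec host a capacity; requests like the qsub -l mem_free=...,h_vmem=... command Justin suggests further down are then charged against that capacity, and a job that does not fit waits instead of being dispatched alongside another one. A rough, untested sketch in the same plugin style as Don's example above (the 22G figure and the class name are placeholders):

    from starcluster.clustersetup import ClusterSetup

    class MemoryCapacitySetup(ClusterSetup):
        """Give each exec host an h_vmem capacity SGE can schedule against."""

        def run(self, nodes, master, user, user_shell, volumes):
            for node in nodes:
                # Assumes h_vmem was already made consumable with "qconf -mc"
                # (CONSUMABLE=YES plus a default value).  Mirrors the qconf
                # call in Don's example; depending on the host's existing
                # complex_values list you may need -aattr instead of -mattr.
                master.ssh.execute(
                    'qconf -mattr exechost complex_values h_vmem=22G %s'
                    % node.alias)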
________________________________
From: Amirhossein Kiani <amirhkiani@gmail.com>
To: Justin Riley <justin.t.riley@gmail.com>
Cc: Rayson Ho <rayrayson@gmail.com>; "starcluster@mit.edu" <starcluster@mit.edu>
Sent: Monday, November 21, 2011 1:26 PM
Subject: Re: [StarCluster] AWS instance runs out of memory and swaps
Hi Justin,

Many thanks for your reply.
I don't have any issue with multiple jobs running per node if there is enough memory for them. But since I know the nature of my jobs, I can predict that only one per node should be running.
How can I see how much memory SGE thinks each node has? Is there a way to list that?

Regards,
Amir


On Nov 21, 2011, at 8:18 AM, Justin Riley wrote:

> Hi Amir,
>
> Sorry to hear you're still having issues. This is really more of an SGE
> issue than anything, but perhaps Rayson can give better insight as to
> what's going on. It seems you're using 23GB nodes and 12GB jobs. Just
> for drill, does 'qhost' show each node having 23GB? It definitely seems
> like there's a boundary issue here, given that two of your jobs together
> approach the total memory of the machine (23GB). Is it your goal only
> to have one job per node?
>
> ~Justin
>
> On 11/16/2011 09:00 PM, Amirhossein Kiani wrote:
>> Dear all,
>>
>> I even wrote the queue submission script myself, adding the
>> mem_free=MEM_NEEDED,h_vmem=MEM_MAX parameter, but sometimes two jobs
>> are randomly sent to one node that does not have enough memory for two
>> jobs, and they start running. I think SGE should check the instance
>> memory and not run multiple jobs on a machine when the total memory
>> requirement of the jobs is above the memory available on the node (or
>> maybe there is a bug in the current check).
>>
>> Amir
>>
>> On Nov 8, 2011, at 5:37 PM, Amirhossein Kiani wrote:
>>>
>>> Hi Justin,
>>>
>>> I'm using a third-party tool to submit the jobs but I am setting the
>>> hard limit. For all my jobs I have something like this for the job
>>> description:
>>>
>>> [root@master test]# qstat -j 1
>>> ==============================================================
>>> job_number:             1
>>> exec_file:              job_scripts/1
>>> submission_time:        Tue Nov 8 17:31:39 2011
>>> owner:                  root
>>> uid:                    0
>>> group:                  root
>>> gid:                    0
>>> sge_o_home:             /root
>>> sge_o_log_name:         root
>>> sge_o_path:             /home/apps/bin:/home/apps/vcftools_0.1.7/bin:/home/apps/tabix-0.2.5:/home/apps/BEDTools-Version-2.14.2/bin:/home/apps/samtools/bcftools:/home/apps/samtools:/home/apps/bwa-0.5.9:/home/apps/Python-2.7.2:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/bin:/home/apps/sjm-1.0/bin:/home/apps/hugeseq/bin:/usr/lib64/openmpi/1.4-gcc/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/cuda/bin:/usr/local/cuda/computeprof/bin:/usr/local/cuda/open64/bin:/opt/sge6/bin/lx24-amd64:/root/bin
>>> sge_o_shell:            /bin/bash
>>> sge_o_workdir:          /data/test
>>> sge_o_host:             master
>>> account:                sge
>>> stderr_path_list:       NONE:master:/data/log/SAMPLE.bin_aln-chr1_e111108173139.txt
>>> *hard resource_list:    h_vmem=12000M*
>>> mail_list:              root@master
>>> notify:                 FALSE
>>> job_name:               SAMPLE.bin_aln-chr1
>>> stdout_path_list:       NONE:master:/data/log/SAMPLE.bin_aln-chr1_o111108173139.txt
>>> jobshare:               0
>>> hard_queue_list:        all.q
>>> env_list:
>>> job_args:               -c,/home/apps/hugeseq/bin/hugeseq_mod.sh bin_sam.sh chr1 /data/chr1.bam /data/bwa_small.bam && /home/apps/hugeseq/bin/hugeseq_mod.sh sam_index.sh /data/chr1.bam
>>> script_file:            /bin/sh
>>> verify_suitable_queues: 2
>>> scheduling info:        (Collecting of scheduler job information is turned off)
>>>
>>> And I'm using the Cluster GPU Quadruple Extra Large instances, which I
>>> think have about 23GB of memory. The issue I see is that too many of
>>> the jobs are submitted. I guess I need to set mem_free too? (The
>>> problem is the tool I'm using does not seem to have a way to set
>>> that...)
>>>
>>> Many thanks,
>>> Amir
>>>
>>> On Nov 8, 2011, at 5:47 AM, Justin Riley wrote:
>>>
>> Hi Amirhossein,
>>
>> Did you specify the memory usage in your job script or at the command
>> line, and what parameters did you use exactly?
>>
>> Doing a quick search, I believe that the following will solve the
>> problem, although I haven't tested it myself:
>>
>> $ qsub -l mem_free=MEM_NEEDED,h_vmem=MEM_MAX yourjob.sh
>>
>> Here, MEM_NEEDED and MEM_MAX are the lower and upper bounds for your
>> job's memory requirements.
>>
>> HTH,
>>
>> ~Justin
>>
>> On 7/22/64 2:59 PM, Amirhossein Kiani wrote:
>>> Dear Star Cluster users,
>>>
>>> I'm using Star Cluster to set up an SGE cluster, and when I ran my job
>>> list, although I had specified the memory usage for each job, it
>>> submitted too many jobs on my instance and my instance started going
>>> out of memory and swapping.
>>> I wonder if anyone knows how I could tell SGE the maximum memory to
>>> consider when submitting jobs to each node, so that it doesn't run the
>>> jobs if there is not enough memory available on a node.
>>>
>>> I'm using the Cluster GPU Quadruple Extra Large instances.
>>>
>>> Many thanks,
>>> Amirhossein Kiani
_______________________________________________
StarCluster mailing list
StarCluster@mit.edu
http://mailman.mit.edu/mailman/listinfo/starcluster