Dear all,

I even wrote the queue submission script myself, adding the mem_free=MEM_NEEDED,h_vmem=MEM_MAX parameters, but sometimes two jobs are still sent to a single node that does not have enough memory for both, and they start running anyway. I would expect SGE to check the memory available on each node and not run multiple jobs on a machine when their combined memory requirement exceeds what the node has (or maybe there is a bug in the current check).
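From what I have been reading in the SGE documentation, this may be because mem_free is, by default, only a reported load value and not a consumable resource, so the scheduler never subtracts each job's request from what a node has left. If that is right, something along these lines should make it book memory per job (untested on my cluster; node001 is just an example host name, and the 20G figure is a guess for a ~23G instance):

$ qconf -mc
# in the editor, change the mem_free line so the "consumable" column reads YES:
#   mem_free   mf   MEMORY   <=   YES   YES   0   0

# optionally set the bookable amount explicitly on each node:
$ qconf -mattr exechost complex_values mem_free=20G node001
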
Amir

On Nov 8, 2011, at 5:37 PM, Amirhossein Kiani wrote:

Hi Justin,

I'm using a third-party tool to submit the jobs, but I am setting the hard limit. For all my jobs the job description looks something like this:

[root@master test]# qstat -j 1
==============================================================
job_number: 1
exec_file: job_scripts/1
submission_time: Tue Nov 8 17:31:39 2011
owner: root
uid: 0
group: root
gid: 0
sge_o_home: /root
sge_o_log_name: root
sge_o_path: /home/apps/bin:/home/apps/vcftools_0.1.7/bin:/home/apps/tabix-0.2.5:/home/apps/BEDTools-Version-2.14.2/bin:/home/apps/samtools/bcftools:/home/apps/samtools:/home/apps/bwa-0.5.9:/home/apps/Python-2.7.2:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/bin:/home/apps/sjm-1.0/bin:/home/apps/hugeseq/bin:/usr/lib64/openmpi/1.4-gcc/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/cuda/bin:/usr/local/cuda/computeprof/bin:/usr/local/cuda/open64/bin:/opt/sge6/bin/lx24-amd64:/root/bin
sge_o_shell: /bin/bash
sge_o_workdir: /data/test
sge_o_host: master
account: sge
stderr_path_list: NONE:master:/data/log/SAMPLE.bin_aln-chr1_e111108173139.txt
hard resource_list: h_vmem=12000M
mail_list: root@master
notify: FALSE
job_name: SAMPLE.bin_aln-chr1
stdout_path_list: NONE:master:/data/log/SAMPLE.bin_aln-chr1_o111108173139.txt
jobshare: 0
hard_queue_list: all.q
env_list:
job_args: -c, /home/apps/hugeseq/bin/hugeseq_mod.sh bin_sam.sh chr1 /data/chr1.bam /data/bwa_small.bam && /home/apps/hugeseq/bin/hugeseq_mod.sh sam_index.sh /data/chr1.bam
script_file: /bin/sh
verify_suitable_queues: 2
scheduling info: (Collecting of scheduler job information is turned off)

I'm using the Cluster GPU Quadruple Extra Large instances, which I think have about 23G of memory. The issue I see is that too many jobs are dispatched at once, so I guess I need to set mem_free too? (The problem is that the tool I'm using does not seem to have a way to set that...)
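If the tool won't pass extra qsub options, maybe I can set a default request instead. If I am reading the sge_request(5) man page correctly, qsub also picks up default options from a request file such as $HOME/.sge_request (or the cluster-wide $SGE_ROOT/<cell>/common/sge_request), so a one-line file like the following might add mem_free to every job without the tool's cooperation (untested, and the 12G value is only a guess based on my h_vmem setting):

-l mem_free=12G

I assume mem_free would still have to be a consumable resource for the scheduler to actually enforce it, though.
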
Many thanks,
Amir

On Nov 8, 2011, at 5:47 AM, Justin Riley wrote:

Hi Amirhossein,

Did you specify the memory usage in your job script or on the command line, and which parameters did you use exactly?

From a quick search, I believe the following will solve the problem, although I haven't tested it myself:

$ qsub -l mem_free=MEM_NEEDED,h_vmem=MEM_MAX yourjob.sh

Here, MEM_NEEDED and MEM_MAX are the lower and upper bounds for your job's memory requirements.
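For example (the numbers here are only placeholders), a job expected to need roughly 8GB could be submitted as:

$ qsub -l mem_free=8G,h_vmem=10G yourjob.sh

Note that h_vmem is a hard limit that will kill the job if it is exceeded, so it is worth leaving some headroom above the expected peak usage.
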
HTH,

~Justin

On 7/22/64 2:59 PM, Amirhossein Kiani wrote:
> Dear Star Cluster users,
>
> I'm using Star Cluster to set up an SGE cluster, and when I ran my job
> list, although I had specified the memory usage for each job, it
> submitted too many jobs to my instance, and the instance started
> running out of memory and swapping.
> I wonder if anyone knows how I could tell SGE the maximum memory to
> consider when submitting jobs to each node, so that it doesn't run
> jobs when there is not enough memory available on the node.
>
> I'm using the Cluster GPU Quadruple Extra Large instances.
>
> Many thanks,
> Amirhossein Kiani