<div dir="ltr">Hi Cedar,<div><br></div><div>I am relatively new to AWS, and StarCluster, but my suggestion would be to use either the C3 or R3 instances, launched in the same placement group, and just download the data from S3 to the cluster's master node.</div>

1. I think your best bet might actually be to download the 150GB to an instance store and share it over NFS. In my experience, you can download from S3 at a couple of hundred MB/s if the files are large enough and you open multiple connections. For example, one of my clusters downloads 7 different 2GB files in parallel, each at ~30MB/s (I don't think I am hitting the limits of the instance), and I expect larger files would do even better. At around 300MB/s, downloading the whole database would take about 8 minutes. If you use an SSD-backed instance type like the R3s, the disk I/O will be free and fast.
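
Something like this is roughly what I mean (an untested sketch; the bucket and file names are made up, and it assumes the AWS CLI is installed and configured):

    #!/bin/bash
    # Pull several large DB chunks from S3 in parallel onto the
    # instance store, then wait for all the transfers to finish.
    DEST=/mnt/ephemeral/blastdb
    mkdir -p "$DEST"
    for part in 00 01 02 03 04 05 06; do
        aws s3 cp "s3://my-bucket/blastdb/nr.${part}.tar.gz" "$DEST/" &
    done
    wait    # block until every background transfer completes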

2. With any of the cluster compute instances, you get enhanced networking (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html) if you launch within the same placement group. I expect 10G Ethernet with enhanced networking will provide sufficient I/O performance. If you are still limited by I/O over NFS, you can also cache files on the workers' instance storage when jobs access the same files multiple times.

Just my two cents; anyone, feel free to correct me.

Good luck, and I'd be interested to hear what you come up with!

Cory
<div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, May 9, 2014 at 1:08 PM, Cedar McKay <span dir="ltr"><<a href="mailto:cmckay@uw.edu" target="_blank">cmckay@uw.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">Thanks for the very useful reply. I think I'm going to go with the s3fs option and cache to local ephemeral drives. A big blast database is split into many parts, and I'm pretty sure that every file in a blast db isn't read every time, so this way blasting can proceed immediately. The parts of the blast database download from s3 on demand, and cached locally. If there was much writing, I'd probably be reluctant to use this approach because the s3 eventual consistency model seems to require tolerance of write fails at the application level. I'll write my results to a shared nfs volume.<div>
>
> I thought about mpiBlast and will probably explore it, but I read some reports that its XML output isn't exactly the same as the official NCBI BLAST output and may break Biopython parsing. I haven't confirmed this, and will probably compare the two techniques.
>
> Thanks again!
>
> Cedar
>
> On May 8, 2014, at 10:56 PM, Rayson Ho <raysonlogin@gmail.com> wrote:
<br><blockquote type="cite"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, May 8, 2014 at 4:46 PM, Cedar McKay <span dir="ltr"><<a href="mailto:cmckay@uw.edu" target="_blank">cmckay@uw.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div style="word-wrap:break-word"><div>My use-case is to blast (briefly: blast is an alignment search tool) against a large ~150GB read-only reference database. I'm struggling to figure out how to give each of my nodes access to this database while maximizing performance. The shared volume need only be read-only, but write would be nice too. </div>
>>
>> Chris from the BioTeam (http://bioteam.net/) knows much more about BLAST, but I will try to answer the questions from an AWS developer's point of view.
>>
>> (BTW, in case you didn't know... in some cases mpiBlast can give you *super-linear speedup* when the input DB is larger than the main memory of each node.)
<div style="word-wrap:break-word"><div>Does anyone have advice about the best approach?</div><div>My ideas:</div><div><ul><li>After starting cluster, copy database to ephemeral storage of each node?</li></ul></div></div>
</blockquote>
>>
>> If you put some simple logic in the SGE job script to pull the DB from S3 and store it locally the first time blastall runs on the node (i.e. subsequent jobs read from the local copy), this would give you the best performance at the lowest cost.
>>
>> * Note that SGE can schedule multiple jobs onto the same node, so you will need some logic to make sure that only one transfer is done (see the sketch below).
>>
>> * Most (but not all) instance types give you over 150GB of ephemeral storage that you can read and write at no additional cost!
>>
>> * Note that intra-region S3-to-EC2 data transfer is free, but the speed was below 80 MB/s the last time we benchmarked it (even with instances that have 1GbE), so the overhead for the initial transfer will be around 30 minutes (150GB at ~80 MB/s is roughly half an hour).
>>
>> * IMO, this is the easiest option, as you don't need to set anything else up; all you need is a few lines of shell scripting.
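>>
>> A minimal sketch of that logic (untested; assumes the AWS CLI and
>> flock(1) are available, and the bucket name and paths are made up):
>>
>>     #!/bin/bash
>>     # SGE job prologue: fetch the DB to local ephemeral storage once
>>     # per node; concurrent jobs on the same node wait on the lock.
>>     DB_DIR=/mnt/ephemeral/blastdb
>>     LOCK=/tmp/blastdb.lock
>>
>>     (
>>         flock -x 200                     # only one job does the copy
>>         if [ ! -f "$DB_DIR/.complete" ]; then
>>             mkdir -p "$DB_DIR"
>>             aws s3 cp "s3://my-bucket/blastdb/" "$DB_DIR/" --recursive
>>             touch "$DB_DIR/.complete"    # later jobs skip the download
>>         fi
>>     ) 200>"$LOCK"
>>
>>     # $INPUT_FASTA and $OUTPUT are placeholders for the job's files.
>>     blastall -p blastp -d "$DB_DIR/nr" -i "$INPUT_FASTA" -o "$OUTPUT"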
<div style="word-wrap:break-word"><ul><li>Create separate EBS volumes for each node starting from a snapshot containing my reference database. But I don't see a way to automate this.</li></ul></div></blockquote>
>>
>> Keep in mind that if you need to read 150GB each time a Blast job runs, it would cost you about $0.49 per run in EBS I/O operations alone (presumably standard-volume pricing of $0.05 per million I/O requests at ~16KB per request: 150GB / 16KB is roughly 9.8M requests). Since main memory can't cache that much data, you will need to re-read the data from EBS each time.
>>
>>> * glusterfs. I saw a reference to a glusterfs StarCluster plugin a while back, but it doesn't seem to be in the current list of plugins.
>>
>> IMO, it's too much work if all you need is to read input data.
<div style="word-wrap:break-word"><ul><li>s3fs. But is random access within a file poor? Even with caching turned on?</li></ul></div></blockquote><div>May be the 2nd best option as I assume you will have lots of queued jobs, and the 150GB of input data is read once from S3, and then will be accessed many times locally.<br>
<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word"><ul><li>Stick with default approach (nfs share a volume), but provision the headnode for faster networking? Provisioned IOPS EBS volumes? Any other simple optimizations?</li>
</ul></div></blockquote><div>If you just have a few execution (slave) nodes, it would work too. Just create a PIOPS EBS volume, and then mount it & NFS share it by specifying the values in the StarCluster config file:<br>
<br><a href="http://star.mit.edu/cluster/docs/latest/manual/configuration.html#amazon-ebs-volumes" target="_blank">http://star.mit.edu/cluster/docs/latest/manual/configuration.html#amazon-ebs-volumes</a><br></div><div> </div>
>>
>> For a larger number of nodes, the NFS server is still the bottleneck; S3 is much more scalable than a single NFS master. I would copy the DB from S3 to the local instance store, or use s3fs, once the cluster has more than, say, 8 nodes (YMMV).
>>
>> Rayson
>>
>> ==================================================
>> Open Grid Scheduler - The Official Open Source Grid Engine
>> http://gridscheduler.sourceforge.net/
>> http://gridscheduler.sourceforge.net/GridEngine/GridEngineCloud.html
<div style="word-wrap:break-word"><div><div><br></div><div><br></div><div><br></div><div>I really appreciate any help.</div></div><div>Thanks,</div><div>Cedar</div><div><br></div><div><br></div><div><br></div><div><br></div>

_______________________________________________
StarCluster mailing list
StarCluster@mit.edu
http://mailman.mit.edu/mailman/listinfo/starcluster