<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
</head>
<body dir="auto">
<div>FYI, in case it's important: I remember having to modify the s3fs source to handle files larger than, I think, 64GB. If anyone runs into issues with big files and s3fs, poke the code, or if you need further details (no idea if that's still a problem),
just let me know.</div>
<div><br>
</div>
<div>-Hugh</div>
<div><br>
On May 13, 2014, at 16:07, "Cedar McKay" <<a href="mailto:cmckay@uw.edu">cmckay@uw.edu</a>> wrote:<br>
<br>
</div>
<blockquote type="cite">
<div><base href="x-msg://48/">Great, thanks for all the info, guys. I ended up mounting my read-only databases as an s3fs volume and designating the ephemeral storage as the cache. Hopefully this will give me the best of both worlds: fast local
storage and lazy downloading.
<div><br>
</div>
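<div>A minimal sketch of that kind of mount, written as a Python helper that only builds the command line. The bucket name, mount point, and cache directory are hypothetical, and credentials (a password file or an instance role) are assumed to be configured separately. <code>use_cache</code> is the s3fs option that names the local cache directory, and <code>ro</code> is the usual read-only mount option.</div>
<pre>
```python
def s3fs_mount_command(bucket, mountpoint, cache_dir):
    """Build an s3fs command line for a read-only mount with a local cache."""
    return [
        "s3fs", bucket, mountpoint,
        "-o", "ro",                        # mount read-only
        "-o", "use_cache=%s" % cache_dir,  # keep fetched files on local disk
    ]


# Hypothetical names, for illustration only.
cmd = s3fs_mount_command("example-blast-dbs", "/mnt/blastdb", "/mnt/s3fs-cache")
print(" ".join(cmd))
```
</pre>
<div>The list can be handed to <code>subprocess.check_call</code> on the node once the mount point and cache directory exist.</div>
<div><br>
</div>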
<div>I haven't tested much yet, but if I have problems with this setup, I'll probably skip the s3fs thing and just load the database straight onto the ephemeral storage as you suggested.</div>
<div><br>
</div>
<div>best,</div>
<div>Cedar</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
<div>
<div>On May 12, 2014, at 2:12 PM, Steve Darnell <<a href="mailto:darnells@dnastar.com">darnells@dnastar.com</a>> wrote:</div>
<br class="Apple-interchange-newline">
<blockquote type="cite">
<div lang="EN-US" link="blue" vlink="purple" style="font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; ">
<div class="WordSection1" style="page: WordSection1; ">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125); ">Hi Cedar,<o:p></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125); "> </span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125); ">I completely agree with David. We routinely use blast in a software pipeline built on top of EC2. We started by using an NFS share, but we are currently transitioning
to ephemeral storage.<o:p></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125); "> </span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125); ">Our plan is to put the nr database (and other file-based data libraries) on local SSD ephemeral storage for each node in the cluster. You may want to consider pre-packaging
the compressed libraries on a custom StarCluster AMI, then using a plug-in to mount ephemeral storage and decompress the blast libraries into it. This avoids the download from S3 each time you start a node, which added 10-20 minutes in our case.
Plus, it eliminates one more possible point of failure during cluster initialization. To us, it is worth the extra cost of maintaining a custom AMI and the extra size of the AMI itself.<o:p></o:p></span></div>
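<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125); ">A minimal sketch of the decompress step, assuming the libraries are pre-packaged on the AMI as a gzip-compressed tar archive. The function name, the marker file, and the paths in the usage note are hypothetical; a plug-in could run a script like this on each node once the ephemeral storage is mounted.<o:p></o:p></span></div>
<pre>
```python
import os
import tarfile


def unpack_library(archive, dest):
    """Extract a .tar.gz archive into dest, skipping work already done.

    Returns True if the archive was extracted, False if a marker file
    shows that an earlier run already unpacked it.
    """
    marker = os.path.join(dest, ".unpacked")
    if os.path.exists(marker):
        return False
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    # Written last, so an interrupted extraction is retried on the next run.
    with open(marker, "w"):
        pass
    return True
```
</pre>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125); ">For example, <code>unpack_library("/opt/blastdb/nr.tar.gz", "/mnt/blastdb")</code>, where both paths are placeholders for wherever the archive sits on the AMI and wherever the ephemeral storage is mounted.<o:p></o:p></span></div>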
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125); "> </span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125); ">Best regards,<o:p></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125); ">Steve<o:p></o:p></span></div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125); "> </span></div>
<div>
<div style="border-style: solid none none; border-top-width: 1pt; border-top-color: rgb(181, 196, 223); padding: 3pt 0in 0in; ">
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<b><span style="font-size: 10pt; font-family: Tahoma, sans-serif; ">From:</span></b><span style="font-size: 10pt; font-family: Tahoma, sans-serif; "><span class="Apple-converted-space"> </span><a href="mailto:starcluster-bounces@mit.edu">starcluster-bounces@mit.edu</a>
[mailto:<a href="mailto:starcluster-bounces@mit.edu">starcluster-bounces@mit.edu</a>]<span class="Apple-converted-space"> </span><b>On Behalf Of<span class="Apple-converted-space"> </span></b>David Stuebe<br>
<b>Sent:</b><span class="Apple-converted-space"> </span>Friday, May 09, 2014 12:49 PM<br>
<b>To:</b><span class="Apple-converted-space"> </span><a href="mailto:starcluster@mit.edu">starcluster@mit.edu</a><br>
<b>Subject:</b><span class="Apple-converted-space"> </span>Re: [StarCluster] Fast shared or local storage? (Cedar McKay)<o:p></o:p></span></div>
</div>
</div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<o:p> </o:p></div>
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; "> </span></div>
</div>
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; ">Hi Cedar<o:p></o:p></span></div>
</div>
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; "> </span></div>
</div>
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; ">Beware of using NFS – it may not be POSIX-compliant in ways that seem minor but have caused problems for HDF5 files. I don't know what the blast db file structure is or how blast organizes its
writes, but it can be a problem in some circumstances. <o:p></o:p></span></div>
</div>
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; "> </span></div>
</div>
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; ">I really like the suggestion of using the ephemeral storage. I suggest you create a plugin that copies the data from S3 to the drive on startup when you add a node. That should be simpler than
the on-demand caching, which, although elegant, may take you some time to implement. <o:p></o:p></span></div>
</div>
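<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; ">A rough sketch of such a plugin, under several assumptions: the class name, S3 URL, and destination path are hypothetical, the AWS CLI and credentials are assumed to be present on the node, and the fallback base class exists only so the sketch can be exercised where StarCluster is not installed.<o:p></o:p></span></div>
</div>
<pre>
```python
try:
    from starcluster.clustersetup import ClusterSetup
except ImportError:
    # Fallback so the sketch can be tried without StarCluster installed.
    ClusterSetup = object


class StageBlastDb(ClusterSetup):
    """Hypothetical plugin: copy a database from S3 to local disk per node."""

    def __init__(self, s3_url, dest="/mnt/blastdb"):
        self.s3_url = s3_url
        self.dest = dest

    def _stage(self, node):
        node.ssh.execute("mkdir -p %s" % self.dest)
        # Assumes the AWS CLI and credentials are available on the node.
        node.ssh.execute("aws s3 sync %s %s" % (self.s3_url, self.dest))

    def run(self, nodes, master, user, user_shell, volumes):
        # Called when the cluster starts: stage the data on every node.
        for node in nodes:
            self._stage(node)

    def on_add_node(self, node, nodes, master, user, user_shell, volumes):
        # Called when a node is added later: stage the data on it only.
        self._stage(node)
```
</pre>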
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; "> </span></div>
</div>
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; ">David<o:p></o:p></span></div>
</div>
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; "> </span></div>
</div>
<blockquote id="MAC_OUTLOOK_ATTRIBUTION_BLOCKQUOTE" style="border-style: none none none solid; border-left-width: 4.5pt; border-left-color: rgb(181, 196, 223); padding: 0in 0in 0in 4pt; margin-left: 3.75pt; margin-right: 0in; ">
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; "> </span></div>
</div>
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; ">Thanks for the very useful reply. I think I'm going to go with the s3fs option and cache to local ephemeral drives. A big blast database is split into many parts, and I'm pretty sure that not every
file in a blast db is read every time, so this way blasting can proceed immediately. The parts of the blast database are downloaded from S3 on demand and cached locally. If there were much writing, I'd probably be reluctant to use this approach because the S3
eventual consistency model seems to require tolerance of write failures at the application level. I'll write my results to a shared NFS volume.<o:p></o:p></span></div>
</div>
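<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; ">The download-on-demand idea can be sketched as follows. This is only an illustration of the caching behaviour, not how s3fs itself is implemented; <code>cached_path</code> and the <code>fetch</code> callable are hypothetical names.<o:p></o:p></span></div>
</div>
<pre>
```python
import os


def cached_path(name, cache_dir, fetch):
    """Return a local path for name, downloading it only on first use.

    fetch(name, target) is a caller-supplied (hypothetical) downloader
    that writes the remote object called name to the local file target.
    """
    local = os.path.join(cache_dir, name)
    if not os.path.exists(local):
        os.makedirs(cache_dir, exist_ok=True)
        partial = local + ".part"
        fetch(name, partial)
        # Rename only after the download finishes, so a half-written
        # file is never mistaken for a cached one.
        os.replace(partial, local)
    return local
```
</pre>
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; ">Only the database parts that a search actually opens are fetched; every later read of the same part is served from local disk.<o:p></o:p></span></div>
</div>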
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; "> </span></div>
</div>
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; ">I thought about mpiBlast and will probably explore it, but I read some reports that its XML output isn't exactly the same as the official NCBI blast output and may break Biopython parsing.
I haven't confirmed this, and will probably compare the two techniques.<o:p></o:p></span></div>
</div>
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; "> </span></div>
</div>
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; ">Thanks again!<o:p></o:p></span></div>
</div>
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; "> </span></div>
</div>
<div>
<div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif; ">
<span style="font-size: 10.5pt; font-family: Calibri, sans-serif; ">Cedar<o:p></o:p></span></div>
</div>
</blockquote>
</div>
_______________________________________________<br>
StarCluster mailing list<br>
<a href="mailto:StarCluster@mit.edu">StarCluster@mit.edu</a><br>
<a href="http://mailman.mit.edu/mailman/listinfo/starcluster">http://mailman.mit.edu/mailman/listinfo/starcluster</a></div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
</body>
</html>