Justin, I'd be happy to put one together (right now I have it embedded as part of a larger plugin that does other things). <div><br></div><div><br><div class="gmail_quote">On Thu, Apr 5, 2012 at 10:40 AM, Justin Riley <span dir="ltr"><<a href="mailto:jtriley@mit.edu">jtriley@mit.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Dan,<br>
<br>
Awesome, thanks for sharing these details. Do you by chance have a<br>
mongodb plugin for StarCluster that does all of this or would you be<br>
willing to put one together? I'd like to include a mongodb plugin in the<br>
next feature release...<br>
<br>
~Justin<br>
<div><div class="h5"><br>
On Wed, Apr 04, 2012 at 08:57:01AM -0400, Dan Yamins wrote:<br>
> 1) What is the format of your data? And how big are the entries? I<br>
> like MongoDB for this sort of thing, but it may depend on what kind of thing<br>
> you want to store.<br>
> 2) ssh tunnels may be a good solution for having a common DB backing the<br>
> cluster. Basically, if you use a DB that is accessible as a service on a<br>
> port, then if you ssh tunnel from the various worker nodes to the node<br>
> running the DB, software running on the worker nodes can act "as if" the<br>
> database were purely local.<br>
> In other words, do three things:<br>
> A) set up a single DB actually running on one designated node, on<br>
> some port, e.g. port 27017 on master.<br>
> B) write code in your worker that pretends the DB is local on that<br>
> port (here's pythonesque code for MongoDB):<br>
> conn = pymongo.Connection(host='localhost', port=27017)<br>
> collection = conn['my_database']['my_collection']<br>
> collection.insert(my_record)<br>
> &lt;etc.....&gt;<br>
> C) and then separately establish an ssh tunnel from the worker node<br>
> to the master (or wherever the single DB is running). This can be done<br>
> in a starcluster plugin in the "add_node" or "run" methods like this:<br>
> workernode.ssh.execute("ssh -f -N -L 27017:localhost:27017<br>
> root@master")<br>
> Of course you could start this by hand on all the nodes as well, but that<br>
> gets a little tedious, and the plugin system is perfect for this kind of<br>
> job.<br>
> Having done A), B), and C), when you run the code in B) on your worker<br>
> node, the code will simply read and write to the single master database<br>
> from A) without having to know anything about the fact that it's running on a<br>
> cluster.<br>
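To make step C) concrete, here's a small, untested sketch of how the tunnel command above could be built once and reused for every worker node; the port 27017 and the root@master login are just the example values from this thread, not requirements:<br>

```python
# Hypothetical helper for step C): builds the ssh command each worker
# runs so that localhost:<local_port> on the worker forwards to the
# database port on the master. Flags: -f backgrounds ssh after auth,
# -N runs no remote command, -L sets up the local port forward.
def tunnel_command(local_port=27017, remote_port=27017, target="root@master"):
    return "ssh -f -N -L %d:localhost:%d %s" % (local_port, remote_port, target)

# Inside a StarCluster plugin's run() or add_node hook you would then
# do something like (the workernode object name is illustrative):
#     workernode.ssh.execute(tunnel_command())
```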
</div></div><div class="im">> On Tue, Apr 3, 2012 at 11:22 PM, Chris Diehl <<a href="mailto:cpdiehl@gmail.com">cpdiehl@gmail.com</a>> wrote:<br>
><br>
> Hello,<br>
> I would like to use StarCluster to do some web scraping and I'd like to<br>
> store the collected data in a DB that is available to all of the cluster<br>
> nodes. Is there a way to have a common DB backing the entire cluster?<br>
> Any particular DBs that anyone has had success with?<br>
> Thanks for your assistance!<br>
> Chris<br>
> _______________________________________________<br>
> StarCluster mailing list<br>
</div>> <a href="mailto:StarCluster@mit.edu">StarCluster@mit.edu</a><br>
> <a href="http://mailman.mit.edu/mailman/listinfo/starcluster" target="_blank">http://mailman.mit.edu/mailman/listinfo/starcluster</a><br>
><br>
<div class="HOEnZb"><div class="h5"><br>
<br>
</div></div></blockquote></div><br></div>