Justin, I'd be happy to put one together (right now I have it embedded as part of a larger plugin that does other things). <div><br></div><div><br><div class="gmail_quote">On Thu, Apr 5, 2012 at 10:40 AM, Justin Riley <span dir="ltr"><<a href="mailto:jtriley@mit.edu">jtriley@mit.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Dan,<br>
<br>
Awesome, thanks for sharing these details. Do you by chance have a<br>
mongodb plugin for StarCluster that does all of this or would you be<br>
willing to put one together? I'd like to include a mongodb plugin in the<br>
next feature release...<br>
<br>
~Justin<br>
<div><div class="h5"><br>
On Wed, Apr 04, 2012 at 08:57:01AM -0400, Dan Yamins wrote:<br>
> 1) What is the format of your data? And how big are the entries? I<br>
> like MongoDB for this sort of thing, but it may depend on what kind of thing<br>
> you want to store.<br>
> 2) ssh tunnels may be a good solution for having a common DB backing the<br>
> cluster. Basically, if you use a DB that is accessible as a service on a<br>
> port, then if you ssh tunnel from the various worker nodes to the node<br>
> running the DB, software running on the worker nodes can act "as if" the<br>
> database were purely local.<br>
> In other words, do three things:<br>
> A) set up a single DB actually running on one designated node, on<br>
> some port, e.g. port 27017 on master.<br>
> B) write code in your worker that pretends the DB is local on that<br>
> port (here's pythonesque code for MongoDB):<br>
> conn = pymongo.Connection(host='localhost', port=27017)<br>
> collection = conn['my_database']['my_collection']<br>
> collection.insert(my_record)<br>
> &lt;etc.....&gt;<br>
> C) and then separately establish an ssh tunnel from the worker node<br>
> to the master (or wherever the single DB is running). This can be done<br>
> in a starcluster plugin in the "add_node" or "run" methods like this:<br>
> workernode.ssh.execute("ssh -f -N -L 27017:localhost:27017<br>
> root@master")<br>
> Of course you could start this by hand on all the nodes as well, but that<br>
> gets a little tedious, and the plugin system is perfect for this kind of<br>
> job.<br>
> Having done A), B), and C), when you run the code in B) on your worker<br>
> node, the code will simply read and write to the single master database<br>
> from A) without having to know anything about the fact that it's running on a<br>
> cluster.<br>
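To make step C) concrete, here's a small, untested sketch of how the tunnel command above could be built once and reused for every worker node; the port 27017 and the root@master login are just the example values from this thread, not requirements:<br>

```python
# Hypothetical helper for step C): builds the ssh command each worker
# runs so that localhost:<local_port> on the worker forwards to the
# database port on the master. Flags: -f backgrounds ssh after auth,
# -N runs no remote command, -L sets up the local port forward.
def tunnel_command(local_port=27017, remote_port=27017, target="root@master"):
    return "ssh -f -N -L %d:localhost:%d %s" % (local_port, remote_port, target)

# Inside a StarCluster plugin's run() or add_node hook you would then
# do something like (the workernode object name is illustrative):
#     workernode.ssh.execute(tunnel_command())
```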
</div></div><div class="im">> On Tue, Apr 3, 2012 at 11:22 PM, Chris Diehl <<a href="mailto:cpdiehl@gmail.com">cpdiehl@gmail.com</a>> wrote:<br>
><br>
> Hello,<br>
> I would like to use StarCluster to do some web scraping and I'd like to<br>
> store the collected data in a DB that is available to all of the cluster<br>
> nodes. Is there a way to have a common DB backing the entire cluster?<br>
> Any particular DBs that anyone has had success with?<br>
> Thanks for your assistance!<br>
> Chris<br>
> _______________________________________________<br>
> StarCluster mailing list<br>
</div>> <a href="mailto:StarCluster@mit.edu">StarCluster@mit.edu</a><br>
> <a href="http://mailman.mit.edu/mailman/listinfo/starcluster" target="_blank">http://mailman.mit.edu/mailman/listinfo/starcluster</a><br>
><br>
<div class="HOEnZb"><div class="h5"><br>
<br>
</div></div></blockquote></div><br></div>