Hi Dan, <div><br></div><div>Thanks for your reply! What you propose is exactly what I was looking for. MongoDB or something similar would be perfect. I&#39;ll be ultimately storing JSON records to the DB. I&#39;ll try giving this a go very soon. </div>


<div><br></div><div>Thanks again, </div><div><br></div><div>Chris<br><br><div class="gmail_quote">On Wed, Apr 4, 2012 at 5:57 AM, Dan Yamins <span dir="ltr">&lt;<a href="mailto:dyamins@gmail.com">dyamins@gmail.com</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">1) What is the format of your data? And how big are the entries? I like MongoDB for this sort of thing, but it may depend on what kind of thing you want to store.  <div>


<br></div><div>2) An ssh tunnel may be a good solution for having a common DB backing the cluster. Basically, if you use a DB that is accessible as a service on a port, and you open an ssh tunnel from each worker node to the node running the DB, software running on the worker nodes can act &quot;as if&quot; the database were purely local.   </div>


<div><br></div><div> In other words, do three things:</div><div><br></div><div>    A) Set up a single DB actually running on one designated node, on some port, e.g. port 27017 on master. </div><div><br></div><div>    B)  Write code in your worker that pretends the DB is local on that port (here&#39;s pythonesque code for MongoDB):</div>


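<div><br></div><div>     import pymongo</div><div>     # connect to what looks like a local MongoDB; the ssh tunnel forwards it to master</div><div>     connection = pymongo.MongoClient(host=&#39;localhost&#39;, port=27017)</div><div>     collection = connection[&#39;my_database&#39;][&#39;my_collection&#39;]</div><div>     collection.insert_one(my_record)</div>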


<div>     &lt;etc.....&gt;</div><div><br></div><div>     C) And then, separately, establish an ssh tunnel from the worker node to the master (or wherever the single DB is running). This can be done in a StarCluster plugin in the &quot;on_add_node&quot; or &quot;run&quot; methods like this:</div>


<div><br></div><div>          workernode.ssh.execute(&quot;ssh -f -N -L 27017:localhost:27017 root@master&quot;)  </div><div><br></div><div>Of course you could start this by hand on all the nodes as well, but that gets a little tedious, and the plugin system is perfect for this kind of job. </div>
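<div><br></div><div>For what it&#39;s worth, here is a rough sketch of what such a plugin might look like. The class name MongoTunnel and the helper tunnel_command are made up for illustration, and I&#39;m assuming the usual StarCluster plugin hooks (run and on_add_node on a ClusterSetup subclass) and that each node object exposes alias and ssh.execute. The fallback import is only there so the sketch can be tried without StarCluster installed. Treat it as a starting point, not a tested plugin.</div>

```python
try:
    from starcluster.clustersetup import ClusterSetup
except ImportError:
    # Fallback so this sketch can be exercised without StarCluster installed.
    ClusterSetup = object


def tunnel_command(port=27017, db_host="master", user="root"):
    """Build the ssh command that forwards a local port to the DB node.

    -f backgrounds ssh, -N runs no remote command, -L sets up the forward.
    """
    return "ssh -f -N -L %d:localhost:%d %s@%s" % (port, port, user, db_host)


class MongoTunnel(ClusterSetup):
    """Hypothetical plugin: open a tunnel to the master's DB port on each worker."""

    def __init__(self, port=27017):
        # Plugin settings from the config file may arrive as strings.
        self.port = int(port)

    def run(self, nodes, master, user, user_shell, volumes):
        # Called when the cluster starts: tunnel from every node except master.
        for node in nodes:
            if node.alias != master.alias:
                node.ssh.execute(tunnel_command(self.port, master.alias))

    def on_add_node(self, node, nodes, master, user, user_shell, volumes):
        # Called when a node is added to a running cluster.
        node.ssh.execute(tunnel_command(self.port, master.alias))
```

<div>With the default port, tunnel_command() produces the same ssh command as the one-liner above, so the worker code keeps talking to localhost:27017.</div>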


<div><br></div><div>Having done A), B), and C), when you run the code in B) on your worker node, the code will simply read and write to the single master database from A) without having to know anything about the fact that it&#39;s running on a cluster. </div>
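<div><br></div><div>To make the worker side a bit more concrete, here is a minimal sketch, assuming the records arrive as one JSON document per line and that pymongo is installed on the worker. The names store_records and main, and the sample record, are just illustrative. The collection is passed in as an argument, so the same function works against anything that has an insert_one method.</div>

```python
import json


def store_records(collection, json_lines):
    """Parse each non-empty JSON line and insert it; return the number stored."""
    count = 0
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        collection.insert_one(json.loads(line))
        count += 1
    return count


def main():
    # Assumption: pymongo is installed and the ssh tunnel from C) is already up,
    # so the master's database appears on localhost:27017.
    import pymongo

    client = pymongo.MongoClient(host="localhost", port=27017)
    collection = client["my_database"]["my_collection"]
    sample = ['{"url": "http://example.com", "status": 200}']
    print("stored", store_records(collection, sample))
```

<div>Calling main() on a worker would then write to the single database on master through the tunnel.</div>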


<div><br></div><div><br></div><div><div><div class="gmail_quote"><div><div></div><div class="h5">On Tue, Apr 3, 2012 at 11:22 PM, Chris Diehl <span dir="ltr">&lt;<a href="mailto:cpdiehl@gmail.com" target="_blank">cpdiehl@gmail.com</a>&gt;</span> wrote:<br>


</div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div></div><div class="h5">

Hello, <div><br></div><div>I would like to use StarCluster to do some web scraping and I&#39;d like to store the collected data in a DB that is available to all of the cluster nodes. Is there a way to have a common DB backing the entire cluster? Are there any particular DBs that anyone has had success with?</div>


<div><br></div><div>Thanks for your assistance!</div><span><font color="#888888"><div><br></div><div>Chris</div>

</font></span><br></div></div>_______________________________________________<br>

StarCluster mailing list<br>

<a href="mailto:StarCluster@mit.edu" target="_blank">StarCluster@mit.edu</a><br>

<a href="http://mailman.mit.edu/mailman/listinfo/starcluster" target="_blank">http://mailman.mit.edu/mailman/listinfo/starcluster</a><br>

<br></blockquote></div><br></div></div>

</blockquote></div><br></div>