[StarCluster] Cluster-wide DB access

Chris Diehl cpdiehl at gmail.com
Wed Apr 4 11:11:40 EDT 2012


Hi Dan,

Thanks for your reply! What you propose is exactly what I was looking for.
MongoDB or something similar would be perfect. I'll be ultimately storing
JSON records to the DB. I'll try giving this a go very soon.

Thanks again,

Chris

On Wed, Apr 4, 2012 at 5:57 AM, Dan Yamins <dyamins at gmail.com> wrote:

> 1) What is the format of your data, and how big are the entries? I
> like MongoDB for this sort of thing, but it may depend on what kind of
> thing you want to store.
>
> 2) ssh tunnels may be a good solution for having a common DB backing the
> cluster.   Basically, if you use a DB that is accessible as a service on a
> port, then if you ssh tunnel from the various worker nodes to the node
> running the DB, software running on the worker nodes can act "as if" the
> database were purely local.
>
>  In other words, do three things:
>
>     A) set up a single DB actually running on one designated node, on
> some port, e.g. port 27017 on master.
>
>     B)  write code in your worker that pretends the DB is local on the
> port (here's pythonesque code for mongoDB):
>
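>      import pymongo
>
>      # Connect to the local end of the tunnel, not to the master directly.
>      connection = pymongo.MongoClient(host='localhost', port=27017)
>      collection = connection['my_database']['my_collection']
>      collection.insert_one(my_record)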
>      <etc.....>
>
>      C) and then separately establish an ssh tunnel from the worker node
> to the master (or wherever the single DB is running).   This can be done in
> a starcluster plugin in the "on_add_node" or "run" methods like this:
>
>           workernode.ssh.execute("ssh -f -N -L 27017:localhost:27017 root@master")
>
> Of course you could start this by hand on all the nodes as well, but that
> gets a little tedious, and the plugin system is perfect for this kind of
> job.
>
> Having done A), B), and C), when you run the code in B) on your worker
> node, the code will simply read and write to the single master database
> from A) without having to know anything about the fact that it's running
> on a cluster.
>
>
> On Tue, Apr 3, 2012 at 11:22 PM, Chris Diehl <cpdiehl at gmail.com> wrote:
>
>> Hello,
>>
>> I would like to use StarCluster to do some web scraping and I'd like to
>> store the collected data in a DB that is available to all of the cluster
>> nodes. Is there a way to have a common DB backing the entire cluster? Any
>> particular DBs that anyone has had success with?
>>
>> Thanks for your assistance!
>>
>> Chris
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster at mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>
>>
>
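
For anyone following along, step C) from Dan's message above could be
sketched roughly as follows. This is only an illustration, not code from
the thread: the helper names tunnel_command and open_tunnels are made up
here, and the node objects are assumed to expose an "alias" attribute and
an "ssh.execute" method, matching the workernode.ssh.execute call quoted
above. The port and the "master" alias are taken from Dan's example.

```python
def tunnel_command(port=27017, db_host="master", user="root"):
    """Build the ssh port-forwarding command from step C).

    -f backgrounds ssh, -N runs no remote command, and -L forwards the
    local port to the same port on the DB host.
    """
    return "ssh -f -N -L %d:localhost:%d %s@%s" % (port, port, user, db_host)


def open_tunnels(nodes, master_alias="master", port=27017):
    """Open a tunnel on every node except the one hosting the DB.

    Assumption: each node has .alias and .ssh.execute(command).
    Returns the aliases of the nodes on which a tunnel was started.
    """
    started = []
    for node in nodes:
        if node.alias == master_alias:
            continue  # the DB is already local on the master
        node.ssh.execute(tunnel_command(port, master_alias))
        started.append(node.alias)
    return started
```

Inside a plugin, open_tunnels(nodes) would presumably be called from the
"run" method, and the same command issued for a single node when one is
added later. Once the tunnel is up, the worker code from step B) keeps
connecting to localhost:27017 unchanged.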