[StarCluster] Cluster-wide DB access

Justin Riley jtriley at MIT.EDU
Thu Apr 5 10:40:38 EDT 2012


Hi Dan,

Awesome, thanks for sharing these details. Do you by chance have a
mongodb plugin for StarCluster that does all of this or would you be
willing to put one together? I'd like to include a mongodb plugin in the
next feature release...

~Justin

On Wed, Apr 04, 2012 at 08:57:01AM -0400, Dan Yamins wrote:
>    1) What is the format of your data?   And how big are the entries?   I
>    like MongoDB for this sort of thing, but it may depend on what kind of
>    thing you want to store.
>    2) ssh tunnels may be a good solution for having a common DB backing the
>    cluster.   Basically, if you use a DB that is accessible as a service on a
>    port, then if you ssh tunnel from the various worker nodes to the node
>    running the DB, software running on the worker nodes can act "as if" the
>    database were purely local.
>     In other words, do three things:
>        A) set up a single DB actually running on one designated node, on
>    some port, e.g. port 27017 on master.
>        B) write code in your worker that pretends the DB is local on that
>    port (here's pythonesque code for MongoDB):
>         connection = pymongo.Connection(host='localhost', port=27017)
>         collection = connection['my_database']['my_collection']
>         collection.insert(my_record)
>         <etc.....>
>        C) and then separately establish an ssh tunnel from the worker node
>    to the master (or wherever the single DB is running).   This can be done
>    in a StarCluster plugin in the "add_node" or "run" methods like this:
>              workernode.ssh.execute("ssh -f -N -L 27017:localhost:27017
>    root@master")
>    Of course you could start this by hand on all the nodes as well, but that
>    gets a little tedious, and the plugin system is perfect for this kind of
>    job.
>    Having done A), B), and C), when you run the code in B) on your worker
>    node, the code will simply read and write to the single master database
>    from A) without having to know anything about the fact that it's running
>    on a cluster.
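For anyone packaging this up, steps A) and C) could be wrapped in a plugin along these lines. This is only a sketch: the ClusterSetup-style run/on_add_node signatures and the node.alias / node.ssh.execute attributes are assumptions based on my reading of the plugin docs, not a tested plugin.

```python
# Hypothetical StarCluster plugin sketch for step C).  The method
# signatures and node attributes below are assumptions drawn from the
# plugin documentation; verify them against your StarCluster release.

def tunnel_cmd(local_port=27017, remote_port=27017, db_host="master"):
    """Build the ssh command that forwards local_port on a worker to
    remote_port on the node running the database."""
    return "ssh -f -N -L %d:localhost:%d root@%s" % (
        local_port, remote_port, db_host)

class MongoTunnelPlugin(object):
    """Open an ssh tunnel from every worker back to mongod on master."""

    def run(self, nodes, master, user, user_shell, volumes):
        # Called once at cluster start: tunnel from each non-master node.
        for node in nodes:
            if node.alias != master.alias:
                node.ssh.execute(tunnel_cmd())

    def on_add_node(self, node, nodes, master, user, user_shell, volumes):
        # Nodes added after startup need a tunnel as well.
        node.ssh.execute(tunnel_cmd())
```

You would then point a [plugin] section of your StarCluster config at this class (module path is up to you) so it runs automatically on cluster start and on every added node.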
>    On Tue, Apr 3, 2012 at 11:22 PM, Chris Diehl <cpdiehl at gmail.com> wrote:
>
>      Hello,
>      I would like to use StarCluster to do some web scraping and I'd like to
>      store the collected data in a DB that is available to all of the cluster
>      nodes. Is there a way to have a common DB backing the entire cluster?
>      Any particular DBs that anyone has had success with?
>      Thanks for your assistance!
>      Chris
>      _______________________________________________
>      StarCluster mailing list
>      StarCluster at mit.edu
>      http://mailman.mit.edu/mailman/listinfo/starcluster
>



