[StarCluster] Cluster-wide DB access

Dan Yamins dyamins at gmail.com
Thu Apr 5 12:30:56 EDT 2012


Justin, I'd be happy to put one together (right now I have it embedded
as part of a larger plugin that does other things).
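
In the meantime, here's a rough sketch of just the tunnel piece, along
the lines of the recipe quoted below. The class and module names are
placeholders, and it assumes mongod is already running on port 27017 on
the master:

    from starcluster.clustersetup import ClusterSetup

    # ssh command that forwards localhost:27017 on a worker to the
    # master's mongod (StarCluster sets up passwordless root ssh between
    # nodes, so this runs non-interactively).
    TUNNEL_CMD = "ssh -f -N -L 27017:localhost:27017 root@master"

    class MongoTunnelPlugin(ClusterSetup):
        """Open an ssh tunnel to the master's mongod on every worker."""

        def run(self, nodes, master, user, user_shell, volumes):
            # Called once at cluster start: tunnel from each worker.
            for node in nodes:
                if node.alias != master.alias:
                    node.ssh.execute(TUNNEL_CMD)

        def on_add_node(self, node, nodes, master, user, user_shell,
                        volumes):
            # Called for each node added later: tunnel from it too.
            node.ssh.execute(TUNNEL_CMD)

Once the module is on the PYTHONPATH, it would be enabled with a
[plugin mongotunnel] section in the StarCluster config whose
SETUP_CLASS points at mongotunnel.MongoTunnelPlugin, plus a mongotunnel
entry on the cluster template's PLUGINS line.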


On Thu, Apr 5, 2012 at 10:40 AM, Justin Riley <jtriley at mit.edu> wrote:

> Hi Dan,
>
> Awesome, thanks for sharing these details. Do you by chance have a
> MongoDB plugin for StarCluster that does all of this, or would you be
> willing to put one together? I'd like to include a MongoDB plugin in
> the next feature release...
>
> ~Justin
>
> On Wed, Apr 04, 2012 at 08:57:01AM -0400, Dan Yamins wrote:
> >    1) What is the format of your data? And how big are the entries? I
> >    like MongoDB for this sort of thing, but it may depend on what kind
> >    of thing you want to store.
> >    2) ssh tunnels may be a good solution for having a common DB backing
> >    the cluster. Basically, if you use a DB that is accessible as a
> >    service on a port, then if you ssh tunnel from the various worker
> >    nodes to the node running the DB, software running on the worker
> >    nodes can act "as if" the database were purely local.
> >    In other words, do three things:
> >        A) set up a single DB actually running on one designated node,
> >    on some port, e.g. port 27017 on master.
> >        B) write code in your worker that pretends the DB is local on
> >    that port (here's pythonesque code for MongoDB; a fuller, runnable
> >    version appears after these steps):
> >         conn = pymongo.Connection(host='localhost', port=27017)
> >         collection = conn['my_database']['my_collection']
> >         collection.insert(my_record)
> >         <etc.....>
> >        C) and then separately establish an ssh tunnel from the worker
> >    node to the master (or wherever the single DB is running). This can
> >    be done in a StarCluster plugin in the "on_add_node" or "run"
> >    methods like this:
> >              workernode.ssh.execute("ssh -f -N -L 27017:localhost:27017 root@master")
> >    Of course you could start this by hand on all the nodes as well,
> >    but that gets a little tedious, and the plugin system is perfect
> >    for this kind of job.
> >    Having done A), B), and C), when you run the code in B) on your
> >    worker node, the code will simply read and write to the single
> >    master database from A) without having to know anything about the
> >    fact that it's running on a cluster.
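> >
> >    As a concrete, runnable version of B), here is a complete
> >    worker-side sketch (the database and collection names are
> >    placeholders, and it assumes the tunnel from C) is already up):
> >
> >        import pymongo
> >
> >        # The tunnel makes the master's mongod answer on this worker's
> >        # own localhost:27017, so the connection looks purely local.
> >        conn = pymongo.Connection(host='localhost', port=27017)
> >        collection = conn['my_database']['my_collection']
> >
> >        # Write a record and read it back through the same tunnel.
> >        collection.insert({'url': 'http://example.com', 'status': 'scraped'})
> >        print collection.find_one({'url': 'http://example.com'})
> >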
> >    On Tue, Apr 3, 2012 at 11:22 PM, Chris Diehl <cpdiehl at gmail.com>
> >    wrote:
> >
> >      Hello,
> >      I would like to use StarCluster to do some web scraping and I'd
> >      like to store the collected data in a DB that is available to all
> >      of the cluster nodes. Is there a way to have a common DB backing
> >      the entire cluster? Any particular DBs that anyone has had success
> >      with?
> >      Thanks for your assistance!
> >      Chris