[StarCluster] One simple question

Tue Oct 26 11:11:55 EDT 2010

On Tue, Oct 26, 2010 at 4:25 PM, Damian Eads <eads at soe.ucsc.edu> wrote:

> Hi Alexey,
>
> Thanks for your questions! :)
>
Hi Damian,

Thank you for sharing your experience and practice.
The problem is that "cloud computing" can shift our
usual perception of software and IT services.
For example, some "cloud" programmers have already started to think of
AMIs as "shared libraries".
Such a revision of existing, commonplace viewpoints could change software
in a revolutionary way
(in the same way that matrix algebra, in its day, helped to discover new
laws).
So, carrying over existing software usage practice without understanding
that the rules of the game have shifted significantly
could be misleading. This is primarily where my question comes from.

>
> I guess I'll break my silence and mention that I'm an avid user of
> combining both Sun Grid Engine and MPI. I sometimes have several
> hundred MPI jobs I need to run where the returns diminish if the
> number of cores per MPI job is too high. Thus, I limit the number of
> cores/job so several MPI jobs run at once. As for creating "as many
> clusters as he wants", I've found it is often easier to manage a
> single cluster for a problem mainly because when I manage 2-4
> clusters, I often make mistakes in replicating volumes where my data
> and results are stored. At the very end of the computation, I run
> scripts which combine result files generated from all of the jobs. If
> they're on different volumes, I need to rsync each of them
> individually onto a common volume. By having all of my data on a
> single volume, I don't have to think about it. Only when I'm running
> a second set of jobs for a completely different project with different
> code and data sets will I create a second cluster.
>
Yes, following your example, I have found "my own" benefits in using a
"queuing system".
If I needed to use a CORBA scheduler to couple my MPI functionality,
I would need to run many MPI programs at the same time (preferably on the
same cluster, also from a performance viewpoint).
So, the best way to do this properly (I mean with automatic load balancing)
is the "queuing system".
Therefore, even when a user runs a single but complex task (coupled MPI
programs, for example), a "queuing system" would be really useful.
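
To make this concrete, here is a minimal sketch of how several coupled MPI
programs could be handed to Sun Grid Engine with a cap on the cores per job,
as Damian describes. It only builds the qsub command lines. The parallel
environment name "orte", the program name "./solver" and the job names are
my assumptions for illustration, not something taken from this thread; check
the parallel environments configured on your own cluster.

```python
import shlex


def mpi_qsub_command(job_name, program, slots, pe_name="orte"):
    """Build a qsub command line for one MPI job limited to `slots` cores.

    `pe_name` is an assumed parallel environment name; adjust it to
    whatever your cluster actually defines.
    """
    # The MPI launcher is asked for exactly as many processes as slots.
    mpirun = ["mpirun", "-np", str(slots), program]
    # -N: job name, -cwd: run in the current directory,
    # -b y: treat the command as a binary rather than a job script,
    # -pe: request `slots` slots from the parallel environment.
    return (
        ["qsub", "-N", job_name, "-cwd", "-b", "y", "-pe", pe_name, str(slots)]
        + mpirun
    )


# Three hypothetical coupled programs, each capped at 4 cores.
commands = [mpi_qsub_command(f"couple{i}", "./solver", slots=4) for i in range(3)]

for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line could be run on the master node; the queuing system then
decides when and where each MPI job starts, so the jobs share one cluster
without being placed by hand.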

Thanks everybody, best regards,
Alexey

>
> Cheers,
>
> Damian
>
> On Sat, Oct 23, 2010 at 7:19 PM, Alexey PETROV
> <alexey.petrov.nnov at gmail.com> wrote:
> > Dear Justin,
> >
> > Thank you very much for your clear and full answer.
> > Yes, I completely agree with you that in the case of low bound tasks
> > and, especially, if they are run in routine everyday mode, the "queuing
> > system" is an excellent solution. My initial harshness in this question
> > was influenced by the background I come from, namely MPI. I thought that
> > once a user has "on demand" cluster computing nodes and MPI available,
> > this eliminates the "queuing system" as a class from "cloud computing",
> > because MPI comes with its own task dispatcher and the user can directly
> > acquire whatever powerful cluster configuration he needs for his task,
> > without waiting for suitable resources to become available. Now I see
> > that there are a lot of other applications that are better run in a
> > cluster through a pre-configured "queuing system", not by hand on a heap
> > of nodes. Thank you.
> > And could I just confirm once again: "If a single user needs to run an
> > MPI task just from time to time (not on a routine everyday basis), would
> > he get some additional benefits from a "queuing system" in a cloud, or
> > is it better to use MPI directly?"
> > Thank you in advance, sincerely yours,
> > Alexey
> > On Sat, Oct 23, 2010 at 6:37 PM, Justin Riley <jtriley at mit.edu> wrote:
> >>
> >> Alexey,
> >>
> >> The Sun Grid Engine queueing system is useful when you have a lot of
> >> tasks to execute and not just one at a time interactively. For example,
> >> you might need to convert 300 videos from one format to another. You
> >> could either
> >>
> >> 1. Write a script that gets the list of nodes from /etc/hosts and then
> >> loops over the jobs and the nodes, ssh'ing commands to be executed on
> >> each node. A big problem with this approach is that the task execution
> >> and management all depends on this script executing successfully all
> >> the way through. What happens if the script fails? You would then lose
> >> all task accounting information. Also, what if you suddenly discover
> >> you need to do another batch of 300 videos while the previous batch is
> >> still processing? Are you going to re-execute your script and overload
> >> the cluster? This would definitely slow down all of your jobs. How will
> >> you write your script to avoid overloading the cluster in this
> >> situation without losing the fact that you want to submit new jobs
> >> *now*?
> >>
> >> OR
> >>
> >> 2. Skip needing to get the list of nodes and ssh'ing commands to them
> >> and instead just write a loop that sends 300 jobs to the queuing system
> >> using "qsub". The queuing system will then do the work to find an
> >> available node, execute the job, and store its accounting information
> >> (status, start time, end time, which node executed the job, etc.). The
> >> queuing system will also handle load balancing your tasks across the
> >> cluster so that any one node doesn't get significantly overloaded
> >> compared to the other nodes in the cluster. If you suddenly discover
> >> you need 300 more videos processed you could simply "qsub" 300 more
> >> jobs. These jobs will be 'queued-up' and executed when a node becomes
> >> available. This approach reduces your concerns to just executing a task
> >> on a node rather than managing multiple jobs and nodes.
> >>
> >> Also it is true that you can create "as many clusters as you want" with
> >> cloud computing. However, in many cases it could get *very* expensive
> >> launching multiple clusters for every single task or set of tasks.
> >> Whether it's more cost effective to launch multiple clusters or just
> >> queue a ton of jobs on a single cluster depends highly on the sort of
> >> tasks you're executing.
> >>
> >> Of course, just because a queueing system is installed doesn't mean you
> >> *have* to use it at all. You can of course run things however you want
> >> on the cluster. Hopefully I've made it clear that there are significant
> >> advantages to using a queuing system to execute jobs on a cluster rather
> >> than a home-brewed script.
> >>
> >> Hope that helps...
> >>
> >> ~Justin
> >>
> >> On 10/22/10 5:02 PM, Alexey PETROV wrote:
> >>
> >> Yes, StarCluster is great.
> >> But why do we need to use any "queuing system" at all?
> >> Surely, in cloud computing, a user can create as many clusters as he
> >> wants, each for his particular tasks.
> >> So, why?!
> >>
> >> _______________________________________________
> >> StarCluster mailing list
> >> StarCluster at mit.edu
> >> http://mailman.mit.edu/mailman/listinfo/starcluster
> >>
> >
> >
>
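
For reference, Justin's second option (a loop that sends 300 jobs to the
queuing system using "qsub") could be sketched roughly as follows. This is
only an illustration under my own assumptions: the file names, the output
extension and the use of ffmpeg as the converter are hypothetical, and the
submit() helper must be called on a machine where qsub is installed.

```python
import subprocess
from pathlib import Path


def conversion_jobs(videos, out_ext=".mp4"):
    """Return one qsub argument list per video to convert.

    ffmpeg is only an assumed converter; any command would do here.
    """
    jobs = []
    for video in videos:
        src = Path(video)
        dst = src.with_suffix(out_ext)
        # -cwd: run in the current directory, -b y: command is a binary,
        # -N: job name (prefixed so that it never starts with a digit).
        jobs.append(
            ["qsub", "-cwd", "-b", "y", "-N", f"conv_{src.stem}",
             "ffmpeg", "-i", str(src), str(dst)]
        )
    return jobs


def submit(jobs):
    """Hand every job to the queuing system (requires qsub on PATH)."""
    for argv in jobs:
        subprocess.run(argv, check=True)


# Hypothetical input files: video_000.avi ... video_299.avi
videos = [f"video_{i:03d}.avi" for i in range(300)]
jobs = conversion_jobs(videos)
print(len(jobs), "jobs prepared; first:", " ".join(jobs[0]))
```

Calling submit(jobs) on the master node queues all 300 conversions at once.
A second batch can be queued the same way while the first is still running,
and the scheduler takes care of placing the jobs on free nodes.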