[StarCluster] One simple question
Justin Riley
jtriley at MIT.EDU
Sat Oct 23 12:37:28 EDT 2010
Alexey,
The Sun Grid Engine queueing system is useful when you have a lot of
tasks to execute and not just one at a time interactively. For example,
you might need to convert 300 videos from one format to another. You
could either
1. Write a script that gets the list of nodes from /etc/hosts and then
loops over the jobs and the nodes, ssh'ing commands to be executed on
each node. A big problem with this approach is that task execution
and management depend entirely on this script running successfully all
the way through. What happens if the script fails? You would then lose all
task accounting information. Also, what if you suddenly discover you
need to do another batch of 300 videos while the previous batch is still
processing? Are you going to re-execute your script and overload the
cluster? This would definitely slow down all of your jobs. How will you
write your script to avoid overloading the cluster in this situation
without losing the fact that you want to submit new jobs *now*?
OR
2. Skip needing to get the list of nodes and ssh'ing commands to them
and instead just write a loop that sends 300 jobs to the queuing system
using "qsub". The queuing system will then do the work to find an
available node, execute the job, and store its accounting information
(status, start time, end time, which node executed the job, etc.). The
queuing system will also handle load balancing your tasks across the
cluster so that no single node gets significantly overloaded
compared to the other nodes in the cluster. If you suddenly discover you
need 300 more videos processed you could simply "qsub" 300 more jobs.
These jobs will be 'queued-up' and executed when a node becomes
available. This approach reduces your concerns to just executing a task
on a node rather than managing multiple jobs and nodes.
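For illustration, option 2 might look something like the Python sketch below. It is only a sketch under several assumptions: Sun Grid Engine's `qsub` is on the PATH of the submitting machine, the converter is `ffmpeg` (a hypothetical choice; substitute your own tool) installed on every node, and the input files live on storage shared by all nodes. The helper names `build_qsub_command` and `submit_batch` are hypothetical. With `dry_run=True` it only prints the commands it would submit.

```python
import shlex
import subprocess
from pathlib import Path
from typing import Iterable, List


def build_qsub_command(src: Path, dst: Path) -> List[str]:
    """Build one SGE qsub invocation that converts src into dst.

    Assumes (hypothetically) that ffmpeg is installed on every execution node.
    """
    return [
        "qsub",
        "-b", "y",                     # submit the command line itself, not a job script
        "-cwd",                        # run the job in the current working directory
        "-N", f"convert_{src.stem}",   # job name shown by qstat; prefix avoids a leading digit
        "ffmpeg", "-i", str(src), str(dst),
    ]


def submit_batch(videos: Iterable[str], out_ext: str = ".webm",
                 dry_run: bool = True) -> List[List[str]]:
    """Queue one job per video; SGE decides where and when each one runs."""
    commands = []
    for name in videos:
        src = Path(name)
        dst = src.with_suffix(out_ext)   # e.g. clip000.avi -> clip000.webm
        cmd = build_qsub_command(src, dst)
        commands.append(cmd)
        if dry_run:
            print(shlex.join(cmd))       # show what would be submitted
        else:
            subprocess.run(cmd, check=True)  # hand the job to the queue and move on
    return commands


if __name__ == "__main__":
    # Hypothetical file names for a batch of 300 videos.
    submit_batch([f"videos/clip{i:03d}.avi" for i in range(300)])
```

Submitting a second batch later is just another call to `submit_batch`; the new jobs wait in the queue until nodes free up, and `qstat` and `qacct` report job status and accounting without the script having to track anything itself.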
Also, it is true that you can create "as many clusters as you want" with
cloud computing. However, in many cases launching multiple clusters for
every single task or set of tasks could get *very* expensive.
Whether it's more cost effective to launch multiple clusters or just
queue a ton of jobs on a single cluster depends highly on the sort of
tasks you're executing.
Of course, just because a queueing system is installed doesn't mean you
*have* to use it at all. You can run things however you want
on the cluster. Hopefully I've made it clear that there are significant
advantages to using a queuing system to execute jobs on a cluster rather
than a home-brewed script.
Hope that helps...
~Justin
On 10/22/10 5:02 PM, Alexey PETROV wrote:
> Yes, StarCluster is great.
> But why do we need to use any *queuing system* at all?
> Surely, in cloud computing, user can create as many clusters as he
> wants, each for his particular tasks.
> So, why?!