<div class="gmail_quote">On Tue, Oct 26, 2010 at 4:25 PM, Damian Eads <span dir="ltr"><<a href="mailto:eads@soe.ucsc.edu">eads@soe.ucsc.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Hi Alexey,<br>
<br>
Thanks for your questions! :)<br></blockquote><div>Hi Damian,</div><div><br></div><div>Thank you for sharing your experience and practice.</div><div>The thing is that "cloud computing" can shift our usual perception of software and IT services.</div>
<div>For example, some "cloud" programmers have already started to think of AMIs as "shared libraries".</div><div>Revising such established, commonplace viewpoints could change software in a revolutionary way</div>
<div>(much as matrix algebra, in its day, helped to discover new laws).</div><div>So carrying over existing software usage practice without recognizing that the rules of the game have shifted significantly </div>
<div>could be misleading. That is primarily where my question came from.</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br>
I guess I'll break my silence and mention that I'm an avid user of<br>
combining both Sun Grid Engine and MPI. I sometimes have several<br>
hundred MPI jobs I need to run where the returns diminish if the<br>
number of cores per MPI job is too high. Thus, I limit the number of<br>
cores/job so several MPI jobs run at once. As for creating "as many<br>
clusters as he wants", I've found it is often easier to manage a<br>
single cluster for a problem mainly because when I manage 2-4<br>
clusters, I often make mistakes in replicating volumes where my data<br>
and results are stored. At the very end of the computation, I run<br>
scripts which combine result files generated from all of the jobs. If<br>
they're on different volumes, I need to rsync each of them<br>
individually onto a common volume. By having all of my data on a<br>
single volume, I don't have to think about it. Only when I'm running<br>
a second set of jobs for a completely different project with different<br>
code and data sets will I create a second cluster.<br></blockquote><div>Yes, following your example, I have found my own reasons to use a "queuing system".</div><div>If I needed a CORBA scheduler to couple my MPI functionality, </div>
<div>I would have to run many MPI programs at the same time (preferably on the same cluster, even from a performance standpoint).</div><div>So the best way to do this properly (I mean with automatic load balancing) is a "queuing system".</div>
<div>Therefore, even when a user runs a single but complex task (coupled MPI programs, for example), a "queuing system" would be really useful; a rough sketch of this pattern follows below.</div><div><br></div><div>Thanks everybody, best regards,</div><div>Alexey</div>
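<div><br></div><div>A minimal sketch of that pattern, assuming SGE's qsub is available on the cluster master: each MPI case is submitted as its own job with a fixed slot count, so the scheduler can run several MPI jobs side by side and load-balance them across the nodes. The parallel environment name "orte", the CORES_PER_JOB value, and the run_case.sh wrapper (which would call mpirun on the slots SGE grants) are illustrative assumptions, not details from the thread itself.</div>
<pre>
#!/usr/bin/env python
# Hypothetical sketch: submit many independent MPI jobs to SGE, each
# capped at CORES_PER_JOB slots so several jobs run concurrently.
# Assumptions: a parallel environment named "orte" exists (list yours
# with `qconf -spl`), and run_case.sh is a wrapper script that calls
# mpirun on the slots SGE grants the job.
import subprocess

CORES_PER_JOB = 8   # beyond this point the per-job speedup diminishes

for case_id in range(300):
    subprocess.check_call([
        "qsub", "-cwd",
        "-pe", "orte", str(CORES_PER_JOB),
        "-N", "mpi_case_%d" % case_id,
        "run_case.sh", str(case_id),
    ])
</pre>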
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br>
Cheers,<br>
<font color="#888888"><br>
Damian<br>
</font><div><div></div><div class="h5"><br>
On Sat, Oct 23, 2010 at 7:19 PM, Alexey PETROV<br>
<<a href="mailto:alexey.petrov.nnov@gmail.com">alexey.petrov.nnov@gmail.com</a>> wrote:<br>
> Dear Justin,<br>
><br>
> Thank you very much for your clear and full answer.<br>
> Yes, I completely agree with you that for loosely coupled tasks, and<br>
> especially when running them in a routine everyday mode, the "queuing system" is an<br>
> excellent solution. My initial harshness on this question was influenced by the<br>
> background I come from, namely MPI. I thought that once a user<br>
> has "on demand" cluster computing nodes and MPI available,<br>
> it eliminates the "queuing system" as a class from "cloud computing",<br>
> because MPI comes with its own task dispatcher and the user can directly acquire<br>
> whatever cluster configuration he needs for his task, however powerful, without<br>
> waiting for the proper resources to become available. Now I see that there<br>
> are a lot of other applications that are better run on a cluster through<br>
> a pre-configured "queuing system", not by hand on a heap of nodes. Thank<br>
> you.<br>
> And could I just confirm, once again: "If a single user needs to run an MPI<br>
> task only from time to time (not on a routine everyday basis), would he gain<br>
> some additional benefit from a "queuing system" in the cloud, or is it better to<br>
> use MPI directly?"<br>
> Thank you in advance, sincerely yours,<br>
> Alexey<br>
> On Sat, Oct 23, 2010 at 6:37 PM, Justin Riley <<a href="mailto:jtriley@mit.edu">jtriley@mit.edu</a>> wrote:<br>
>><br>
>> Alexey,<br>
>><br>
>> The Sun Grid Engine queueing system is useful when you have a lot of tasks<br>
>> to execute and not just one at a time interactively. For example, you might<br>
>> need to convert 300 videos from one format to another. You could either<br>
>><br>
>> 1. Write a script that gets the list of nodes from /etc/hosts and then<br>
>> loops over the jobs and the nodes, ssh'ing commands to be executed on each<br>
>> node. A big problem with this approach is that the task execution and<br>
>> management all depends on this script executing successfully all the way<br>
>> through. What happens if the script fails? You would then lose all task<br>
>> accounting information. Also, what if you suddenly discover you need to do<br>
>> another batch of 300 videos while the previous batch is still processing?<br>
>> Are you going to re-execute your script and overload the cluster? This would<br>
>> definitely slow down all of your jobs. How will you write your script to<br>
>> avoid overloading the cluster in this situation without losing the fact that<br>
>> you want to submit new jobs *now*?<br>
>><br>
>> OR<br>
>><br>
>> 2. Skip needing to get the list of nodes and ssh'ing commands to them and<br>
>> instead just write a loop that sends 300 jobs to the queuing system using<br>
>> "qsub". The queuing system will then do the work to find an available node,<br>
>> execute the job, and store its accounting information (status, start time,<br>
>> end time, which node executed the job, etc.). The queuing system will also<br>
>> handle load balancing your tasks across the cluster so that any one node<br>
>> doesn't get significantly overloaded compared to the other nodes in the<br>
>> cluster. If you suddenly discover you need 300 more videos processed you<br>
>> could simply "qsub" 300 more jobs. These jobs will be 'queued-up' and<br>
>> executed when a node becomes available. This approach reduces your concerns<br>
>> to just executing a task on a node rather than managing multiple jobs and<br>
>> nodes.<br>
>><br>
>> Also it is true that you can create "as many clusters as you want" with<br>
>> cloud computing. However, in many cases it could get *very* expensive<br>
>> launching multiple clusters for every single task or set of tasks. Whether<br>
>> it's more cost effective to launch multiple clusters or just queue a ton of<br>
>> jobs on a single cluster depends highly on the sort of tasks you're<br>
>> executing.<br>
>><br>
>> Of course, just because a queueing system is installed doesn't mean you<br>
>> *have* to use it at all. You can of course run things however you want on<br>
>> the cluster. Hopefully I've made it clear that there are significant<br>
>> advantages to using a queuing system to execute jobs on a cluster rather<br>
>> than a home-brewed script.<br>
>><br>
>> Hope that helps...<br>
>><br>
>> ~Justin<br>
>><br>
>> On 10/22/10 5:02 PM, Alexey PETROV wrote:<br>
>><br>
>> Yes, StarCluster is great.<br>
>> But why do we need to use any "queuing system" at all?<br>
>> Surely, in cloud computing, a user can create as many clusters as he wants,<br>
>> each for his particular tasks.<br>
>> So, why?!<br>
>><br>
</div></div></blockquote></div><br>
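<div><br></div><div>For reference, a minimal sketch of the qsub-based approach Justin describes in the quoted thread above, as opposed to hand-rolling an ssh loop over the nodes in /etc/hosts: one job is submitted per video, and SGE picks a node, balances the load, and keeps the accounting information. The /data/videos path and the ffmpeg command line are illustrative assumptions; -b y (run a binary directly, without a job script) and -cwd (keep output next to the submission directory) are standard SGE qsub options.</div>
<pre>
#!/usr/bin/env python
# Minimal sketch of the "just qsub a loop of jobs" approach: one SGE job
# per video, letting the queuing system pick the node and keep accounting.
# Assumptions: the videos sit on a shared volume at /data/videos and
# ffmpeg is installed on every node -- adjust both for your cluster.
import glob
import subprocess

for video in glob.glob("/data/videos/*.avi"):
    output = video[:-4] + ".mp4"
    subprocess.check_call([
        "qsub", "-b", "y",   # run ffmpeg directly as a binary command
        "-cwd",              # keep stdout/stderr in the submission directory
        "-N", "convert",
        "ffmpeg", "-i", video, output,
    ])
</pre>
<div>Submitting another batch of 300 videos later is then just a matter of re-running the loop; the new jobs queue up behind the running ones instead of overloading the cluster.</div>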