[StarCluster] CPU load

Christopher Clearfield chris.clearfield at system-logic.com
Thu Jul 30 21:34:00 EDT 2015


There shouldn't be too much I/O unless I'm missing something.

In IPython, I read the data from an HDF store on each node (once), then
instantiate a class on each node with the data:

%%px
store = pd.HDFStore(data_file, 'r')
rows = store.select('results', ['cv_score_mean > 0'])
rows = rows.sort('cv_score_mean', ascending=False)
rows['results_index'] = rows.index

# This doesn't take too long.
model_analytics = ResultsAnalytics(rows, store['data_model'])
---
## This dispatch takes between 1.5 and 5 min
## 66K jobs
ar = lview.map(lambda x: model_analytics.generate_prediction_heuristic(x),
               rows_index)
---
ar.wait_interactive(interval=1.0)

63999/66230 tasks finished after 2181 s
done

So the whole run takes a while, even though each job itself is relatively
short. But I don't understand why CPU isn't the limiting factor.

Rajat, thanks for recommending dstat.

Best,
Chris






On Thu, Jul 30, 2015 at 10:52 AM Jacob Barhak <jacob.barhak at gmail.com>
wrote:

> Hi Christopher,
>
> Do you have a lot of I/O? For example, writing and reading many files to
> the same NFS location?
>
> This may explain things.
>
>           Jacob
> On Jul 30, 2015 2:34 AM, "Christopher Clearfield" <
> chris.clearfield at system-logic.com> wrote:
>
>> Hi All,
>> I'm running a set of about 60K relatively short jobs that takes about 30
>> minutes to run in total. This is through IPython parallel.
>>
>> Yet my CPU utilization levels are relatively small:
>>
>> queuename qtype resv/used/tot. load_avg arch states
>> ---------------------------------------------------------------------------------
>> all.q@master BIP 0/0/2 0.98 linux-x64
>> ---------------------------------------------------------------------------------
>> all.q@node001 BIP 0/0/8 8.01 linux-x64
>> ---------------------------------------------------------------------------------
>> all.q@node002 BIP 0/0/8 8.07 linux-x64
>> ---------------------------------------------------------------------------------
>> all.q@node003 BIP 0/0/8 7.96 linux-x64
>>
>> (I disabled the IPython engines on master because I was having heartbeat
>> timeout issues with the worker engines on my nodes, which explains why its
>> load is so low.)
>>
>> But that's only ~8% utilization on the nodes. Is that expected?
>>
>> Thanks,
>> Chris
>>
>>
>> _______________________________________________
>> StarCluster mailing list
>> StarCluster at mit.edu
>> http://mailman.mit.edu/mailman/listinfo/starcluster
>>
>>