[Crib-list] SPEAKER: Constantinos Evangelinos (MIT) --- Computational Research in Boston Seminar --- Friday, February 5, 2010 --- 12:30 PM -- Room 32-124 (new location)

Shirley Entzminger daisymae at math.mit.edu
Fri Feb 5 09:46:40 EST 2010


IMPORTANT NOTE:  Due to an academic class, the Schedules Office changed the 
room for the CRiB seminar this Spring 2010 -- new location is Room 32-124.

***********************************************************************

			COMPUTATIONAL RESEARCH in BOSTON SEMINAR


DATE:		Friday, February 5, 2010
TIME:		12:30 PM
LOCATION:	Building 32, Room 124  (new location)

Refreshments will be provided outside Room 32-124 at 12:15 PM.


TITLE:		Scientific Computing on the Cloud: Many Task Computing 
		and other opportunities


SPEAKER:	 Constantinos Evangelinos (MIT) 


Abstract:

Over the past few years, the application of Cloud Computing to scientific, 
rather than purely business, uses has been mainly in the area of 
bioinformatics. The usual Cloud science application is one that is 
essentially embarrassingly parallel (parameter studies etc.) and in many 
cases expressed in the usual map-reduce paradigm that the Cloud has made 
so popular.

We set out to explore a wider class of applications that can benefit from 
the type of resources a commercial Cloud provider such as Amazon EC2 
offers, starting from the low-hanging fruit of loosely coupled 
applications.

Error Subspace Statistical Estimation (ESSE), an uncertainty prediction 
and data assimilation methodology employed for real-time ocean forecasts, 
is based on a characterization and prediction of the largest 
uncertainties. This is carried out by evolving an error subspace of 
variable size. We use an ensemble of stochastic model simulations, 
initialized based on an estimate of the dominant initial uncertainties, to 
predict the error subspace of the model fields. The ESSE procedure is a 
classic case of Many Task Computing: These codes are managed based on 
dynamic workflows for (i) the perturbation of the initial mean state, (ii) 
the subsequent ensemble of stochastic PE model runs, (iii) the continuous 
generation of the covariance matrix, (iv) the successive computations of 
the SVD of the ensemble spread until a convergence criterion is satisfied, 
and (v) the data assimilation. Its ensemble nature makes it a many-task, 
data-intensive application and its dynamic workflow gives it 
heterogeneity. Subsequent acoustics propagation modeling involves a very 
large ensemble of very short acoustics runs.
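
As a rough illustration of steps (i) through (iv), the following is a 
minimal sketch, not the ESSE code itself. Everything in it is an 
assumption made for illustration: the toy stochastic "model" (run_member), 
the state size, the batch size, and the similarity threshold are all 
hypothetical. It also swaps the full SVD for a plain power iteration that 
recovers only the leading mode of the ensemble spread, and it omits the 
data assimilation step (v). Ensemble members are launched in batches as 
independent tasks, and the workflow stops once the leading mode stops 
changing between batches.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dominant uncertainty pattern for the toy model (unit length).
DOMINANT = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0, 0.0]


def run_member(job):
    """Stand-in for one stochastic model run: perturb the mean state."""
    mean_state, seed = job
    rng = random.Random(seed)
    amplitude = rng.gauss(0.0, 1.0)  # large error along DOMINANT
    return [m + amplitude * d + rng.gauss(0.0, 0.05)  # plus small noise
            for m, d in zip(mean_state, DOMINANT)]


def leading_mode(members, iterations=50):
    """Leading mode of the ensemble spread, by power iteration (not a full SVD)."""
    n = len(members[0])
    mean = [sum(m[i] for m in members) / len(members) for i in range(n)]
    anomalies = [[m[i] - mean[i] for i in range(n)] for m in members]
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        # w = A^T (A v), where the rows of A are the member anomalies
        projections = [sum(a[i] * v[i] for i in range(n)) for a in anomalies]
        w = [sum(a[i] * p for a, p in zip(anomalies, projections))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def esse_sketch(mean_state, batch=8, max_members=64, tol=0.99):
    """Grow the ensemble in batches until the leading mode stops changing."""
    members, previous, seed = [], None, 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        while len(members) < max_members:
            jobs = [(mean_state, seed + i) for i in range(batch)]
            seed += batch
            members.extend(pool.map(run_member, jobs))  # independent tasks
            mode = leading_mode(members)
            if previous is not None:
                # |cosine| between successive leading modes (sign is arbitrary)
                similarity = abs(sum(a * b for a, b in zip(mode, previous)))
                if similarity >= tol:
                    return mode, len(members), similarity
            previous = mode
    return previous, len(members), None  # not converged within max_members


if __name__ == "__main__":
    mode, used, similarity = esse_sketch([0.0, 0.0, 0.0, 0.0])
    print("members used:", used, "similarity:", similarity)
    print("leading mode:", mode)
```

The thread pool here only stands in for a real task scheduler; in a 
setting like the one described above, each member would be a separate 
job on a cluster or Cloud node, and the convergence check is what makes 
the total number of tasks unknown in advance.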

We study the execution characteristics and challenges of a distributed 
ESSE workflow on a large dedicated cluster, the usability of enhancing 
it with runs on Amazon EC2 and the TeraGrid, and the I/O challenges 
faced.

We then proceed to look into more closely coupled applications and the 
issues they face on Amazon.

************************************************************************

Massachusetts Institute of Technology           
Cambridge, MA 02139

