[Crib-list] Speaker: NADYA BLISS -- Computational Research in Boston Seminar -- Friday, May 2, 2008 -- TIME: 12:30 PM -- Room 32-144 (fwd)

Shirley Entzminger daisymae at math.mit.edu
Thu May 1 14:20:15 EDT 2008



 		COMPUTATIONAL RESEARCH in BOSTON SEMINAR


DATE:		Friday, MAY 2, 2008
TIME:		12:30 PM
LOCATION:	Building 32, Room 144  (Stata Center)

(Pizza and beverages will be provided at 12:15 PM outside the room.)


TITLE:		SMaRT: Sparse Mapping and Routing Toolbox


SPEAKER:	NADYA BLISS  (MIT - Lincoln Lab)


ABSTRACT:

Over the last few years, MIT Lincoln Laboratory has been developing techniques 
for automatic algorithm-to-architecture mapping of signal processing 
applications.  Due to the nature of signal processing operations, these 
modeling, program analysis, and mapping techniques have been limited to dense 
matrix computations.  As more post-processing algorithms move to the sensor front end, 
efficient parallel implementations of those algorithms for sparse data are 
becoming necessary.  Post-processing algorithms are typically represented in 
graphical form, but can also be formulated using sparse matrix notation.  The 
efficiency of the graph algorithms is often only a small fraction (< 0.01) of 
the peak throughput of a conventional architecture, and the efficiency 
deteriorates as the problem size increases.  Furthermore, graph algorithms are 
not well suited for parallelization.  Sparse matrix formulations expose 
intrinsic algorithm parallelism; however, efficient mappings are still needed 
to exploit the parallelism and deliver desirable efficiency.
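As a hedged illustration of the idea that graph algorithms can be formulated in 
sparse matrix notation (this sketch is ours, not from the talk): one 
breadth-first-search frontier expansion is a sparse matrix-vector product over 
the Boolean semiring, with (+, *) replaced by (or, and) and visited vertices 
masked out.

```python
# Hypothetical sketch: BFS as repeated sparse "matrix-vector" products.
# adj plays the role of the sparse adjacency matrix; frontier is the
# sparse Boolean vector x; the inner loops are the semiring "multiply".

def bfs_levels(adj, source):
    """adj: sparse adjacency as {src: [dst, ...]}; returns BFS level per node."""
    level = {source: 0}
    frontier = {source}          # sparse Boolean vector x
    step = 0
    while frontier:
        step += 1
        nxt = set()
        for u in frontier:       # combine the rows of adj selected by x
            for v in adj.get(u, []):
                if v not in level:   # masked accumulate: skip visited nodes
                    level[v] = step
                    nxt.add(v)
        frontier = nxt
    return level

# Example: edges 0->1, 0->2, 1->2
adj = {0: [1, 2], 1: [2]}
print(bfs_levels(adj, 0))   # {0: 0, 1: 1, 2: 1}
```

The sparse formulation makes the available parallelism explicit: each frontier 
expansion is independent across rows, which is exactly the structure a mapping 
framework can exploit.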

This talk will motivate the need for efficient sparse matrix processing with an 
example from the graph algorithm community and highlight a key computational 
kernel: sparse matrix-matrix multiply.  We will first introduce a high-level 
overview of our dense mapping framework and highlight its limitations with 
regards to parallelization of sparse computations.  We will then show how SMaRT 
overcomes these limitations using fine-grain architecture modeling, dependency 
analysis, and mapping and routing techniques.  Finally, we will present results for 
a number of implementations of the matrix multiply kernel.  Our results show 
over an order of magnitude performance improvement over traditional sparse 2D 
block-cyclic mappings.  This performance improvement can be attributed to 
detailed runtime analysis of the computation, sensitivity to the sparsity of 
the data, co-optimized map and route determination, and the detailed hardware 
modeling capability.
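For readers unfamiliar with the sparse matrix-matrix multiply kernel highlighted 
above, a minimal sketch (our own dict-of-dicts version, not SMaRT's 
implementation) of Gustavson's row-wise algorithm: row i of C accumulates 
A[i][k] * B[k][j] only where both factors are nonzero, so work scales with the 
nonzero structure rather than the dense dimensions.

```python
# Hypothetical sketch of sparse matrix-matrix multiply (SpGEMM),
# row-wise Gustavson algorithm on a {row: {col: value}} representation.

def spgemm(A, B):
    """A, B: sparse matrices as {row: {col: value}}; returns A @ B."""
    C = {}
    for i, row in A.items():
        acc = {}                              # sparse accumulator for row i of C
        for k, a_ik in row.items():           # nonzeros of row i of A
            for j, b_kj in B.get(k, {}).items():  # matching nonzero row k of B
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C

A = {0: {0: 1, 2: 2}, 1: {1: 3}}
B = {0: {1: 4}, 1: {0: 5}, 2: {1: 6}}
print(spgemm(A, B))   # {0: {1: 16}, 1: {0: 15}}
```

Because the cost of each output row depends on the input's sparsity pattern, a 
good parallel mapping must be sensitive to the data's sparsity, which is one of 
the factors the talk credits for the performance improvement.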

This is joint work with Una-May O'Reilly at CSAIL and Sanjeev Mohindra at 
Lincoln Laboratory.


This work is sponsored by the Department of the Air Force under Air Force 
contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and 
recommendations are those of the author and are not necessarily endorsed by the 
United States Government.

*****************************************************************************

Massachusetts Institute of Technology
Cambridge, MA  02139


http://www-math.mit.edu/crib

For information on CRiB, contact:

Alan Edelman:  edelman at math.mit.edu
Steven G. Johnson:  stevenj at math.mit.edu
Jeremy Kepner:  kepner at ll.mit.edu
Patrick Dreher:  dreher at mit.edu




