[Crib-list] TODAY: SPEAKERS: Jeremy Kepner, Vijay Gadepally, et al -- Computational Research in Boston and Beyond Seminar (CRIBB) -- Friday, Feb. 7, 2014 -- TIME: 12:00 Noon in Bldg. 32, Room 141 (Stata Center) (fwd)

Fri Feb 7 10:29:06 EST 2014

T O D A Y . . .

 		   COMPUTATIONAL RESEARCH in BOSTON and BEYOND SEMINAR

DATE:		FRIDAY, FEBRUARY 7, 2014
TIME:		12:00 Noon
LOCATION:	Building 32, Room 141   (Stata Center)

 	Pizza will be provided at 11:45 AM outside Room 32-141

TITLE:		Computing on Masked Big Data

SPEAKERS:	Jeremy Kepner, Vijay Gadepally, Pete Michaleas,
 		Nabil Schear, Mayank Varia   (MIT-Lincoln Laboratory)

ABSTRACT:

The growing gap between data and users calls for innovative tools that address 
the challenges posed by big data volume, velocity, and variety. Along with these
three Vs of big data, an increasingly important fourth challenge is veracity. 
Big data volume stresses the storage, memory, and compute capacity of a 
computing system and requires access to a computing cloud.  The velocity of big 
data stresses the rate at which data can be absorbed and meaningful answers 
produced.  Big data variety requires vast quantities of highly diverse data 
(text, computer logs, social media data, etc.) to be automatically
ingested. Traditional techniques for assuring the veracity of data incur 
overheads that are often too large to apply to big data, and there is 
increasing interest in investigating alternative techniques.  Computing on 
Masked Data (CMD) is one such low overhead technique that allows data to be 
masked, operated on, and then unmasked when the answers are desired.  CMD 
relies on the sparse linear algebra of associative arrays to transform 
computations from a space where + and * are the primary low-level operations to 
one where =, >, and < are the primary low-level operations.  Databases with 
strong support for sparse operations (such as SciDB or Apache Accumulo) are
ideally suited to this technique.  A demonstration of the technique on DNA 
sequence data shows how DNA data can be masked, a complex DNA matching 
algorithm can be performed on the masked DNA data, and the result can be 
unmasked to reveal the true answer.  CMD can be performed with significantly 
less overhead than other approaches while also supporting a full range of 
linear algebraic operations on the masked data.
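The abstract's core idea is that masked data can still be operated on when the masking preserves the comparisons a computation needs. The Python sketch below illustrates this under stated assumptions. It uses a deterministic keyed hash (HMAC-SHA256 from Python's standard `hmac` module) as the masking scheme. That scheme preserves only `=`. Supporting `<` and `>` would need an order-preserving scheme, and the speakers' actual CMD construction may differ.

In the sketch, DNA sequences are split into k-mers and stored as sparse associative arrays, with rows as sequence ids and columns as masked k-mers. They are then matched by computing A * B^T using only equality tests on masked keys. Finally, the data owner unmasks the shared k-mers with a private lookup table. The key, the function names (`mask`, `build_masked_array`, `match_counts`, `unmask_shared`), and the toy sequences are all hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret key held only by the data owner (illustrative assumption).
SECRET_KEY = b"owner-secret-key"


def mask(token: str) -> str:
    """Deterministically mask a token with HMAC-SHA256.

    Equal inputs give equal outputs, so '=' still works on masked data;
    ordering (<, >) is NOT preserved by this particular scheme.
    """
    return hmac.new(SECRET_KEY, token.encode(), hashlib.sha256).hexdigest()


def kmers(seq: str, k: int) -> set:
    """Return the set of length-k substrings (k-mers) of a DNA sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_masked_array(seqs: dict, k: int, unmask_table: dict) -> dict:
    """Sparse associative array: row = sequence id, column = masked k-mer, value = 1.

    The owner records masked -> plaintext in unmask_table for later unmasking.
    """
    array = {}
    for seq_id, seq in seqs.items():
        row = {}
        for kmer in kmers(seq, k):
            masked = mask(kmer)
            unmask_table[masked] = kmer
            row[masked] = 1
        array[seq_id] = row
    return array


def match_counts(a: dict, b: dict) -> dict:
    """Sparse product C = A * B^T; C[i][j] counts masked k-mers shared by rows i and j.

    Only equality of masked column keys is tested; plaintext is never needed.
    """
    # Index B by column so each masked k-mer maps to the rows that contain it.
    by_column = defaultdict(list)
    for j, row in b.items():
        for col in row:
            by_column[col].append(j)
    result = defaultdict(dict)
    for i, row in a.items():
        for col, val in row.items():
            for j in by_column.get(col, []):
                result[i][j] = result[i].get(j, 0) + val * b[j][col]
    return dict(result)


def unmask_shared(a_row: dict, b_row: dict, unmask_table: dict) -> set:
    """Owner-side step: reveal the plaintext k-mers two masked rows have in common."""
    return {unmask_table[col] for col in a_row if col in b_row}


if __name__ == "__main__":
    table = {}
    reference = build_masked_array({"ref1": "ACGTACGT", "ref2": "TTTTGGGG"}, 4, table)
    sample = build_masked_array({"s1": "GTACGTTT"}, 4, table)
    print(match_counts(sample, reference))  # {'s1': {'ref1': 3}}
    print(sorted(unmask_shared(sample["s1"], reference["ref1"], table)))  # ['ACGT', 'GTAC', 'TACG']
```

In this toy run, the matching step sees only hex digests, yet it still finds that `s1` shares three 4-mers with `ref1` and none with `ref2`. This works because a deterministic mask maps equal k-mers to equal keys. The trade-off is that equal values remain recognizable after masking, which is part of what makes this kind of approach cheap.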

*********************************************************************************

Massachusetts Institute of Technology
Cambridge, MA

For more information, please visit...

 			http://math.mit.edu/crib/