MMDS 2006. Workshop on Algorithms for Modern Massive Data Sets

Stanford University and Yahoo! Research
June 21–24, 2006



The 2006 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2006) addressed algorithmic, mathematical, and statistical challenges in modern large-scale data analysis. The goals of MMDS 2006 were to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly-structured scientific and internet data sets, and to bring together computer scientists, statisticians, mathematicians, and data analysis practitioners to promote the cross-fertilization of ideas.


Wednesday, June 21, 2006. Theme: Linear Algebraic Basics

Time Talk
10:00 - 11:00 Tutorial: Ravi Kannan
Sampling in large matrices
11:00 - 11:30 Santosh Vempala
Related paper: Matrix approximation and projective clustering via volume sampling
11:30 - 12:00 Petros Drineas
Subspace sampling and relative error matrix approximation
1:30 - 2:30 Tutorial: Dianne O'Leary
Matrix factorizations for information retrieval
2:30 - 3:00 Pete Stewart
Sparse reduced rank approximations to sparse matrices
3:00 - 3:30 Haesun Park
Adaptive discriminant analysis by regularized minimum squared errors
4:00 - 4:30 Michael Mahoney
CUR matrix decompositions for improved data analysis
4:30 - 5:00 Daniel Spielman
Fast algorithms for graph partitioning, sparsifications, and solving SDD systems
5:00 - 5:30 Anna Gilbert/Martin Strauss
List decoding of noisy Reed-Muller-like codes
5:30 - 6:00 Bob Plemmons
Low-rank nonnegative factorizations for spectral imaging applications
6:00 - 6:30 Art Owen
A hybrid of multivariate regression and factor analysis

Thursday, June 22, 2006. Theme: Industrial Applications and Sampling Methods

Time Talk
9:00 - 10:00 Tutorial: Prabhakar Raghavan
The changing face of web search
10:00 - 10:30 Tong Zhang
Statistical ranking problem
11:00 - 11:30 Michael Berry
Text-mining approaches for email surveillance
11:30 - 12:00 Hongyuan Zha
Incorporating query difference for learning retrieval functions
12:00 - 12:30 Trevor Hastie/Ping Li
Efficient L2 and L1 dimension reduction in massive databases
2:00 - 3:00 Tutorial: Muthu Muthukrishnan
An algorithmicist's view of sparse approximation problems
3:00 - 3:30 Inderjit Dhillon
Kernel learning with Bregman matrix divergences
3:30 - 4:00 Bruce Hendrickson
Latent semantic analysis and Fiedler retrieval
4:30 - 5:00 Piotr Indyk
Near-optimal hashing algorithms for the approximate near(est) neighbor problem
5:00 - 5:30 Moses Charikar
Compact data representations and their applications
5:30 - 6:00 Sudipto Guha
At the confluence of streams: order, information, and signals
6:00 - 6:30 Frank McSherry
Preserving privacy in large-scale data analysis

Friday, June 23, 2006. Theme: Kernel and Learning Applications

Time Talk
9:00 - 10:00 Tutorial: Dimitris Achlioptas
Applications of random matrices in spectral computations and machine learning
10:00 - 10:30 Tomaso Poggio
Learning: theory, engineering applications, and neuroscience
11:00 - 11:30 Stephen Smale
Related paper: Finding the homology of submanifolds with high confidence from random samples
11:30 - 12:00 Gunnar Carlsson
Algebraic topology and analysis of high dimensional data
12:00 - 12:30 Vin de Silva
Point-cloud topology via harmonic forms
2:00 - 2:30 Dan Boley
Fast clustering leads to fast support vector machine training and more
2:30 - 3:00 Chris Ding
On the equivalence of (semi-)nonnegative matrix factorization and k-means
3:00 - 3:30 Al Inselberg
Parallel coordinates: visualization & data mining for high-dimensional datasets
3:30 - 4:00 Joel Tropp
One sketch for all: a sublinear approximation scheme for heavy hitters
5:00 - 5:30 Rob Tibshirani
Prediction by supervised principal components
5:30 - 6:00 Tao Yang/Apostolos Gerasoulis
Page ranking for large-scale internet search:'s experiences

Saturday, June 24, 2006. Theme: Tensor-Based Data Applications

Time Talk
10:00 - 11:00 Tutorial: Lek-Heng Lim
Tensors, symmetric tensors, and nonnegative tensors in data analysis
11:00 - 11:30 Eugene Tyrtyshnikov
Tensor compression of petabyte-size data
11:30 - 12:00 Lieven De Lathauwer
The decomposition of a tensor as a sum of rank-(R1,R2,R3) terms
1:30 - 2:00 Orly Alter
Matrix and tensor computations for reconstructing the pathways of a cellular system from genome-scale signals
2:00 - 2:30 Shmuel Friedland
Tensors: ranks and approximations
2:30 - 3:00 Tammy Kolda
Multilinear algebra for analyzing data with multiple linkages
3:00 - 3:30 Lars Eldén
Computing the best rank-(R1,R2,R3) approximation of a tensor
4:00 - 4:30 Liqun Qi
Eigenvalues of tensors and their applications
4:30 - 5:00 Brett Bader
Analysis of latent relationships in semantic graphs using DEDICOM
5:00 - 5:30 Alex Vasilescu
Multilinear (tensor) algebraic framework for computer vision and graphics
5:30 - 6:00 Rasmus Bro
Multi-way analysis of bioinformatic data
6:00 - 6:30 Pierre Comon
Independent component analysis viewed as a tensor decomposition


Organizers

Gene Golub, Stanford University

Michael Mahoney, Yahoo! Research

Petros Drineas, Rensselaer Polytechnic Institute

Lek-Heng Lim, Stanford University

Related Events

MMDS 2010. Workshop on Algorithms for Modern Massive Data Sets, Stanford, CA, June 15–18, 2010.

EMMDS 2009. European Workshop on Challenges in Modern Massive Data Sets, Technical University of Denmark, Lyngby, Denmark, July 1–4, 2009.

MMDS 2008. Workshop on Algorithms for Modern Massive Data Sets, Stanford, CA, June 25–28, 2008.

Sponsored by

National Science Foundation and Yahoo! Research


Jillian Anderson, Alex Brik, David Gleich, Lin Koh, Felix Kwok, Mirella Machuca, Wanjun Mi, Patty Namba