MMDS 2006. Workshop on Algorithms for Modern Massive Data Sets

Stanford University and Yahoo! Research
June 21–24, 2006

MMDS 2010. Workshop on Algorithms for Modern Massive Data Sets, Stanford, CA, June 15–18, 2010.

Synopsis

The 2006 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2006) addressed algorithmic, mathematical, and statistical challenges in modern large-scale data analysis. The goals of MMDS 2006 were to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly structured scientific and internet data sets, and to bring together computer scientists, statisticians, mathematicians, and data analysis practitioners to promote cross-fertilization of ideas.

The organizers thank all participants and speakers for their time and interest.

Available materials: the schedule, abstracts of talks and posters, a single PDF file containing all of these, the original conference web page, and the flyer.

Reports about the event: SIAM News and NA Digest.

Slides

Wednesday, June 21, 2006. Theme: Linear Algebraic Basics

Time	Talk
10:00 - 11:00	Tutorial: Ravi Kannan Sampling in large matrices
11:00 - 11:30	Santosh Vempala Related paper: Matrix approximation and projective clustering via volume sampling
11:30 - 12:00	Petros Drineas Subspace sampling and relative error matrix approximation
1:30 - 2:30	Tutorial: Dianne O'Leary Matrix factorizations for information retrieval
2:30 - 3:00	Pete Stewart Sparse reduced rank approximations to sparse matrices
3:00 - 3:30	Haesun Park Adaptive discriminant analysis by regularized minimum squared errors
4:00 - 4:30	Michael Mahoney CUR matrix decompositions for improved data analysis
4:30 - 5:00	Daniel Spielman Fast algorithms for graph partitioning, sparsifications, and solving SDD systems
5:00 - 5:30	Anna Gilbert/Martin Strauss List decoding of noisy Reed-Muller-like codes
5:30 - 6:00	Bob Plemmons Low-rank nonnegative factorizations for spectral imaging applications
6:00 - 6:30	Art Owen A hybrid of multivariate regression and factor analysis

Thursday, June 22, 2006. Theme: Industrial Applications and Sampling Methods

Time	Talk
9:00 - 10:00	Tutorial: Prabhakar Raghavan The changing face of web search
10:00 - 10:30	Tong Zhang Statistical ranking problem
11:00 - 11:30	Michael Berry Text-mining approaches for email surveillance
11:30 - 12:00	Hongyuan Zha Incorporating query difference for learning retrieval functions
12:00 - 12:30	Trevor Hastie/Ping Li Efficient L2 and L1 dimension reduction in massive databases
2:00 - 3:00	Tutorial: Muthu Muthukrishnan An algorithmer's view of sparse approximation problems
3:00 - 3:30	Inderjit Dhillon Kernel learning with Bregman matrix divergences
3:30 - 4:00	Bruce Hendrickson Latent semantic analysis and Fiedler retrieval
4:30 - 5:00	Piotr Indyk Near-optimal hashing algorithms for the approximate near(est) neighbor problem
5:00 - 5:30	Moses Charikar Compact data representations and their applications
5:30 - 6:00	Sudipto Guha At the confluence of streams: order, information, and signals
6:00 - 6:30	Frank McSherry Preserving privacy in large-scale data analysis

Friday, June 23, 2006. Theme: Kernel and Learning Applications

Time	Talk
9:00 - 10:00	Tutorial: Dimitris Achlioptas Applications of random matrices in spectral computations and machine learning
10:00 - 10:30	Tomaso Poggio Learning: theory, engineering applications, and neuroscience
11:00 - 11:30	Stephen Smale Related paper: Finding the homology of submanifolds with high confidence from random samples
11:30 - 12:00	Gunnar Carlsson Algebraic topology and analysis of high dimensional data
12:00 - 12:30	Vin de Silva Point-cloud topology via harmonic forms
2:00 - 2:30	Dan Boley Fast clustering leads to fast support vector machine training and more
2:30 - 3:00	Chris Ding On the equivalence of (semi-)nonnegative matrix factorization and k-means
3:00 - 3:30	Al Inselberg Parallel coordinates: visualization & data mining for high dimensional datasets
3:30 - 4:00	Joel Tropp One sketch for all: a sublinear approximation scheme for heavy hitters
5:00 - 5:30	Rob Tibshirani Prediction by supervised principal components
5:30 - 6:00	Tao Yang/Apostolos Gerasoulis Page ranking for large-scale internet search: Ask.com's experiences

Saturday, June 24, 2006. Theme: Tensor-Based Data Applications

Time	Talk
10:00 - 11:00	Tutorial: Lek-Heng Lim Tensors, symmetric tensors and nonnegative tensors in data analysis
11:00 - 11:30	Eugene Tyrtyshnikov Tensor compression of petabyte-size data
11:30 - 12:00	Lieven De Lathauwer The decomposition of a tensor as a sum of rank-(R1,R2,R3) terms
1:30 - 2:00	Orly Alter Matrix and tensor computations for reconstructing the pathways of a cellular system from genome-scale signals
2:00 - 2:30	Shmuel Friedland Tensors: Ranks and approximations
2:30 - 3:00	Tammy Kolda Multilinear algebra for analyzing data with multiple linkages (PowerPoint version also available)
3:00 - 3:30	Lars Eldén Computing the best rank-(R1,R2,R3) approximation of a tensor
4:00 - 4:30	Liqun Qi Eigenvalues of tensors and their applications
4:30 - 5:00	Brett Bader Analysis of Latent Relationships in Semantic Graphs using DEDICOM
5:00 - 5:30	Alex Vasilescu Multilinear (tensor) algebraic framework for computer vision and graphics
5:30 - 6:00	Rasmus Bro Multi-way analysis of bioinformatic data (with movies)
6:00 - 6:30	Pierre Comon Independent component analysis viewed as a tensor decomposition

Organizers

Gene Golub, Stanford University

Michael Mahoney, Yahoo! Research

Petros Drineas, Rensselaer Polytechnic Institute

Lek-Heng Lim, Stanford University

Related Events

EMMDS 2009. European Workshop on Challenges in Modern Massive Data Sets, Technical University of Denmark, Lyngby, Denmark, July 1–4, 2009.

MMDS 2008. Workshop on Algorithms for Modern Massive Data Sets, Stanford, CA, June 25–28, 2008.

Acknowledgements

Jillian Anderson, Alex Brik, David Gleich, Lin Koh, Felix Kwok, Mirella Machuca, Wanjun Mi, Patty Namba