SIAM '08 Minisymposium. Mathematical Methods in Data Mining

July 7–11, 2008
San Diego, CA

Synopsis

This minisymposium is about the recent developments of some novel mathematical techniques for data mining applications. These developments highlight the multidisciplinary nature of data analysis — the four talks in the minisymposium represent contributions from different communities, namely, Mathematics, Optimization, Statistics, Computer Science, as well as perspectives from industry. Our two main objectives are to publicize these methods and to provide a cross-fertilization platform for researchers from different communities.

The minisymposium will be held as a part of the 2008 SIAM Annual Meeting at the Town and Country Resort and Convention Center in San Diego, CA. Check out our minisymposium last year on Novel Matrix Methods for Internet Data Mining at the 6th International Congress on Industrial and Applied Mathematics (ICIAM 2007).

Organizers

Inderjit Dhillon, Lek-Heng Lim

Program

The official program is now online. Our minisymposium (MS 23) will take place on Monday, July 7, 4:00–6:00pm in the room Royal Palm 3.

4:00–4:25: Variable Selection: Tractable Upper Bounds on the Restricted Isometry Constant

Alexandre d'Aspremont, Princeton University

Slides

Abstract: We use recent semidefinite relaxation results to produce tractable upper bounds on sparse eigenvalues. We then test the performance of these bounds on various random matrices from compressed sensing, for which asymptotic estimates are known.

4:30–4:55: Chasing $1,000,000: How We Won the Netflix Progress Prize

Yehuda Koren, AT&T Labs Research

AMS news

Abstract: The collaborative filtering approach to recommender systems predicts user preferences for products or services by learning past user-item relationships. Their significant economic implications made collaborative filtering techniques play an important role at known e-tailers such as Amazon and Netflix. This field enjoyed a surge of interest since October 2006, when the Netflix Prize competition was commenced. Netflix released a dataset containing 100 million anonymous movie ratings and challenged the research community to develop algorithms that could beat the accuracy of its recommendation system, Cinematch. In this talk I will survey some of the principles and algoeirhms, which has led us to winning the 2007 Progress Prize in the Netflix competition.

5:00–5:25: Global Positioning from Local Distances

Amit Singer, Yale University

Slides

Abstract: In many applications, the main goal is to obtain a global low dimensional representation of the data, given some local noisy geometric constraints. In this talk we will show that problems in Cryo-Electron Microscopy for protein structuring, NMR spectroscopy for protein structuring, sensor networks and surface reconstruction can all be solved by constructing suitable operators on their data followed by the computation of a few eigenvectors of sparse matrices corresponding to the data operators.

5:30–5:55: Combinatorial Hodge Theory and a Geometric Approach to Ranking

Yuan Yao, Stanford University

Slides

Abstract: Modern ranking data are often incomplete, unbalanced, and arise from a complex network. We will propose a novel approach to analyze such data from the perspective of combinatorial geometry and topology, using combinatorial Hodge theory as our scalpel. In this framework, ranking data is represented as flows on a graph and Hodge-decomposed into three orthogonal components that are globally consistent, locally consistent, and locally inconsistent respectively. Rank aggregation then naturally emerges as projections onto a suitable subspace and an inconsistency measure of the ranking data also arises as curl distributions.

Contact

For further information on this event, please email Inderjit Dhillon at inderjit(at)cs.utexas.edu or Lek-Heng Lim at lekheng(at)math.berkeley.edu.