University of Chicago

Fall 2016

This course is about using matrix computations to infer useful information from observed data. One may view it as an "applied" version of Stat 309; the only prerequisite for this course is basic linear algebra. The data analytic tools that we will study go beyond linear and multiple regression and often fall under the heading of "Multivariate Analysis" in Statistics or "Unsupervised Learning" in Machine Learning. These include factor analysis, correspondence analysis, principal components analysis, multidimensional scaling, canonical correlation analysis, Procrustes analysis, partial least squares, etc. We will also discuss a small number of supervised learning techniques, including discriminant analysis and support vector machines. Understanding these techniques requires some facility with matrices (primarily eigenvalue and singular value decompositions, as well as their generalizations) in addition to some basic statistics, both of which the student will acquire during the course.

- 11/28/16: Handout 9 and Slides 4 posted.

- 11/26/16: Final Exam will be held Mon, Dec 5, 6:30–9:30pm in Kent 120 (i.e., usual time and place).

- 11/22/16: TAs will hold office hours from 4:00–5:30pm, Wed, Nov 23, in Math-Stat Bldg, Room 302.

- 11/21/16: Handout 8 posted.

- 11/15/16: Homework 3 posted.

- 11/15/16: Handout 7 and Slides 3 posted.

- 11/14/16: TAs will hold extra office hours from 2:00–5:00pm today in Math-Stat Bldg.

- 11/10/16: No office hours on Mon, Nov 14; moved to Thu, Nov 17, 1:00–3:00pm.

- 11/07/16: Handout 6 posted.

- 11/01/16: Homework 2 posted.

- 10/31/16: Handout 5 and Slides 2 posted.

- 10/24/16: Handout 4 posted.

- 10/12/16: R tutorial in Kent 120 on Oct 17, 6:30–8:00pm.

- 10/10/16: Office hours on Thu, Oct 20, 1:00–3:00pm.

- 10/10/16: Handouts 2 and 3 posted.

- 10/10/16: Homework 1 posted.

- 10/07/16: No lecture or office hours on Mon, Oct 17.

- 10/01/16: Slides 1 and Handout 1 posted.

- 09/17/16: Check back regularly for announcements.

**Location:** Kent Chem Lab, Room 120

**Times:** Mon, 6:30–9:30pm

**Instructor:** Lek-Heng Lim

Office: Jones 122B

`lekheng(at)galton.uchicago.edu`

Tel: (773) 702-4263

Office hours: Mon, 2:00–4:00pm, Jones 122B

**Chicago Course Assistant I:** Klakow Akepanidtaworn

`klakowa(at)uchicago.edu`

**Chicago Course Assistant II:** Triwit Ariyathugun

`triwita1(at)uchicago.edu`

Office hours: Mon, 3:30–5:00pm in Math-Stat Library; Thu, 6:30–8:00pm in Room 302, Math-Stat Building (Stevanovich Center)

The last two applications listed below (linear discriminant analysis and support vector machines) fall under supervised learning; we will discuss them if time permits, if only to give an idea of how supervised learning differs from unsupervised learning.

**Tools:**

- EVD = Eigenvalue decomposition
- SVD = Singular value decomposition
- GEVD = Generalized eigenvalue decomposition
- GSVD = Generalized singular value decomposition

**Applications:**

- Principal component analysis (SVD)
- Factor analysis (EVD)
- Canonical correlation analysis (EVD)
- Correspondence analysis (GSVD)
- Hyperlink-induced topic search (SVD)
- Latent semantic indexing (SVD)
- Procrustes analysis (SVD)
- Multidimensional scaling (EVD)
- Partial least squares (SVD)
- Linear discriminant analysis (GEVD)
- Support vector machines
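
To give a flavor of how an application in the list above reduces to one of the matrix tools, here is a minimal sketch of principal component analysis computed via the SVD. It is only an illustration and not taken from the course handouts: the function name `pca_via_svd`, the synthetic data, and the choice of NumPy are all assumptions. As a sanity check, it compares the variances obtained from the SVD with the leading eigenvalues (EVD) of the sample covariance matrix.

```python
import numpy as np

def pca_via_svd(X, k):
    """Top-k principal directions, scores, and variances of X (n samples x p variables).

    Hypothetical helper for illustration only.
    """
    Xc = X - X.mean(axis=0)                        # center each variable (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                            # rows are principal directions
    scores = U[:, :k] * s[:k]                      # same as Xc @ components.T
    variances = s[:k] ** 2 / (X.shape[0] - 1)      # sample variance along each direction
    return components, scores, variances

# Synthetic data (an assumption): 200 samples of 5 correlated variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

components, scores, variances = pca_via_svd(X, k=2)

# Cross-check against the EVD of the sample covariance matrix.
cov = np.cov(X, rowvar=False)                      # p x p, normalized by n - 1
evals = np.linalg.eigvalsh(cov)[::-1]              # eigenvalues in descending order
print(np.allclose(variances, evals[:2]))
```

The squared singular values of the centered data matrix, divided by n - 1, coincide with the eigenvalues of the sample covariance matrix, so the SVD and EVD routes give the same principal components; the SVD route avoids forming the covariance matrix explicitly.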

Collaborations are permitted but you will need to write up your own solutions and declare your collaborators. The problem sets are designed to get progressively more difficult. You will get about 10 days for each problem set.

You are required to implement your own programs for problems that require some amount of simple coding (using Matlab, Mathematica, R, or SciPy).

- Problem Set 3 (posted: Nov 15, due: Nov 28)

- Problem Set 2 (posted: Nov 01, due: Nov 14)

- Problem Set 1 (posted: Oct 10, due: Oct 24)

**Bug report** on the problem sets:
`lekheng(at)galton.uchicago.edu`

**Grade composition:** 60% Problem Sets, 40% Final Exam (Mon, Dec 5, 6:30–9:30pm, Kent 120).

- Similar courses: Stat 32950. Multivariate Statistical Analysis, Busf 41912/Stat 32900. Applied Multivariate Analysis

You may download some of these books online from a UChicago IP address, or via ProxyIt! if you are off-campus.

- G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, Springer, 2013.

- R. Johnson, D. Wichern, Applied Multivariate Statistical Analysis, 6th Ed, Pearson, 2007.

- K. V. Mardia, J. T. Kent, J. M. Bibby, Multivariate Analysis, Academic Press, 1980.

- C. D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, 2001.