This course examines multiple testing and statistical inference from a modern point of view. High-dimensional data are now common in applications across the biological, physical, and social sciences. With this increased capacity to generate and analyze data, classical statistical methods may no longer ensure the reliability or replicability of scientific discoveries. We will examine a range of modern methods that provide statistical inference tools for large-scale data analysis. The course will have weekly assignments as well as a final project, each with theoretical and computational components.

#### Prerequisites

Stat 24400 or equivalent. Undergraduates may enroll with permission of the instructor.

#### Course materials (2016)

Links to resources & references (updated periodically)

R code for gene expression data (day 1): COPD_statin_gene_expr.R

R code for Benjamini-Hochberg simulations with dependent p-values (week 2): BH_simulations.R

Matlab code for visualizing the Benjamini-Hochberg procedure with n=3 (week 2): BH_worst_case.m

Gene expression data / z-scores / two groups model (from week 3/4): R code COPD_statin_gene_expr_mixture_model.R

Online testing demo comparing various methods: R code online_testing_methods.R & online_testing_demo.R.

Regression tutorial - code: regression_tutorial.R

Debiasing for the lasso - simulation: debiasing.R

#### Assignments

Problem set 1: assignment ProbSet1.pdf ; code COPD_statin_gene_expr_for_HW.R

P-hacking challenge: assignment p-hacking_challenge.pdf ; data set p-hacking_data_set.txt

Results: p-hacking_pvals.txt, p-hacking_responses.txt, p-hacking_plot_results.R

Problem set 2: assignment ProbSet2.pdf

Real data critique: assignment real_data_critique.pdf

Problem set 3: assignment ProbSet3.pdf

Problem set 4: assignment ProbSet4.pdf. Code: conditional_affine.R

#### Final Project

Final project topic suggestions: final_project_ideas.pdf