More of a discussion than a tutorial:
Feel free to ask questions at any time!
PLINK:
http://pngu.mgh.harvard.edu/~purcell/plink/
Paper series:
http://www.nature.com/nrg/series/gwas/index.html
\[ Y:= \left[ \begin{array}{c} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{array} \right] ~~~ X:= \left[ \begin{array}{ccccc} x_{11} & \ldots & x_{1j} & \ldots & x_{1p}\\ x_{21} & \ldots & x_{2j} & \ldots & x_{2p}\\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{n1} & \ldots & x_{nj} & \ldots & x_{np}\end{array} \right] \]
Single-SNP analysis:
correlate \(Y\) with each column of \(X\)
\[ Y:= \left[ \begin{array}{c} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{array} \right] \sim \left[ \begin{array}{c} x_{1j} \\ x_{2j} \\ \vdots \\ x_{nj} \end{array} \right] =: X_j \]
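A minimal sketch of the single-SNP scan above, assuming a quantitative phenotype, genotypes coded 0/1/2, and ordinary least squares with an intercept for each column of \(X\). The function name `single_snp_scan` and the simulated data are illustrative only; they are not part of PLINK or any tool cited here.

```python
import numpy as np

def single_snp_scan(X, y):
    """Regress y on each column of X separately (intercept included).

    Returns per-SNP slope estimates, standard errors, and z-scores.
    Assumes no column of X is constant (no monomorphic SNPs).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    Xc = X - X.mean(axis=0)        # centering absorbs the intercept
    yc = y - y.mean()
    sxx = (Xc ** 2).sum(axis=0)    # per-SNP sum of squares
    beta = Xc.T @ yc / sxx         # simple-regression slope for each SNP
    # Residual sum of squares of each single-SNP model.
    rss = (yc ** 2).sum() - beta ** 2 * sxx
    se = np.sqrt(rss / (n - 2) / sxx)
    return beta, se, beta / se

# Toy data (hypothetical): n = 500 individuals, p = 20 SNPs, SNP 3 is causal.
rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.binomial(2, 0.3, size=(n, p))
y = 0.5 * X[:, 3] + rng.normal(size=n)

beta, se, z = single_snp_scan(X, y)
print("SNP with the largest |z|:", int(np.argmax(np.abs(z))))
```

Each SNP is tested on its own here, which is exactly what the questions that follow (missing genotypes, hidden confounders, joint modelling) call into question.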
part of \(X\) missing?
\(X \leftarrow\) hidden \(Z \rightarrow Y\)?
look at \(X_1,\ldots,X_p\) jointly?
change definitions of \(X\) and \(Y\)?
\(X\) and \(Y\) not available?
extra info beyond \(X\) and \(Y\)?
Marchini and Howie (2010):
http://www.nature.com/nrg/journal/v11/n7/pdf/nrg2796.pdf
Guan and Stephens (2008):
http://dx.doi.org/10.1371/journal.pgen.1000279
General question: SNP quality control (QC)
QC protocol used by GIANT
http://www.genepi-regensburg.de/easyqc/
(Source: http://mga.bionet.nsc.ru/~yurii/courses/ge03-2012/confounding.pdf)
Recent review: Price et al (2010)
http://www.nature.com/nrg/journal/v11/n7/full/nrg2813.html
Genomic Control \(\lambda_{\sf GC}\): Devlin and Roeder (1999)
http://dx.doi.org/10.1111/j.0006-341X.1999.00997.x
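A minimal sketch of genomic control, assuming 1-df chi-square association statistics and the median-based estimate of \(\lambda_{\sf GC}\) (observed median divided by the null median). The helper name `genomic_control` and the simulated statistics are hypothetical.

```python
import numpy as np

# Median of a chi-square distribution with 1 degree of freedom (approximate).
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control(chi2_stats):
    """Return (lambda_GC, corrected statistics) for 1-df chi-square tests."""
    chi2_stats = np.asarray(chi2_stats, dtype=float)
    lam = np.median(chi2_stats) / CHI2_1DF_MEDIAN
    # Assumed convention: do not deflate the statistics when lambda < 1.
    return lam, chi2_stats / max(lam, 1.0)

# Simulated example: null statistics uniformly inflated by a factor of 1.2.
rng = np.random.default_rng(1)
null_stats = rng.chisquare(1, size=100_000)
inflated = 1.2 * null_stats

lam, corrected = genomic_control(inflated)
print("estimated lambda_GC:", round(lam, 3))
```

The correction rescales every statistic by the same factor, so it only fits settings where confounding inflates all SNPs roughly equally.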
STRuctured population Association Test: Pritchard et al (2000)
http://dx.doi.org/10.1086/302959
Linear Mixed Models: Yu et al (2006)
http://www.nature.com/ng/journal/v38/n2/full/ng1702.html
\[Y=XB + U + E,~~U\sim (0, \sigma_g^2 K)\]
For more information, see Zhou, Carbonetto and Stephens (2013)
http://dx.doi.org/10.1371/journal.pgen.1003264
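A minimal sketch of fixed-effect estimation under the linear mixed model \(Y=XB+U+E\), under several assumptions not stated on the slide: \(E\sim N(0,\sigma_e^2 I)\), the variance components \(\sigma_g^2\) and \(\sigma_e^2\) are known, and \(K\) is computed from standardized genotypes. In practice the variance components are estimated (for example by REML), which this sketch skips. The name `gls_fixed_effects` and all simulated quantities are hypothetical.

```python
import numpy as np

def gls_fixed_effects(X, y, K, sigma_g2, sigma_e2):
    """Generalized least squares estimate of B given Var(Y) = sg2*K + se2*I."""
    n = len(y)
    V = sigma_g2 * K + sigma_e2 * np.eye(n)
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y)
    XtVinvX = X.T @ Vinv_X
    B_hat = np.linalg.solve(XtVinvX, X.T @ Vinv_y)
    cov_B = np.linalg.inv(XtVinvX)   # covariance of B_hat under the model
    return B_hat, cov_B

rng = np.random.default_rng(2)
n, m = 200, 1000
G = rng.binomial(2, 0.3, size=(n, m)).astype(float)
G = (G - G.mean(axis=0)) / G.std(axis=0)   # standardize each SNP
K = G @ G.T / m                            # one common relatedness matrix

# Fixed effects: intercept plus one tested SNP (assumed design).
X = np.column_stack([np.ones(n), rng.binomial(2, 0.4, size=n)])
B_true = np.array([1.0, 0.5])
sigma_g2, sigma_e2 = 0.5, 0.5

# U = sqrt(sg2/m) * G w with w ~ N(0, I) has covariance sg2 * K.
U = np.sqrt(sigma_g2 / m) * (G @ rng.normal(size=m))
E = rng.normal(scale=np.sqrt(sigma_e2), size=n)
y = X @ B_true + U + E

B_hat, cov_B = gls_fixed_effects(X, y, K, sigma_g2, sigma_e2)
print("estimated B:", np.round(B_hat, 3))
```

With \(K=I\) and \(\sigma_g^2=0\) the estimate reduces to ordinary least squares, which is a convenient sanity check.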
(Source: http://www.mdpi.com/2073-4425/5/2/270)
PrediXcan (2015):
http://www.nature.com/ng/journal/v47/n9/full/ng.3367.html
TWAS (2016):
http://www.nature.com/ng/journal/v48/n3/full/ng.3506.html
Two types of GWAS data: individual-level data and summary-level data
Pasaniuc & Price (2016): http://www.nature.com/nrg/journal/vaop/ncurrent/full/nrg.2016.142.html
Pages 4-12 of Alkes Price’s Slides (ASHG, 2015)
Single-SNP test statistic can be “inflated” due to:
Expected \(\chi^2\) stat = Slope \(\cdot\) LD score + Confounding biases
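A minimal sketch of the relation above, assuming per-SNP \(\chi^2\) statistics and LD scores are already available. It fits an unweighted least-squares line, so the fitted intercept plays the role of the confounding term. Published LD score regression uses weighting and resampling-based standard errors, which are omitted here. The function name `ld_score_regression` and the simulated values are hypothetical.

```python
import numpy as np

def ld_score_regression(chi2_stats, ld_scores):
    """Unweighted fit of chi2 = slope * LD score + intercept."""
    slope, intercept = np.polyfit(ld_scores, chi2_stats, deg=1)
    return slope, intercept

# Simulated example: statistics whose mean is linear in the LD score.
rng = np.random.default_rng(3)
n_snps = 200_000
ld_scores = rng.gamma(shape=2.0, scale=50.0, size=n_snps)  # made-up LD scores
true_slope, true_intercept = 0.01, 1.1
expected = true_slope * ld_scores + true_intercept
# Scaling a 1-df chi-square draw (mean 1) gives the desired expectation.
chi2_stats = expected * rng.chisquare(1, size=n_snps)

slope, intercept = ld_score_regression(chi2_stats, ld_scores)
print("slope:", round(slope, 4), "intercept:", round(intercept, 3))
```

Only summary statistics and LD scores enter the fit; no individual-level genotypes are needed.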
We only need a likelihood based on summary data:
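One likelihood of this kind, sketched from the "regression with summary statistics" approach of the preprint cited just below, uses only single-SNP estimates and an LD matrix. The notation is an assumption rather than the slide's own: \(\widehat{\beta}\) is the vector of single-SNP effect estimates, \(S\) the diagonal matrix of their standard errors, \(R\) the LD (correlation) matrix estimated from a reference panel, and \(\beta\) the vector of multiple-regression effects.
\[ \widehat{\beta} \mid S, R, \beta \sim N\left(S R S^{-1} \beta,~ S R S\right) \]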
For more details, see Zhu and Stephens (2016+)
http://dx.doi.org/10.1101/042457
Or, ask me (CLSC, Room 412)
The same unit of observation \(\leadsto\) human genome
Examples based on individual-level data (skipped):
Veyrieras et al (2008)
http://dx.doi.org/10.1371/journal.pgen.1000214
Carbonetto and Stephens (2013)
http://dx.doi.org/10.1371/journal.pgen.1003770
He et al (2013): GWAS + eQTL
http://www.cell.com/ajhg/abstract/S0002-9297(13)00159-6
Pickrell (2014): GWAS + Functional annotations
http://www.cell.com/ajhg/abstract/S0002-9297(14)00106-2