Overview of MQLS

MQLS is a program, written in C, for case-control association testing 
of a binary trait in samples that contain related individuals.  
The program allows for testing association of the trait with any number 
of binary or multiallelic markers (e.g. from a genomewide screen), where 
separate tests are performed at each marker. The program is applicable to 
association studies with completely general combinations of related and 
unrelated individuals, where the relationships among the sampled individuals
are assumed to be known. For instance, the program allows cases to be related 
to controls, and it is equally applicable to complex inbred pedigrees and to
simpler study designs consisting of unrelated individuals and small outbred 
families. 

The main reference for this program is Thornton T., McPeek M. S. 
"Case-Control Association Testing with Related Individuals: A More Powerful 
Quasi-Likelihood Score Test" (2007) American Journal of Human Genetics, 
vol 81, pp. 321-337.  

The MQLS program can be considered a significantly enhanced version
of the CC-QLS program of Bourgain C., Hoffjan S., Nicolae R., Newman
D., Steiner L., Walker K., Reynolds R., Ober C., McPeek M. S. 
"Novel case-control test in a founder population identifies P-selectin as 
an atopy susceptibility locus" (2003) American Journal of Human Genetics, 
vol 73, pp. 612-626.

For each marker, the MQLS program computes 3 different test 
statistics for association: the MQLS test statistic
of Thornton and McPeek (2007), the WQLS test statistic of Bourgain et
al. (2003), and the corrected chi-squared test statistic of Bourgain et
al. (2003).  As a default, we recommend using the MQLS test.
The MQLS is a quasi-likelihood score test that was developed to improve the 
power of the WQLS test.  (The "M" in MQLS stands for "more powerful" or 
"modified.")  The MQLS test improves power over the WQLS test
by taking advantage of the principle that there is enrichment for 
predisposing variants in affected individuals with affected relatives.  
For a more detailed comparison of the 3 statistics, see Thornton and 
McPeek (2007).  The current release of the MQLS software, version 1.5, 
calculates all three association statistics using the variance estimator
of Equation (3) of Thornton and McPeek (2010), which is a robust 
variance estimator that relaxes the Hardy-Weinberg Equilibrium (HWE) 
assumption under the null hypothesis. For each test, a p-value is 
calculated based on the chi-square asymptotic null distribution.


To calculate the MQLS statistic, an estimate of the population prevalence 
of the trait must be specified by the user.  We emphasize that the test
will be valid regardless of the input value. We recommend using an estimate
from previous studies or registry data from the population.   We have 
demonstrated, through simulation (see Thornton and McPeek (2007)), that 
power of the MQLS statistic is in fact quite robust to misspecification of 
the population prevalence.
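
To make the role of the prevalence concrete, here is a minimal sketch in C of
one way a prevalence value k could enter a per-individual phenotype score.
The coding used (1 for affected, -k/(1-k) for unaffected, 0 for unknown) is
our reading of Thornton and McPeek (2007) and should be treated as an
assumption; consult the paper for the exact definition.  The names
phenotype_score and the PHENO_ constants are hypothetical and are not part of
the MQLS program.

```c
#include <assert.h>

/* Hypothetical phenotype categories (not the MQLS input coding). */
enum phenotype { PHENO_UNKNOWN, PHENO_UNAFFECTED, PHENO_AFFECTED };

/* Phenotype score for one individual given prevalence k, 0 < k < 1.
 * Assumed coding: affected -> 1, unaffected -> -k/(1-k), unknown -> 0. */
double phenotype_score(enum phenotype p, double k)
{
    assert(k > 0.0 && k < 1.0);     /* prevalence must be a proper fraction */
    switch (p) {
    case PHENO_AFFECTED:
        return 1.0;
    case PHENO_UNAFFECTED:
        return -k / (1.0 - k);
    default:                        /* PHENO_UNKNOWN */
        return 0.0;
    }
}
```

Under this coding, a rarer trait (smaller k) gives unaffected individuals a
score closer to 0, so that an unaffected individual then carries nearly the 
same score as an individual of unknown phenotype.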

Additional features of the MQLS test include:

(1) The MQLS test for a given marker incorporates information on phenotyped
individuals who have missing genotype data at the given marker.  This
information is used to optimize the weights given to relatives with
non-missing genotype data at the marker being tested, following the
principle that there is enrichment for predisposing variants in
individuals with affected relatives.  This enrichment principle implies,
for example, that an affected individual with no phenotyped relatives should
be weighted differently from an affected individual with an affected sibling,
and that this should still hold true when the affected sibling happens to have
missing genotype data at the marker being tested.  At the same time, the
genotypes of the two sibs are dependent, so when both sibs are genotyped
each should be downweighted, which is not necessary when only one of them
is typed.  The MQLS test takes into account both the enrichment principle 
and the effects of dependence in setting the weights.  In contrast, the 
WQLS and corrected chi-squared test statistics will exclude individuals 
with missing genotype data at the given marker.

(2) Another useful feature of the MQLS test is that it allows individuals' 
phenotypes to be coded as "affected", "unaffected", or "unknown."  An
individual's phenotype is appropriately coded as "unknown" if no direct
phenotype information was measured on the individual.  One situation in
which the "unknown" phenotype designation is appropriate is for general 
population controls.  "General population controls" refers to a set of control
individuals, sampled from some population, who have not been screened for the 
phenotype.  Another situation in which the "unknown" designation is 
appropriate is when the trait of interest is a late onset disease 
(e.g., Alzheimer's).  Some individuals may be unaffected only because they
are too young to show the trait at the time of screening, and they may
develop it later in life.  These individuals can
appropriately be considered to have unknown phenotype.  Individuals with unknown
phenotype and unknown genotype for a given marker are not included in
any test for that marker.  Genotyped individuals with unknown phenotype
are included in the MQLS test (using Option 1 of the software), and 
their weight in the analysis is determined by a combination of the population
prevalence of the trait (as input by the user) and by the phenotypes of any
relatives they have in the study.  In contrast, the WQLS and corrected 
chi-squared statistics do not make a distinction between unaffected 
individuals and individuals of unknown phenotype.  

The MQLS software gives the user TWO OPTIONS for how to handle the individuals
of unknown phenotype.

	OPTION 1:  This should be considered the default for the MQLS
test.  Under this option, the MQLS test is performed with 3 
different phenotype categories allowed: affected, unaffected, and unknown.  
Furthermore, phenotyped individuals with missing genotype data are allowed to 
contribute to the MQLS test (if they have genotyped relatives in the sample). 
The WQLS and corrected chi-squared statistics are computed with the cases 
taken to be the affecteds and the controls taken to be the unknown and 
unaffected individuals combined.  They do not make use of individuals 
with missing genotype data at the tested marker.

	OPTION 2:  This option is provided for backward compatibility 
with the CC-QLS software for calculating WQLS and corrected chi-squared.
In this option, individuals with unknown phenotype are excluded from 
all tests, and individuals with missing genotype data at a given marker
are excluded from the test at that marker.  If this option is used, 
results for WQLS and corrected chi-squared will be consistent with the 
output of the CC-QLS software (provided that there are no MZ twin pairs 
in the sample; see below).  Under Option 2, the MQLS test will also 
be performed with these individuals removed from the analysis, which 
could reduce its power.

(3) The original versions of the WQLS and corrected chi-squared tests and
their implementations in the CC-QLS software (Bourgain et al. 2003)
did not allow both members of an MZ twin pair to be included in the 
analysis.  We have made changes that allow both members of one or more
MZ twin pairs to be included in all 3 tests: MQLS, WQLS, and
corrected chi-squared.  These changes are described in Thornton and
McPeek (2007) and are implemented in the MQLS program.
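
The inclusion rules described under Options 1 and 2 of feature (2) can be
summarized in a short C sketch.  This is an illustration of the rules as
stated in this overview, not code from the MQLS program; the function
contributes_to_test, its parameters, and the enum names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical categories (not the MQLS input coding). */
enum phenotype { PHENO_UNKNOWN, PHENO_UNAFFECTED, PHENO_AFFECTED };
enum test_kind { TEST_MQLS, TEST_WQLS, TEST_CORRECTED_CHISQ };

/* Does an individual contribute to the given test at one marker?
 *   option                 : 1 or 2, as described in this overview
 *   genotyped              : genotype is non-missing at the tested marker
 *   has_genotyped_relative : some relative in the sample is genotyped there */
bool contributes_to_test(int option, enum test_kind test, enum phenotype pheno,
                         bool genotyped, bool has_genotyped_relative)
{
    assert(option == 1 || option == 2);

    if (option == 2) {
        /* Option 2: drop unknown phenotypes and missing genotypes everywhere. */
        return genotyped && pheno != PHENO_UNKNOWN;
    }

    /* Option 1, genotyped: used by all 3 tests.  For WQLS and corrected
     * chi-squared, unknown phenotypes are pooled with the controls. */
    if (genotyped)
        return true;

    /* Option 1, missing genotype: only MQLS can use the individual, and only
     * if the phenotype is known and a genotyped relative is in the sample. */
    if (test == TEST_MQLS)
        return pheno != PHENO_UNKNOWN && has_genotyped_relative;

    return false;
}
```

For example, an affected individual with a missing genotype and a genotyped
sibling contributes to the MQLS test under Option 1, but not to the WQLS or
corrected chi-squared tests, and not to any test under Option 2.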