Tips
1. Input
Read the "MQLS_Input" document carefully.
The program will stop if it detects any error in the format of the
marker data file or the kinship coefficient file
(e.g., missing kinship coefficient values).
Make sure the input files are in the correct format and contain
concordant information.
2. Significance of the tests
Simulation studies have verified that the chi-squared approximation to the null distributions
of the MQLS, WQLS, and corrected chi-squared statistics gives the appropriate type I error.
However, when the number of alleles of a given type is small in the case and/or control samples,
the chi-squared approximation may no longer be valid, and the program will issue a warning.
For low allele counts, empirical p-values obtained via simulation (e.g., parametric bootstrap)
could be used instead, but this is not currently implemented in the software.
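As a rough illustration of the empirical p-value idea, the R sketch below turns a set of null
replicates of a test statistic into a p-value. The function name empirical_p and the numbers in
t_null are hypothetical. In practice the replicates would come from recomputing the statistic on
genotypes simulated under the null hypothesis in the actual pedigrees, a step this sketch does not
attempt and MQLS does not perform.

```r
# Empirical p-value from null replicates of a test statistic.
# The "+ 1" terms count the observed statistic as one of the replicates,
# so the estimate can never be exactly zero.
empirical_p <- function(t_obs, t_null) {
  (sum(t_null >= t_obs) + 1) / (length(t_null) + 1)
}

# Hypothetical null replicates, for illustration only.
t_null <- c(0.1, 0.5, 1.2, 3.9, 7.4)
t_obs  <- 3.9

p_emp <- empirical_p(t_obs, t_null)
print(p_emp)
```

With B replicates the smallest p-value this can report is 1/(B + 1), so a very large number of
replicates would be needed to resolve small p-values.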
3. Allele frequency estimation
The MQLS program also provides allele frequency estimates from two estimators: the best linear
unbiased estimator (BLUE) of McPeek, Wu and Ober (2004) and the naive estimator.
Estimates are given for (1) the case sample only, (2) the control sample only, and
(3) the entire sample. If OPTION 1 is chosen by the user, then individuals with unknown
phenotypes are included in the allele frequency estimates for the controls and for the entire
sample. If OPTION 2 is chosen by the user, individuals with unknown phenotypes are not used in
any of the estimates.
It may occasionally happen, with a very low allele count and certain
patterns of missing genotype data, that the BLUE gives negative allele frequency estimates. If the
BLUE for the entire sample is negative, the MQLS and WQLS computations are skipped. This is because the
variance calculations for these statistics depend on the BLUE for the entire sample. For this situation,
one could use naive counting estimates and the corrected chi-squared statistic. Keep in mind, though, that the
chi-squared approximation may not be accurate for any of the statistics when there is a very low allele count.
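To make the difference between the two estimators concrete, here is a minimal R sketch on a toy
sample. It assumes the BLUE takes the usual generalized least squares form, with a covariance
matrix built from the kinship coefficients (1 plus the inbreeding coefficient on the diagonal,
twice the kinship coefficient off the diagonal). The data, the matrix, and all variable names are
hypothetical, and this is not the code used inside MQLS. See McPeek, Wu and Ober (2004) for the
exact estimator.

```r
# Toy sample: two full siblings plus one unrelated individual, none inbred.
# Y holds half the number of copies of the allele carried by each person.
Y <- c(0.5, 0.5, 0)

# Hypothetical kinship coefficients: 0.25 between the siblings, 0 otherwise.
phi <- matrix(c(0,    0.25, 0,
                0.25, 0,    0,
                0,    0,    0), nrow = 3, byrow = TRUE)

# Assumed covariance structure: 2 * kinship off the diagonal and
# 1 + inbreeding coefficient (here 1) on the diagonal.
K <- 2 * phi
diag(K) <- 1

# Naive estimator: simple allele counting, ignoring relatedness.
naive_estimate <- mean(Y)

# Generalized least squares mean: weights proportional to K^{-1} 1.
w <- solve(K, rep(1, length(Y)))
blue_estimate <- sum(w * Y) / sum(w)

print(c(naive = naive_estimate, blue = blue_estimate))
```

In this sketch the siblings are down-weighted relative to the unrelated individual because they
carry partly redundant information. Weights obtained from an inverse matrix are not guaranteed to
be positive, which would explain how an estimate of this form can fall below zero.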
4. MQLS, WQLS, or corrected Chi-squared?
Thornton and McPeek (2007) show that the MQLS is generally more powerful than the WQLS and the
corrected chi-squared tests. However, there are situations when the MQLS will not be more
powerful, e.g., the WQLS is optimal for a dominant fully penetrant 2-allele disease model.
A simple diagnostic for when the test statistics are expected to give different results can
be found in Thornton and McPeek (2007). We hope to implement this diagnostic in the next version
of the software.
5. Small P-values
MQLS outputs the test statistic for each of the 3 tests, then uses a chi-squared routine to convert each
statistic to a p-value, which is also output. The test statistics themselves are highly accurate, but the
p-value routine is accurate only down to 2.0e-09: for any test statistic whose p-value is less than 2.0e-09,
the program reports a p-value of 0. To get a more accurate p-value, one can plug the test statistic value
into the pchisq() function in the R software. For instance, if the test statistic value were 44.56, then
one could use the following command in R:
1-pchisq(44.56,1)
which yields the p-value
[1] 2.466805e-11
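For still larger test statistics, subtracting from 1 loses precision once the p-value approaches
machine epsilon (about 1e-16). The sketch below uses the lower.tail and log.p arguments of pchisq()
to compute the upper tail directly. The value 2000 is a hypothetical statistic chosen only to show
the log-scale calculation.

```r
# Upper-tail chi-squared p-value on 1 degree of freedom, computed directly
# instead of as 1 - pchisq(), which suffers from cancellation for tiny p-values.
stat <- 44.56                                   # example statistic from the text
p_direct <- pchisq(stat, df = 1, lower.tail = FALSE)

# For extremely large statistics the p-value underflows to 0 in double
# precision, so work with the natural log of the p-value and convert to log10.
big_stat <- 2000                                # hypothetical value, for illustration
log10_p <- pchisq(big_stat, df = 1, lower.tail = FALSE, log.p = TRUE) / log(10)

print(p_direct)
print(log10_p)
```

The first value should agree with the result of 1-pchisq(44.56,1) to the precision that form
allows. The second is the base-10 logarithm of the p-value, which stays finite even when the
p-value itself is too small to represent.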
6. Calculating MQLS, WQLS, and corrected Chi-squared using a variance estimator that assumes HWE in the founders
The current release of the MQLS software, version 1.5, calculates all three association statistics using the
variance estimator of Equation (3) of Thornton and McPeek (2010), which is a robust variance estimator that
relaxes the Hardy-Weinberg Equilibrium (HWE) assumption under the null hypothesis.