Tips

1.  Input

    READ "MQLS_Input" DOCUMENT CAREFULLY.
 
    The program will stop if any errors are detected in the format of the
    marker data file or of the kinship coefficient file (e.g., missing
    kinship coefficient values). Please read the "MQLS_Input" document
    carefully and make sure that the input files are in the correct format
    and contain consistent information.


2. Significance of the tests

   Simulation studies have verified that using the chi-squared approximation to the null
   distributions of the MQLS, WQLS, and corrected chi-squared statistics gives the appropriate
   type I error. However, when the number of alleles of a given type is small in the case
   and/or control samples, the chi-squared approximation may no longer be valid, and the
   program will issue a warning. For low allele counts, empirical p-values obtained via
   simulation (e.g., a parametric bootstrap) could be used, but this is not currently
   implemented in the software.
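   A minimal Python sketch of the kind of check described above: count the copies of each
   allele among typed individuals in the case and control samples, and flag the marker when
   either count is small. The genotype coding (copies of allele 1, None for missing), the
   function names, and the cutoff MIN_ALLELE_COUNT = 10 are all hypothetical; the threshold
   MQLS actually uses for its warning is not stated in this document.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical cutoff chosen for illustration only; the threshold MQLS actually
# uses for its low-allele-count warning is not documented here.
MIN_ALLELE_COUNT = 10


def allele_counts(genotypes: Sequence[Optional[int]]) -> Tuple[int, int]:
    """Return (copies of allele 1, copies of allele 2) among typed individuals.

    Each genotype is coded as the number of copies of allele 1 (0, 1 or 2);
    None marks a missing genotype and is skipped.
    """
    typed = [g for g in genotypes if g is not None]
    n1 = sum(typed)
    return n1, 2 * len(typed) - n1


def low_allele_count(cases: Sequence[Optional[int]],
                     controls: Sequence[Optional[int]],
                     min_count: int = MIN_ALLELE_COUNT) -> bool:
    """True if either allele is carried fewer than min_count times
    in the case sample or in the control sample."""
    return any(min(allele_counts(sample)) < min_count
               for sample in (cases, controls))


cases = [0, 1, 0, 0, None, 0]              # allele 1 seen only once among cases
controls = [1, 2, 0, 1, 1, 2, 1, 0, 1, 1]
print(low_allele_count(cases, controls))   # True
```

   Checking the case and control samples separately mirrors the text: a small count in
   either sample alone is enough to make the chi-squared approximation questionable.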


3.  Allele frequency estimation

   This MQLS program also provides allele frequency estimates using both the best linear
   unbiased estimator (BLUE) of McPeek, Wu and Ober (2004) and the naive (allele-counting)
   estimator. Estimates are given for (1) the case sample only, (2) the control sample only,
   and (3) the entire sample. If the user chooses OPTION 1, individuals with unknown
   phenotypes are included in the allele frequency estimates for the controls and for the
   entire sample. If the user chooses OPTION 2, individuals with unknown phenotypes are not
   used in any of the estimates.

   Occasionally, with a very low allele count and certain patterns of missing genotype data,
   the BLUE can give a negative allele frequency estimate. If the BLUE for the entire sample
   is negative, the MQLS and WQLS computations are skipped, because the variance calculations
   for these statistics depend on the entire-sample BLUE. In this situation, one could use the
   naive counting estimates and the corrected chi-squared statistic instead. Keep in mind,
   though, that the chi-squared approximation may not be accurate for any of the statistics
   when the allele count is very low.
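   The two estimators and the skip rule above can be sketched in plain Python as follows.
   This assumes the BLUE takes the generalized-least-squares form
   p_hat = (1' Phi^-1 Y) / (1' Phi^-1 1), where Y_i is half the number of copies of the
   allele carried by typed individual i and Phi is twice the kinship matrix restricted to
   typed individuals. All function names and the genotype coding are hypothetical, this is
   not the MQLS source code, and checking both alleles for negativity is an interpretation
   of the rule for a two-allele marker.

```python
from typing import List, Optional, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def naive_estimate(genotypes: Sequence[Optional[int]]) -> float:
    """Allele-counting estimate: copies of allele 1 / (2 * typed individuals)."""
    typed = [g for g in genotypes if g is not None]
    return sum(typed) / (2 * len(typed))


def blue_estimate(genotypes: Sequence[Optional[int]],
                  kinship: Sequence[Sequence[float]]) -> float:
    """Assumed GLS form: p_hat = (1' Phi^-1 Y) / (1' Phi^-1 1).

    Phi = 2 * kinship restricted to typed individuals (diagonal 1 + inbreeding);
    Y_i = genotype_i / 2. Phi is symmetric, so 1' Phi^-1 Y = (Phi^-1 1)' Y.
    """
    idx = [i for i, g in enumerate(genotypes) if g is not None]
    phi = [[2.0 * kinship[i][j] for j in idx] for i in idx]
    y = [genotypes[i] / 2.0 for i in idx]
    w = solve(phi, [1.0] * len(idx))  # unnormalized weights Phi^-1 1
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def statistics_to_compute(blue_all: float) -> List[str]:
    """Skip rule from the text: MQLS and WQLS need a non-negative entire-sample BLUE.

    Assumption: for a two-allele marker the BLUE of the other allele is
    1 - blue_all (the weights sum to 1), so both alleles are checked.
    """
    if min(blue_all, 1.0 - blue_all) < 0:
        return ["corrected chi-squared"]
    return ["MQLS", "WQLS", "corrected chi-squared"]


# Trio: two unrelated, non-inbred founders and their child (kinship 0.25 to each parent).
kinship = [[0.5, 0.0, 0.25],
           [0.0, 0.5, 0.25],
           [0.25, 0.25, 0.5]]
genotypes = [1, 0, 1]
print(naive_estimate(genotypes))           # 0.3333333333333333
print(blue_estimate(genotypes, kinship))   # 0.25
```

   In the trio, the child receives weight 0 because both of its parents are typed, so the
   BLUE (0.25) is determined by the parents' genotypes alone, whereas the naive estimate
   (1/3) counts the child's alleles as if they were independent. The weights Phi^-1 1 can
   also be negative for some pedigree and missing-data patterns, which is how a negative
   BLUE can arise.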


4. MQLS, WQLS, or corrected Chi-squared?

  Thornton and McPeek (2007) show that the MQLS is generally more powerful than the WQLS and
  the corrected chi-squared test. However, there are situations in which the MQLS is not the
  most powerful; e.g., the WQLS is optimal under a fully penetrant dominant two-allele disease
  model. Thornton and McPeek (2007) also give a simple diagnostic for predicting when the test
  statistics are expected to give different results. We hope to implement this diagnostic in
  the next version of the software.


5. Small P-values

MQLS outputs the test statistic for each of the 3 tests, and a chi-squared routine then converts
each test statistic to a p-value, which is also output. While the test statistics themselves are
highly accurate, the reported p-values are accurate only when they are larger than 2.0e-09: for
any test statistic whose p-value is less than 2.0e-09, the program reports a p-value of 0. To
obtain a more accurate p-value, one can simply plug the test statistic into the pchisq() function
in R. For instance, if the test statistic were 44.56, one could use the following command in R:

1-pchisq(44.56,1)

which yields the p-value

[1] 2.466805e-11
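For readers working outside R, here is a minimal Python sketch of the same calculation for 1
degree of freedom, using only the standard library. A chi-squared variable with 1 df is the
square of a standard normal variable, so its upper tail is erfc(sqrt(x/2)). Computing the upper
tail directly (the R analogue is pchisq(44.56, 1, lower.tail = FALSE)) avoids the rounding error
of subtracting a CDF value very close to 1 from 1. The function and constant names are
hypothetical, and the formula covers only the 1-df case used in the example above.

```python
import math

# Threshold below which MQLS reports a p-value of 0 (value taken from the text above).
REPORTING_FLOOR = 2.0e-09


def chisq1_upper_tail(stat: float) -> float:
    """P(X > stat) for X ~ chi-squared with 1 degree of freedom.

    With 1 df, X = Z**2 for a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    Computing the tail directly avoids the cancellation in 1 - CDF.
    """
    if stat <= 0:
        return 1.0
    return math.erfc(math.sqrt(stat / 2.0))


p = chisq1_upper_tail(44.56)
print(f"{p:.6e}")                          # ≈ 2.4668e-11, consistent with the R result above
print("below reporting floor:", p < REPORTING_FLOOR)
```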


6. Calculating MQLS, WQLS, and corrected Chi-squared using a variance estimator that assumes HWE in the founders

The current release of the MQLS software, version 1.5, calculates all three association statistics using the 
variance estimator of Equation (3) of Thornton and McPeek (2010), which is a robust variance estimator that 
relaxes the Hardy-Weinberg Equilibrium (HWE) assumption under the null hypothesis.