Tips

1.  READ "Input" DOCUMENT CAREFULLY.

    The program will stop if any errors are detected in the format
    of the datafile or the pedfile, so read the "Input" document
    closely and make sure both input files are formatted correctly.

2.  The software allows the user to input allele frequencies
    rather than a set of control haplotypes or genotypes. However,
    this is almost never recommended for analysis of real data. The
    reason is that with only allele frequencies available, the
    software is forced to assume no background linkage
    disequilibrium (LD). In our experience, this assumption rarely
    holds in practice and can lead to misleading results when
    background LD is present. This option is retained for analyzing
    simulated or real data in which there is believed to be no
    background LD.
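    To make the no-LD assumption concrete, the Python sketch below (our
    own illustration, not DHSMAP code) compares the haplotype frequency
    implied by allele frequencies alone, i.e., the product of per-marker
    allele frequencies, with the frequency actually observed among
    control haplotypes. The function names and the toy data (two markers
    in complete LD) are hypothetical.

```python
from collections import Counter
from math import prod

def allele_freqs(haplotypes):
    """Per-marker allele frequencies from equal-length control haplotypes."""
    n = len(haplotypes)
    return [
        {a: c / n for a, c in Counter(h[j] for h in haplotypes).items()}
        for j in range(len(haplotypes[0]))
    ]

def hap_prob_no_ld(hap, freqs):
    """No-LD (linkage equilibrium) model: product of per-marker allele frequencies."""
    return prod(freqs[j].get(a, 0.0) for j, a in enumerate(hap))

def hap_prob_observed(hap, haplotypes):
    """Observed relative frequency of `hap` among the control haplotypes."""
    return sum(h == hap for h in haplotypes) / len(haplotypes)

# Two markers in complete LD: only "11" and "22" ever occur.
controls = ["11"] * 50 + ["22"] * 50
freqs = allele_freqs(controls)
print(hap_prob_no_ld("12", freqs), hap_prob_observed("12", controls))  # 0.25 0.0
```

    Haplotype "12" never occurs among these controls, yet the
    allele-frequency model assigns it probability 0.25.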

3.  We now allow the user to specify the order of the Markov chain
    used to model background LD. For microsatellite data, a
    first-order chain is often satisfactory. For biallelic markers,
    provided that genotypes are available for a sufficient number of
    control individuals (>100), a second-order chain may be more
    appropriate.
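    As a rough illustration of what the chain order means, the sketch
    below scores a haplotype under a marker-by-marker Markov chain whose
    probabilities are estimated by simple counting in the control
    haplotypes. It is a generic, hypothetical implementation (no
    smoothing, no Bayesian adjustment), not DHSMAP's estimator. With
    order k, each allele is conditioned on the k preceding alleles, so
    the number of contexts grows as (number of alleles)^k; that sparsity
    is one plausible reason a higher order suits biallelic markers with
    many controls better than highly polymorphic microsatellites. In this
    sketch, order 0 reduces to the allele-frequency product model.

```python
def markov_hap_prob(hap, controls, order=1):
    """Probability of `hap` under a position-specific Markov chain of the given
    order, with all probabilities estimated by counting in `controls`.
    Haplotypes are equal-length strings (one character per allele) or tuples."""
    n = len(controls)
    # Joint frequency of the first `order` alleles (order 0 gives 1.0).
    p = sum(h[:order] == hap[:order] for h in controls) / n
    for j in range(order, len(hap)):
        context = hap[j - order:j]
        # Controls sharing the `order` alleles that precede marker j.
        matching = [h for h in controls if h[j - order:j] == context]
        if not matching:
            return 0.0  # unseen context; a real estimator would smooth this
        p *= sum(h[j] == hap[j] for h in matching) / len(matching)
    return p

# Toy controls: the third allele is determined by the first two.
controls = ["111", "122", "212", "221"] * 25
print(markov_hap_prob("111", controls, order=1))  # 0.125
print(markov_hap_prob("111", controls, order=2))  # 0.25
```

    Here only the second-order chain recovers the observed frequency of
    "111" (0.25), because the dependence spans two markers.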

4.  Set the "Bayesian adjustment" parameter to 1. This ensures that
    all estimated control haplotype frequencies are positive, a
    condition necessary for the software to run properly. (See 
    Strahs 2001 for details.)
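    One likely reason positivity matters is that a haplotype with
    estimated frequency 0 makes a log-likelihood term undefined (log 0).
    The sketch below assumes the adjustment behaves like an additive
    pseudocount added to every possible allele before normalizing. That
    form, the function name, and the pseudocount value are assumptions
    for illustration only; the actual adjustment is the one described in
    Strahs 2001.

```python
from collections import Counter

def smoothed_freqs(observed, alleles, pseudocount=1.0):
    """Allele frequencies with `pseudocount` added to the count of every allele
    in `alleles`, so alleles absent from `observed` still get a positive value.
    (Assumed additive-pseudocount form, for illustration only.)"""
    counts = Counter(observed)
    total = len(observed) + pseudocount * len(alleles)
    return {a: (counts[a] + pseudocount) / total for a in alleles}

print(smoothed_freqs(["1", "1", "1"], alleles=["1", "2"]))  # {'1': 0.8, '2': 0.2}
```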

5.  Set the parameter called "max_cand" (see "Search Procedures") to
    be at least 20. This integer is the number of candidate ancestral
    haplotypes allowed to continue to grow as each additional marker
    is added in the search procedure described in "Search
    Procedures". The larger this number, the more confident you can
    be that the maximum likelihood ancestral haplotype has been
    found. Be advised that running time is linear in this parameter;
    e.g., using max_cand=40 takes about twice as long as max_cand=20.
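    Keeping a fixed number of candidates while extending them one marker
    at a time is a form of beam search. The sketch below is a generic
    beam search, not DHSMAP's exact procedure; `score` is a hypothetical
    stand-in for whatever criterion DHSMAP uses to rank partial ancestral
    haplotypes. It also shows where the linear cost comes from: each
    marker requires roughly max_cand times the number of alleles score
    evaluations.

```python
import heapq
from typing import Callable, Sequence, Tuple

Hap = Tuple[str, ...]

def beam_search(alleles_per_marker: Sequence[Sequence[str]],
                score: Callable[[Hap], float],
                max_cand: int) -> Hap:
    """Grow partial haplotypes one marker at a time, keeping only the
    `max_cand` highest-scoring candidates after each extension."""
    beam = [()]
    for alleles in alleles_per_marker:
        # Extend every surviving candidate by every allele at the next marker.
        extended = [h + (a,) for h in beam for a in alleles]
        beam = heapq.nlargest(max_cand, extended, key=score)
    return max(beam, key=score)

# Toy score (hypothetical): number of alleles matching a target haplotype.
target = ("1", "2", "1", "2")

def toy_score(h: Hap) -> float:
    return sum(a == b for a, b in zip(h, target))

print(beam_search([["1", "2"]] * 4, toy_score, max_cand=20))  # ('1', '2', '1', '2')
```

    With a real likelihood the best partial haplotype at one marker need
    not lead to the best full haplotype, which is why a larger max_cand
    gives more confidence that the maximum has been found.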

6.  Set "max_res" such that the ancestral haplotype is estimated 2-3
    times in the largest interval between adjacent markers. It is
    generally not necessary to estimate the ancestral haplotype more
    often, because the maximum likelihood ancestral haplotype is
    unlikely to change more than once between any pair of adjacent
    markers. Note that computation time is linear in the number of
    times the ancestral haplotype is estimated.
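    If max_res is a spacing expressed in the same units as the marker
    map (an assumption here; check its definition in "Search Procedures"
    before relying on this), the advice above amounts to dividing the
    largest inter-marker gap by 2 or 3. The helper name and the marker
    positions below are hypothetical.

```python
def max_res_range(positions, per_interval=(2, 3)):
    """Return (low, high) spacings giving between per_interval[0] and
    per_interval[1] estimates in the largest gap between adjacent markers.
    ASSUMES max_res is a spacing in the same units as `positions`."""
    largest = max(b - a for a, b in zip(positions, positions[1:]))
    fewest, most = per_interval
    return largest / most, largest / fewest

# Hypothetical marker map in cM; the largest gap is 2.0 cM.
low, high = max_res_range([0.0, 1.5, 2.0, 4.0])
print(low, high)
```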

7.  Set "map_res" to at least 10. Higher values of map_res give a
    more accurate point estimate and confidence intervals and a
    smoother plot of log-likelihood vs. location.
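    The effect of a finer grid can be seen in a generic sketch that
    maximizes a log-likelihood over equally spaced evaluation points. It
    assumes map_res is the number of steps per interval between adjacent
    markers; that interpretation, the helper names, and the toy
    log-likelihood (peaked at 1.234 cM) are all hypothetical. On such a
    grid the point estimate can be off by up to half a step, which
    shrinks as map_res increases.

```python
def grid_points(positions, map_res):
    """Evaluation locations: `map_res` equal steps within each interval
    between adjacent markers, plus the last marker itself."""
    pts = []
    for a, b in zip(positions, positions[1:]):
        step = (b - a) / map_res
        pts.extend(a + i * step for i in range(map_res))
    pts.append(positions[-1])
    return pts

def grid_mle(positions, map_res, loglik):
    """Grid point with the highest log-likelihood."""
    return max(grid_points(positions, map_res), key=loglik)

def toy_loglik(x):
    # Hypothetical smooth log-likelihood with its true maximum at 1.234.
    return -(x - 1.234) ** 2

pos = [0.0, 1.0, 2.0]
print(grid_mle(pos, 2, toy_loglik), grid_mle(pos, 10, toy_loglik))
```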

Also Note

8.  Use of Physical Maps

    DHSMAP assumes a genetic marker map. However, marker distances
    are often available in the form of a physical map. This suggests
    two possible approaches: (1) input physical distances instead of
    genetic distances, or (2) first convert physical distances to
    genetic distances, input the genetic distances, then convert the
    results back to physical distances. As long as there is a
    constant conversion between cM and Mb in the region and mutation
    rates are set to 0, the two approaches will yield identical
    results, i.e., it is not necessary to know the conversion factor
    between genetic and physical distance in that case. However,
    when the model includes mutation (or if genetic distance is not
    assumed to be a fixed multiple of physical distance), approach (1)
    is incorrect and only approach (2) should be used. In that case
    an appropriate conversion between genetic and physical distances
    is needed.
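    Approach (2) needs a mapping between Mb and cM that may vary along
    the region. A minimal sketch, assuming you have a reference set of
    loci with known positions on both maps (the numbers below are made
    up) and that linear interpolation between them is adequate: convert
    marker positions to cM, run DHSMAP on the cM map, then convert the
    estimated location (and CI endpoints) back to Mb. With a single
    constant rate this reduces to multiplying or dividing by that rate.

```python
from bisect import bisect_right

def interpolate(x, xs, ys):
    """Piecewise-linear map from xs to ys (both strictly increasing);
    points outside the reference range are extrapolated from the end segments."""
    i = min(max(bisect_right(xs, x), 1), len(xs) - 1)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical reference loci with positions on both maps.
ref_mb = [0.0, 1.0, 3.0]
ref_cm = [0.0, 2.0, 3.0]

marker_mb = [0.5, 2.0, 2.8]
marker_cm = [interpolate(p, ref_mb, ref_cm) for p in marker_mb]  # input these to DHSMAP
estimate_cm = 2.5                                                # e.g., an estimated location
estimate_mb = interpolate(estimate_cm, ref_cm, ref_mb)           # converted back to Mb
print(marker_cm, estimate_mb)
```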

9.  Specification of E_int

    If you know the approximate location of the trait-associated
    variant, you may specify E_int, skipping the first stage of the
    search procedure. If DHSMAP estimates the location outside this
    interval, we recommend rerunning the program without specifying
    E_int, i.e., with E_int=0. (See "Search Procedures" for more on
    E_int.)

10. Ancestral Haplotype Known

    DHSMAP generally spends a large percentage of its running time
    estimating the ancestral haplotypes. If you know the ancestral
    haplotype, if there are multiple ancestral haplotypes and you know
    all of them, or if you have a set of candidate ancestral haplotypes
    over which you wish to maximize (e.g., from a previous, similar
    analysis), you can save time by setting "anc_hap_known"=1 and
    specifying the set S of ancestral haplotypes over which DHSMAP
    will maximize the likelihood in the third stage of the search
    procedure (see "Search Procedures" for details). This option
    should be used with caution; specifying the wrong ancestral
    haplotypes can bias the other parameter estimates and CIs.