Output

DHSMAP output consists of five files: "dhsmap_errors" and four files
named by the user in the datafile. Each is introduced below using the
output from the sample input files given in "input.txt":

1.  "dhsmap_errors" file

    This file contains all error and warning messages, if any. It
    reports errors detected in the formats of the input files as well as
    errors and warnings triggered while the software runs. The program
    stops immediately after an error is detected but continues after a
    warning. In the given example there are no errors or warnings, so
    dhsmap_errors is empty.
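    Because the program halts on errors but continues after warnings,
    it can be worth checking this file before reading the other outputs.
    The Python sketch below assumes only what is stated here: messages
    are plain text and an empty file means a clean run. The function
    names are hypothetical, and the sketch does not try to tell errors
    from warnings, since their exact wording is not documented in this
    section.

```python
from pathlib import Path


def dhsmap_messages(text):
    """Return the non-blank lines of dhsmap_errors content.

    An empty list means DHSMAP reported no errors or warnings.
    """
    return [line.rstrip() for line in text.splitlines() if line.strip()]


def read_dhsmap_messages(path="dhsmap_errors"):
    """Read the error/warning file from disk (raises if it is missing)."""
    return dhsmap_messages(Path(path).read_text())


# The sample run leaves dhsmap_errors empty.
print(dhsmap_messages(""))  # prints []
```

    A non-empty result does not by itself say whether the run stopped;
    the messages have to be read to see whether any of them is an error.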

2.  "resout_ex" file

    This file contains the point estimate of the location of the
    trait-associated variant and two 95% confidence intervals for that
    location: one assuming (1) a star-shaped genealogy and one assuming
    (2) a conditional-coalescent genealogy. All values are positions on
    a genetic map (cM) in which the first marker (as listed in the
    datafile) is assigned location 0. The following is resout_ex:
    
        Point Estimate for location of trait-associated variant:  0.49524
        95% CI [Star-shaped genealogy]: (0.35238, 0.58571)
        95% CI [Cond. Coalescent]: (0.02381, 0.73810)
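    To use these results in further analysis, the three lines can be
    read back into numbers. The Python sketch below assumes the line
    labels appear exactly as in the resout_ex excerpt above; the
    function and key names are hypothetical.

```python
import re

NUM = r"(-?\d+(?:\.\d+)?)"
INTERVAL = r"\(\s*" + NUM + r"\s*,\s*" + NUM + r"\s*\)"


def parse_resout(text):
    """Parse the point estimate and the two 95% CIs from resout text.

    All values are positions in cM, with the first marker at 0.
    """
    point = re.search(r"Point Estimate[^:]*:\s*" + NUM, text)
    star = re.search(r"\[Star-shaped genealogy\]:\s*" + INTERVAL, text)
    coal = re.search(r"\[Cond\. Coalescent\]:\s*" + INTERVAL, text)
    if not (point and star and coal):
        raise ValueError("unrecognized resout format")
    return {
        "estimate": float(point.group(1)),
        "ci_star": (float(star.group(1)), float(star.group(2))),
        "ci_coalescent": (float(coal.group(1)), float(coal.group(2))),
    }


sample = """\
Point Estimate for location of trait-associated variant:  0.49524
95% CI [Star-shaped genealogy]: (0.35238, 0.58571)
95% CI [Cond. Coalescent]: (0.02381, 0.73810)
"""
result = parse_resout(sample)
print(result["estimate"], result["ci_coalescent"])  # prints 0.49524 (0.02381, 0.7381)
```

    In this sample the conditional-coalescent interval is the wider of
    the two, in line with its description below as the more
    conservative choice.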

3.  "ancout_ex" file

    As described in "Search Procedures", DHSMAP uses a three-stage
    method to search over ancestral haplotypes and variant locations.
    Each stage produces or uses a list of ancestral haplotypes, and this
    file contains all three lists.

    The first and second lists give the ancestral haplotypes estimated
    in the first two stages of the procedure. Each row corresponds to
    an estimate of the ancestral haplotype for a putative location of
    the trait-associated variant; the ancestral haplotype may be
    estimated for several variant locations between each adjacent pair
    of markers. The first entry x of each row, enclosed by parentheses
    to distinguish it from the haplotype that follows, reports the
    interval in which the trait-associated variant is assumed to reside;
    the variant lies between markers x and x+1, where the markers are
    in map order and the marker at map position 0 is labeled 1.
    Consecutive rows listing identical intervals correspond to
    estimates assuming different positions of the variant within the
    interval.  The label 9 is assigned to the variant and is inserted
    in its assumed position.  Each list is preceded by the value of
    the parameter E_int, the interval around which the ancestral
    haplotypes are grown.
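    The row layout can be decoded mechanically. The Python sketch below
    (function name hypothetical) reads the interval label x and expects
    the variant label 9 right after the first x marker alleles, i.e. at
    0-based index x of the remaining entries. It locates the 9 by
    position rather than by searching for the first 9, on the assumption
    that a marker allele could also be coded 9; this section does not
    say whether that can happen.

```python
VARIANT_LABEL = 9  # label DHSMAP inserts at the variant's assumed position


def parse_stage_row(line):
    """Split a Stage 1 or Stage 2 ancout row into (interval, marker alleles).

    "(x)" means the variant lies between markers x and x+1 (map order,
    first marker labeled 1), so the variant label should sit at 0-based
    index x among the entries that follow.
    """
    label, rest = line.split(")", 1)
    interval = int(label.strip().lstrip("("))
    entries = [int(tok) for tok in rest.split()]
    if entries[interval] != VARIANT_LABEL:
        raise ValueError(f"no variant label at position {interval}: {line!r}")
    markers = entries[:interval] + entries[interval + 1:]
    return interval, markers


print(parse_stage_row("(   2)    1    2    9    1    1    2    1"))
# prints (2, [1, 2, 1, 1, 2, 1])
```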

    The third list gives the haplotypes in the set S. The first entry
    in each row is an integer label for this haplotype. This entry is
    enclosed in parentheses to set it apart from the haplotype it
    denotes.  (Note that these haplotypes do not list the variant.)
    
    The first, or first and second, stages of the search procedure may
    be bypassed. In that event, the corresponding lists do not appear
    in this file.
    
    The following are several lines from ancout_ex (the full file may
    be downloaded):

        [Stage 1] E_int=  0
        (   1)    1    9    2    1    1    2    1 
        (   1)    1    9    2    1    1    2    1 
        (   2)    1    2    9    1    1    2    1 
        (   2)    1    2    9    1    1    2    1 
        (   3)    1    2    1    9    1    2    1 
        ...
        [Stage 2] E_int=  3
        (   1)    1    9    2    1    1    2    1 
        (   1)    1    9    2    1    1    2    1 
        (   2)    1    2    9    1    1    2    1 
        (   2)    1    2    9    1    1    2    1 
        (   3)    1    2    1    9    1    2    1 
        ...
        [Stage 3] Set S of ancestral haplotypes
        ( 0)    1    2    1    1    2    1 
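    Since a stage that is bypassed simply leaves its list out, a reader
    of this file should group rows by their "[Stage n]" heading rather
    than assume all three lists are present. The Python sketch below
    does that; the function names are hypothetical and the input is
    trimmed from the excerpt above (Stage 1 omitted, as if bypassed).

```python
import re

HEADING = re.compile(r"\[Stage (\d+)\]\s*(.*)")
ROW = re.compile(r"\(\s*(\d+)\)\s*(.*)")


def split_ancout(text):
    """Group ancout rows under their "[Stage n]" headings.

    Returns {stage: {"heading": str, "rows": [(label, [values])]}}.
    Lines that are neither headings nor rows (such as "...") are skipped.
    """
    stages = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        head = HEADING.match(line)
        if head:
            current = int(head.group(1))
            stages[current] = {"heading": head.group(2), "rows": []}
            continue
        row = ROW.match(line)
        if row and current is not None:
            values = [int(tok) for tok in row.group(2).split()]
            stages[current]["rows"].append((int(row.group(1)), values))
    return stages


def e_int(heading):
    """Extract E_int from a Stage 1/2 heading; None for the Stage 3 heading."""
    match = re.search(r"E_int=\s*(\d+)", heading)
    return int(match.group(1)) if match else None


excerpt = """\
[Stage 2] E_int=  3
(   1)    1    9    2    1    1    2    1
...
[Stage 3] Set S of ancestral haplotypes
( 0)    1    2    1    1    2    1
"""
stages = split_ancout(excerpt)
print(sorted(stages))                # prints [2, 3]
print(e_int(stages[2]["heading"]))   # prints 3
print(stages[3]["rows"])             # prints [(0, [1, 2, 1, 1, 2, 1])]
```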

4.  "maxout_ex" file

    This file contains the parameter estimates and diagnostic statistics
    corresponding to each estimate of the ancestral haplotype reported
    in ancout_ex (from the first and second stages, as described in
    "Search Procedures"). The following are the header and a line from
    maxout_ex (the full file may be downloaded):

        ind C         1/Tau   p       s-m.lik    s-n.lik     lloc     rloc   \
        cloc    iter
        ...
        1   0.0666667 1.29365 0.37254 -122.84735 -165.67497  23.307  12.152  \
        25.099  17
        ...
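    The table can be loaded by name rather than by column position. The
    Python sketch below assumes the trailing backslashes in the excerpt
    mark lines wrapped for display; it joins them, so it works whether
    or not the real file contains them. The set of integer-valued
    columns is an assumption based on the sample values, and the
    function names are hypothetical. Since "anchap" is listed as an
    integer column, the same sketch should also read the oneout_ex table.

```python
INT_COLUMNS = {"ind", "iter", "anchap"}  # assumed integer-valued columns


def logical_lines(text):
    """Join lines ending in a backslash; drop blank lines and "..." elisions."""
    lines, pending = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue
        if line.endswith("\\"):
            pending += line[:-1] + " "
        else:
            lines.append(pending + line)
            pending = ""
    return lines


def read_table(text):
    """Parse a header row plus data rows into a list of dicts keyed by column."""
    header, *rows = logical_lines(text)
    names = header.split()
    records = []
    for row in rows:
        fields = row.split()
        if len(fields) != len(names):
            raise ValueError(f"expected {len(names)} fields, got {len(fields)}")
        records.append({name: int(v) if name in INT_COLUMNS else float(v)
                        for name, v in zip(names, fields)})
    return records


excerpt = r"""
ind C         1/Tau   p       s-m.lik    s-n.lik     lloc     rloc   \
cloc    iter
...
1   0.0666667 1.29365 0.37254 -122.84735 -165.67497  23.307  12.152  \
25.099  17
...
"""
rec = read_table(excerpt)[0]
print(rec["C"], rec["p"], rec["iter"])  # prints 0.0666667 0.37254 17
```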

        "ind"     identifies the interval in which the
                  trait-associated variant is assumed to lie; the
                  variant lies between markers ind and ind+1, where
                  the markers are in map order and the marker at map
                  position 0 is labeled 1 (in this case, the variant
                  is assumed to lie between the first two markers).
        "C"       gives the assumed location of the trait-associated
                  variant on a genetic map (cM) in which the first
                  marker is assigned location 0.
        "1/Tau"   gives the estimate of 1/tau given the estimated
                  ancestral haplotype and the assumed location of the
                  trait-associated variant.
        "p"       gives the estimate of the heterogeneity parameter p
                  given the estimated ancestral haplotype and the
                  assumed location of the trait-associated variant.
        "s-m.lik" is the log-likelihood evaluated at the given
                  parameter values, assuming independence of the
                  recombinational histories, i.e., a star-shaped
                  genealogy.  Note that we recommend using the more
                  conservative quasi-likelihood assuming a conditional
                  coalescent model, as given in "oneout_ex".
        "s-n.lik" is the log-likelihood evaluated under the null
                  model, i.e., the model with p=1.
        "lloc"    is the expected number of affected haplotypes still
                  sharing the ancestral haplotype by descent at
                  location 0, conditional on the model and the data.
        "rloc"    is the expected number of affected haplotypes still
                  sharing the ancestral haplotype by descent at the
                  marker farthest from location 0, conditional on the
                  model and the data.
        "cloc"    is the expected number of affected haplotypes
                  sharing the variant by descent from the ancestral
                  haplotype, conditional on the model and the data;
                  it equals (1-p) * (number of haplotypes).
        "iter"    is the number of HMM/EM iterations performed before
                  the algorithm was determined to have converged.
                  (The maximum number of iterations is arbitrarily set
                  to 200 for haplotype data and 100 for genotype data
                  but may easily be changed by the user.  If the
                  maximum value is reported, the algorithm may not
                  have converged.)
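    Two of these definitions lend themselves to quick sanity checks.
    Inverting cloc = (1-p) * (number of haplotypes) recovers the
    haplotype count, which presumably should not change between rows of
    one run; and an "iter" value equal to the cap flags a possibly
    unconverged fit. The Python sketch below uses the caps quoted above
    (the user may have changed them); the function names are
    hypothetical. Because p and cloc are rounded in the file, the
    recovered count is only approximately an integer.

```python
# Iteration caps quoted in the text; the user may change them.
MAX_ITER = {"haplotype": 200, "genotype": 100}


def implied_haplotype_count(p, cloc):
    """Invert cloc = (1 - p) * (number of haplotypes); undefined when p == 1."""
    if p >= 1:
        raise ValueError("p must be below 1")
    return cloc / (1.0 - p)


def hit_iteration_cap(iterations, data_type="haplotype"):
    """True if HMM/EM stopped at the cap, i.e. convergence is doubtful."""
    return iterations >= MAX_ITER[data_type]


# Values from the maxout_ex line above: p = 0.37254, cloc = 25.099, iter = 17.
print(round(implied_haplotype_count(0.37254, 25.099)))  # prints 40
print(hit_iteration_cap(17))                            # prints False
```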

5.  "oneout_ex" file

    This file contains the results and diagnostics from the estimation
    of the location of the trait-associated variant on a fine grid
    (third stage, as described in "Search Procedures"), in which the
    likelihood is maximized over 1/tau, p, and the set S of ancestral
    haplotypes given in "ancout_ex". The following are the header and a
    line from oneout_ex (the full file may be downloaded):

        ind C         1/Tau   p       s-m.lik    s-n.lik    \
        c-m.lik   c-n.lik    lloc   rloc   cloc   iter anchap
        1   0.0047619 1.44979 0.39786 -123.80748 -165.67497 \
        -35.41986 -47.39766  23.939 11.731 24.085 14   0
        ...

        "c-m.lik" is the log-quasi-likelihood evaluated at the given
                  parameter values, assuming dependence between the
                  haplotypes' recombinational histories, i.e., a
                  conditional coalescent genealogy.
        "c-n.lik" is the log-quasi-likelihood evaluated under the null
                  model, i.e., the model with p=1.
        "anchap"  identifies the ancestral haplotype assumed by the
                  model; this ID matches the label in the first column
                  of the set S list at the end of "ancout_ex".

    All other entries are as defined previously.
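    The "anchap" column ties each oneout_ex record back to the set S
    list in ancout_ex. The Python sketch below hard-codes the column
    order from the header shown above (if a real file's header differs,
    it should be read from the file instead) and resolves "anchap"
    through a label-to-haplotype mapping built from the set S list. The
    function names are hypothetical.

```python
ONEOUT_COLUMNS = ["ind", "C", "1/Tau", "p", "s-m.lik", "s-n.lik", "c-m.lik",
                  "c-n.lik", "lloc", "rloc", "cloc", "iter", "anchap"]
INT_COLUMNS = {"ind", "iter", "anchap"}


def parse_oneout_fields(fields):
    """Map one oneout record (already split into strings) to named values."""
    if len(fields) != len(ONEOUT_COLUMNS):
        raise ValueError(f"expected {len(ONEOUT_COLUMNS)} fields, got {len(fields)}")
    return {name: int(v) if name in INT_COLUMNS else float(v)
            for name, v in zip(ONEOUT_COLUMNS, fields)}


def ancestral_haplotype(record, set_s):
    """Look up the record's "anchap" ID among the set S labels from ancout."""
    return set_s[record["anchap"]]


# The oneout_ex record above, unwrapped onto one line.
record = parse_oneout_fields(
    "1 0.0047619 1.44979 0.39786 -123.80748 -165.67497 "
    "-35.41986 -47.39766 23.939 11.731 24.085 14 0".split())
# Set S from the ancout_ex excerpt: label 0 -> marker alleles.
set_s = {0: [1, 2, 1, 1, 2, 1]}
print(record["c-m.lik"], ancestral_haplotype(record, set_s))
# prints -35.41986 [1, 2, 1, 1, 2, 1]
```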

    The likelihood surface can be viewed by plotting "c-m.lik" against
    "C".
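    Any plotting tool will do for this. The Python sketch below, which
    takes records as dicts with "C" and "c-m.lik" keys (for example,
    one per parsed oneout_ex line), sorts the grid by "C", reports the
    grid point with the largest value, and writes two whitespace-
    separated columns that a plotting program can read. The function
    names are hypothetical, and the sketch makes no claim about how this
    grid maximum relates to the point estimate in resout_ex. Passing
    key="s-m.lik" gives the star-shaped surface instead.

```python
def likelihood_surface(records, key="c-m.lik"):
    """Return (C, value) pairs sorted by C, plus the pair with the largest value."""
    points = sorted((rec["C"], rec[key]) for rec in records)
    best = max(points, key=lambda pt: pt[1])
    return points, best


def to_columns(points):
    """Format (C, value) pairs as two whitespace-separated columns."""
    return "\n".join(f"{c:.7f} {lik:.5f}" for c, lik in points)


# The single oneout_ex record shown above.
sample = [{"C": 0.0047619, "c-m.lik": -35.41986}]
points, best = likelihood_surface(sample)
print(to_columns(points))  # prints 0.0047619 -35.41986
```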