Search Procedures

This file describes the search procedure for likelihood/quasi-likelihood maximization as well as the search parameters max_res, E_int, max_cand and map_res. (Advice about setting these parameters is given in "Tips".)

To implement the DHS method for LD mapping, we seek to maximize the likelihood (or quasi-likelihood) over 1/tau, p, ancestral haplotype, and variant location, simultaneously. For a given ancestral haplotype and variant location, we maximize the likelihood over 1/tau and p using the Baum/E-M algorithm in a hidden Markov framework, as described in McPeek and Strahs (1999). To maximize over all parameters simultaneously, we implement a directed search over ancestral haplotype and variant location, maximizing the likelihood over 1/tau and p for each combination, and choosing the set of parameters for which the likelihood is highest.

The search over variant location is straightforward, as the maximized likelihood and maximizing parameter values change sufficiently smoothly with location to make a grid search feasible. Thus, our search strategies focus on the problem of searching over ancestral haplotype. Note that the number of possible ancestral haplotypes would be m^n for n loci each with m alleles. Thus, an exhaustive search quickly becomes infeasible as n grows.

We currently implement a three-stage search procedure that we find performs well in practice. It is based on the following observations:

(1) Ancestral haplotype estimation is generally much easier around the peak of the likelihood curve (i.e. when the parameter representing variant location is close to its maximizing value) than in an area of very low likelihood (i.e. when the parameter representing variant location is set to a value for which the likelihood will be low when maximized over the other parameters).

(2) The set of best ancestral haplotypes across different locations of the variant is generally quite small.
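To give a sense of the search-space size mentioned above (m^n ancestral haplotypes for n loci with m alleles each), the following short sketch computes the count. The function name is hypothetical and is not part of DHSMAP; it also allows a different number of alleles at each locus, in which case the count is the product of the per-locus allele counts.

```python
import math

def n_ancestral_haplotypes(alleles_per_locus):
    """Number of distinct haplotypes, given the allele count at each locus.

    With m alleles at each of n loci this equals m**n.
    """
    return math.prod(alleles_per_locus)

# Growth of the exhaustive search space for m = 4 alleles per locus.
for n in (5, 10, 20):
    print(n, n_ancestral_haplotypes([4] * n))
```

With 4 alleles per locus, 10 loci already give more than a million candidate haplotypes, each of which would need its own maximization over 1/tau and p.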

A central strategy of the first two stages of our approach to ancestral haplotype estimation is the idea of growing the haplotype out from a given location. That is, we fix a site, and consider all 2-locus haplotypes for the 2 markers flanking that site. We rank them by log-likelihood and keep the best "max_cand" (as in "maximum number of candidate ancestral haplotypes"). Then we add the next-nearest marker and consider all possible haplotypes obtained by combining any of the best max_cand haplotypes at the first 2 markers with any allele at the 3rd marker, and we keep the best max_cand of those, and so on. At the last step, we take the best haplotype from among those obtained by combining any of the best max_cand haplotypes at the first n-1 markers with any allele at the nth marker. We call the above procedure "growing the haplotype" from the given site.

In the first stage of our three-stage approach, we put the variant at each position on a coarse grid. The points of the grid are determined by the marker map and the parameter "max_res" (given in cM) as follows: between markers l and l+1, there are s(d/max_res) evenly spaced points, where d is the distance (cM) between markers l and l+1 and s(x) is the smallest integer greater than x for any real x. (Note on terminology: max_res can be thought of as defining an upper bound on the distance between grid points or a lower bound on resolution, so might be more aptly called "min_res".) At each point of the grid, we perform the above haplotype-growing procedure, in each case growing the haplotype from the putative position of the variant. From this, we obtain, for each position of the variant, an estimated ancestral haplotype and a corresponding log-likelihood. Let t be the position of the variant for which the corresponding log-likelihood is the largest.
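The haplotype-growing procedure described above can be sketched as follows. This is a minimal illustration, not DHSMAP's code: the function names, the dictionary representation of partial haplotypes, and the scoring callback loglik are all assumptions. In DHSMAP the score would be the log-likelihood maximized over 1/tau and p by the Baum/E-M algorithm; here it is replaced by a toy function that counts matches to a known target. The sketch also assumes the site lies within the marker map and that ties in distance are broken arbitrarily.

```python
import bisect
from itertools import product

def grow_haplotype(site, marker_pos, alleles, loglik, max_cand):
    """Grow a haplotype outward from `site`, keeping max_cand candidates per step.

    marker_pos: sorted marker positions (cM); alleles[i]: alleles at marker i;
    loglik: hypothetical callable scoring a partial haplotype {marker: allele}.
    """
    n = len(marker_pos)
    # Indices of the two markers flanking the site (clamped to the map).
    right = min(max(bisect.bisect_right(marker_pos, site), 1), n - 1)
    left = right - 1
    # Remaining markers, nearest to the site first.
    rest = sorted((i for i in range(n) if i not in (left, right)),
                  key=lambda i: abs(marker_pos[i] - site))

    def best(cands):
        # Rank by log-likelihood and keep the best max_cand.
        return sorted(cands, key=loglik, reverse=True)[:max_cand]

    # All 2-locus haplotypes at the flanking markers.
    cands = best([{left: a, right: b}
                  for a, b in product(alleles[left], alleles[right])])
    # Add one marker at a time, extending every kept candidate by every allele.
    for m in rest:
        cands = best([{**h, m: a} for h in cands for a in alleles[m]])

    top = cands[0]
    return tuple(top[i] for i in range(n)), loglik(top)

# Toy stand-in for the likelihood: number of alleles matching a fixed target.
TARGET = (1, 2, 2, 1)

def toy_loglik(partial):
    return sum(1 for i, a in partial.items() if a == TARGET[i])

hap, score = grow_haplotype(1.5, [0.0, 1.0, 2.0, 3.0],
                            [[1, 2]] * 4, toy_loglik, max_cand=2)
```

Under this scheme each step scores at most max_cand times the number of alleles at the newly added marker, rather than every one of the m^n full haplotypes.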

In the second stage, we again put the variant at each position on the coarse grid and perform the above haplotype-growing procedure, but this time we always grow the haplotype from the fixed location t, instead of from the putative variant position. If the approximate location of the trait-associated variant is known, the user may instead specify an interval ("E_int"), in which case the haplotype is grown from the midpoint of that interval. From this, we obtain, for each position of the variant, a second estimated ancestral haplotype.

In the third stage, we define the set S to consist of all ancestral haplotypes estimated in the first or second stages. Then, we put the variant at each position on a fine grid and maximize the likelihood over S for each position of the variant. Alternatively, the user can specify the set S of ancestral haplotypes over which to maximize, bypassing stages 1 and 2 of the procedure.

The fine grid of locations used in the third stage is determined by the user through max_res and "map_res" ("mapping resolution"). Between markers l and l+1, DHSMAP maximizes the likelihood at s(d/max_res)*map_res variant locations. For example, if the distance between 2 markers is 0.25 cM, max_res=0.2 cM, and map_res=20, then s(0.25/0.2) = s(1.25) = 2, and DHSMAP will maximize the likelihood at 2*20=40 locations between the markers.

As a result of stage 3, we obtain the maximized likelihood and maximizing parameter values for every putative location of the variant on the fine grid. These are used for estimation and confidence intervals for location of the variant, as well as for estimation of the amount of LD and degree of heterogeneity. In the examples we have considered, including some very complex cases, we have found this procedure to work well.
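The grid sizes defined above can be checked with a few lines of arithmetic. The sketch below is an illustration with hypothetical function names, not DHSMAP's code. It takes the definition of s(x) literally as "the smallest integer greater than x", that is floor(x) + 1; this differs from the usual ceiling when x is an exact integer, and the text does not say how DHSMAP behaves in that case.

```python
import math

def s(x):
    """Smallest integer strictly greater than x, as s(x) is defined in the text."""
    return math.floor(x) + 1

def coarse_points(d, max_res):
    """Number of coarse-grid points between two markers d cM apart."""
    return s(d / max_res)

def fine_points(d, max_res, map_res):
    """Number of fine-grid variant locations between the same two markers."""
    return coarse_points(d, max_res) * map_res

# Worked example from the text: d = 0.25 cM, max_res = 0.2 cM, map_res = 20.
print(coarse_points(0.25, 0.2))    # 2
print(fine_points(0.25, 0.2, 20))  # 40
```

Because d/max_res is computed in floating point, ratios that are mathematically integers may land just above or below the integer; a careful implementation would need to decide how to handle that boundary.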