Input

Required Input Files:

1.  The phenotype data file (default filename is "pedigree")

    This file contains the pedigree and phenotype information. 
    Individuals who are not listed in this file will not be included 
    in the analysis.  The file has the following format:

       1   1   7   6   1   1
       1   2   7   6   2   2   
       1   3   7   6   1   2   
       1   7   0   0   1   1
       1   6   0   0   2   0
       2  11  18  19   2   2 
       2  12  18  19   1   1 
       2  18   0   0   1   0 
       2  19  15  16   2   1
       2  15   0   0   1   0
       2  16   0   0   2   0
      (1) (2) (3) (4) (5) (6)
  
     (1) family ID (positive integer) 
     (2) individual ID (positive integer; must be unique)
     (3) father's ID (0 if the individual is a founder)
     (4) mother's ID (0 if the individual is a founder)
     (5) sex (1=male, 2=female) 
     (6) affection status (0=unknown, 1=unaffected, 2=affected) 

    Sampled individuals who are unrelated to anyone else in the sample 
    should be included by giving each such person their own unique family 
    ID (as well as unique individual ID) and setting both parents' IDs to 0.
    There is no limit on the number of individuals nor on the number of 
    families.  Each individual should be entered only once.  The 
    individual ID is required to be unique (e.g. it cannot be reused in a 
    different family).  Individuals from the same family should appear in 
    a single cluster, though there is no requirement on the order of
    individuals within a family nor on the order in which different
    families are listed. 
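
    The format rules above can be checked before running ATRIUM.  The
    following Python sketch is not part of ATRIUM; the function name
    check_pedigree is hypothetical, and skipping blank lines is an
    assumption of the sketch rather than documented ATRIUM behavior.  It 
    reports non-unique individual IDs, families that are split across 
    more than one cluster, and invalid sex or affection status codes.

```python
def check_pedigree(lines):
    """Return a list of problems found in phenotype-data-file lines."""
    problems = []
    seen_ids = set()       # individual IDs seen so far
    finished = set()       # families whose cluster has already ended
    current_family = None
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # blank lines are skipped (assumption of this sketch)
        if len(fields) != 6:
            problems.append(f"line {n}: expected 6 columns, found {len(fields)}")
            continue
        try:
            fam, ind, father, mother, sex, status = [int(f) for f in fields]
        except ValueError:
            problems.append(f"line {n}: all columns must be integers")
            continue
        if fam <= 0 or ind <= 0:
            problems.append(f"line {n}: family and individual IDs must be positive")
        if ind in seen_ids:
            problems.append(f"line {n}: individual ID {ind} is not unique")
        seen_ids.add(ind)
        if fam != current_family:
            # A family ID that reappears after its cluster ended is an error.
            if fam in finished:
                problems.append(f"line {n}: family {fam} is not in a single cluster")
            if current_family is not None:
                finished.add(current_family)
            current_family = fam
        if sex not in (1, 2):
            problems.append(f"line {n}: sex must be 1 or 2")
        if status not in (0, 1, 2):
            problems.append(f"line {n}: affection status must be 0, 1 or 2")
    return problems
```

    Calling check_pedigree(open("pedigree")) on a well-formed file should 
    return an empty list.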

    The default filename is "pedigree".  To specify a different filename,
    use the command-line flag -pheno followed by the filename.  For 
    example, to use a phenotype data file called "myphenofile", you could 
    type the command

	./ATRIUM -pheno myphenofile 



2.  The marker data file (default filename is "markid")

    This file contains the marker data.  All markers should be on the same 
    chromosome.  (To analyze more than one chromosome, a separate run must be 
    performed for each chromosome, with each chromosome having its own 
    marker data file.)
    
     marker  chromosome position orientation allele0 allele1   1   2   3   7   6  11  12  18  19  15  16
    rs7909677   10       101955      +          A       G     AG  AA  AA  AA  AG  AG  GG  GG  AG  AG  AG
    rs9419560   10       142201      +          A       G     AA  GG  AG  AG  AG  AG  NN  GG  AG  AA  GG
    rs9419419   10       153707      -          T       C     TC  TC  CC  TC  TC  TT  TC  TT  TC  TT  CC
       (1)      (2)        (3)      (4)        (5)     (6)    (7) (8) (9)(10)(11)(12)(13)(14)(15)(16)(17)
  
     (1) marker rs number 
     (2) chromosome 
     (3) physical position
     (4) strand orientation ("+"=same strand as HapMap, "-"=opposite 
     strand from HapMap)
     (5) nucleotide for allele 0 
     (6) nucleotide for allele 1 
     (7)... marker genotypes (NN for missing genotype)

    The first row of the file must contain the column headings.  The
    headings for the first 6 columns can be arbitrary, but should not
    contain any space characters.  Columns 7 and beyond contain marker
    genotype data for the sampled individuals, and each of these 
    columns must have the corresponding individual's ID number as the
    heading.  The individuals need not be listed in the same order as
    in the phenotype data file; the column headings determine which
    column belongs to which individual.  There is no limit on the
    number of markers, but all markers should be on the same chromosome.
    Individuals who appear in the marker data file but not in the
    phenotype data file will not be included in the analysis.

    Every row must have the same number of columns.  Use NN for a
    missing genotype.
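
    To illustrate how the column headings tie genotypes to individuals,
    here is a minimal Python sketch (not part of ATRIUM; the function 
    name read_marker_file is hypothetical).  It assumes that columns are 
    separated by whitespace and that a non-missing genotype consists of 
    exactly two characters drawn from the marker's two alleles; neither 
    assumption is confirmed by ATRIUM itself.

```python
def read_marker_file(lines):
    """Parse marker-data-file lines into (individual_ids, markers).

    markers maps each marker name to a dict {individual ID: genotype},
    with None standing for a missing (NN) genotype.
    """
    rows = [line.split() for line in lines if line.strip()]
    header, data = rows[0], rows[1:]
    ids = header[6:]  # columns 7 and beyond are headed by individual IDs
    markers = {}
    for fields in data:
        if len(fields) != len(header):
            raise ValueError(
                f"marker {fields[0]}: expected {len(header)} columns, "
                f"found {len(fields)}"
            )
        name, chrom, pos, strand, allele0, allele1 = fields[:6]
        genotypes = {}
        for ind, g in zip(ids, fields[6:]):
            if g == "NN":
                genotypes[ind] = None
            elif len(g) == 2 and set(g) <= {allele0, allele1}:
                genotypes[ind] = g
            else:
                raise ValueError(f"marker {name}: bad genotype {g!r} for {ind}")
        markers[name] = genotypes
    return ids, markers
```

    Because genotypes are looked up by the ID in the heading, the order 
    of the individual columns does not matter.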

    The default filename is "markid".  To specify a different filename,
    use the command-line flag -geno followed by the filename.  For 
    example, to use a marker data file called "mymarkfile" you could 
    type the command

	./ATRIUM -geno mymarkfile 

 
3.  The IBD coefficient file (default filename is "ibdcoef")

    This file contains condensed identity coefficients for every pair
    of eligible individuals within each family (including an individual
    with himself/herself), where an individual is eligible if he or she 
    has either (1) known affection status or (2) non-missing genotype 
    for at least one marker.  (E.g. an individual with unknown 
    phenotype is still eligible if he or she has any non-missing 
    genotype information.)

    IBD coefficients should be included for every pair of eligible 
    individuals who have the same family ID (including each individual
    with himself/herself).  A sampled individual who does not share
    a family ID with anyone else in the sample would be represented in
    the IBD coefficient file by a single line that gives the IBD
    coefficients for that person with himself/herself.
   
    The IBD coefficient file has the following format:
   
    1    1    0    0    0    0    0    0    1    0    0
    1    2    0    0    0    0    0    0    0.25 0.5  0.25
    1    3    0    0    0    0    0    0    0.25 0.5  0.25
    1    7    0    0    0    0    0    0    0    1    0
    1    6    0    0    0    0    0    0    0    1    0
    2    2    0    0    0    0    0    0    1    0    0
    2    3    0    0    0    0    0    0    0.25 0.5  0.25
    2    7    0    0    0    0    0    0    0    1    0
    2    6    0    0    0    0    0    0    0    1    0
    3    3    0    0    0    0    0    0    1    0    0
    3    7    0    0    0    0    0    0    0    1    0
    3    6    0    0    0    0    0    0    0    1    0
    7    7    0    0    0    0    0    0    1    0    0
    7    6    0    0    0    0    0    0    0    0    1
    6    6    0    0    0    0    0    0    1    0    0
   11   11    0    0    0    0    0    0    1    0    0 
   11   12    0    0    0    0    0    0    0.25 0.5  0.25
    .    .    .    .    .    .    .    .    .    .    .
    .    .    .    .    .    .    .    .    .    .    .
   (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)  (9)  (10) (11)

   (1) individual 1 ID 
   (2) individual 2 ID 
   (3)..(11) condensed identity coefficients 1 through 9 between 
    individuals 1 and 2 (Ken Lange's book, Mathematical and Statistical
    Methods for Genetic Analysis, has a good description of condensed 
    identity coefficients)
   
   Note that ATRIUM currently permits only outbred individuals in the
   analysis.  For a pair of outbred individuals, columns (3)-(8) will
   always be 0, and columns (9), (10) and (11) represent the 
   probabilities of sharing 2, 1 or 0 alleles IBD, respectively.  If,
   for any pair of individuals, at least one of the values in columns 
   (3)-(8) is not zero, these individuals will be excluded from the 
   analysis.  The individual IDs in this file should correspond to the 
   individual IDs used in the phenotype data file.  
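
   The column layout and the outbred requirement can be illustrated
   with a short Python sketch (not part of ATRIUM; the function name
   parse_ibd_line and the tolerance tol are hypothetical).  It reads 
   one line, flags the pair as inbred if any of columns (3)-(8) is 
   nonzero, and returns the probabilities of sharing 2, 1 and 0 alleles 
   IBD from columns (9)-(11).  The check that the nine coefficients sum 
   to 1 follows from their definition as probabilities of mutually 
   exclusive identity states.

```python
def parse_ibd_line(line, tol=1e-6):
    """Parse one line of the IBD coefficient file.

    Returns (id1, id2, inbred, (p2, p1, p0)), where p2, p1 and p0 are the
    probabilities of sharing 2, 1 and 0 alleles IBD (columns 9, 10, 11).
    """
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, found {len(fields)}")
    id1, id2 = int(fields[0]), int(fields[1])
    coefs = [float(x) for x in fields[2:]]  # condensed coefficients 1..9
    if abs(sum(coefs) - 1.0) > tol:
        raise ValueError(f"coefficients for pair ({id1}, {id2}) do not sum to 1")
    # Columns (3)-(8) must all be zero for a pair of outbred individuals.
    inbred = any(c != 0.0 for c in coefs[:6])
    p2, p1, p0 = coefs[6:]
    return id1, id2, inbred, (p2, p1, p0)
```

   For example, the sibling pair (1, 2) in the sample file above yields 
   the sharing probabilities (0.25, 0.5, 0.25).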

    
    IBD coefficients can be obtained with the following software program:

    -- The IdCoefs software by Mark Abney, which can be found at 
	http://home.uchicago.edu/~abney/Software.html

    The IdCoefs software computes condensed identity coefficients for 
    pairs of individuals within each family.  The output of IdCoefs can 
    be directly used as input to ATRIUM.

    The default filename is "ibdcoef".  To specify a different filename,
    use the command-line flag -ibd followed by the filename.  For 
    example, to use an IBD coefficient file called "myibdfile" you could 
    type the command

	./ATRIUM -ibd myibdfile


4.  The multilocus LD database file (default filename is "database")

    This file contains information on the joint distribution of untyped 
    SNPs with their tag SNPs in the reference panel.  The software 
    program that can be used to obtain the multilocus LD database file is

    -- The tuna_db program of the TUNA package by William Wen and Dan 
    Nicolae.  The output of tuna_db has the exact format required for the 
    ATRIUM software.
  
    The tuna_db program can be found at  
    http://www.stat.uchicago.edu/~wen/tuna

    The resulting multilocus LD database file has the following format:

      rs7909677   1       101955  A G   0.3508  0.1257    0.3508  4  rs2060138:rs4881551:rs3125023:rs1476130  0:18:26:0.6923_1:0:2:0.0000_2:2:2:1.0000_3:3:3:1.0000_4:1:1:1.0000_5:19:19:1.0000_7:22:22:1.0000_8:1:1:1.0000_9:3:3:1.0000_10:1:1:1.0000_11:37:37:1.0000_12:2:2:1.0000_15:1:1:1.0000
     rs11591988   0       116070  C T   0.1846  0.1685    0.1685  1  rs10794885  0:76:77:0.9870_1:31:43:0.7209
      rs2379071   0       116237  A G   0.4012  0.2414    0.3699  2  rs9419560:rs2060138  0:9:30:0.3000_1:3:3:1.0000_2:2:86:0.0233_3:1:1:1.0000
     rs12773042   0       117636  C G   0.2642  0.1493    0.2521  3  rs2060138:rs4881551:rs4880568  0:0:3:0.0000_1:0:19:0.0000_3:0:26:0.0000_4:9:27:0.3333_5:2:5:0.4000_6:0:3:0.0000_7:0:37:0.0000
        .         .         .     . .     .       .         .     .           .                         .
        .         .         .     . .     .       .         .     .           .                         .
       (1)       (2)       (3)   (4)(5)  (6)     (7)       (8)   (9)         (10)                      (11)

     (1) marker rs number 
     (2) typed or not (0=untyped, 1=typed) 
     (3) physical position
     (4) nucleotide for allele 0 
     (5) nucleotide for allele 1 
     (6) maximum multilocus LD measure M_D
     (7) maximum pairwise LD measure r^2
     (8) multilocus LD measure M_D with tag SNPs listed in column (10)
     (9) number of tag SNPs
    (10) list of tag SNPs
    (11) information on the joint distribution of untyped SNPs with their 
     tag SNPs in the reference panel

    Detailed explanation of column (11):  

    For a given untyped SNP, column (11) is divided into h subfields, 
    where h is the number of tag SNP haplotypes, for the given untyped 
    SNP, that occur in the reference panel, and where the subfields 
    are separated by underscores ("_").  

    Each subfield is further separated into 4 entries, where the entries 
    are separated by colons (":").  The first 3 entries must be
    integers, and the 4th entry is a double-precision number.

    For a given untyped SNP, each tag SNP haplotype that occurs in the 
    reference panel is coded as a nonnegative integer, corresponding to 
    a binary representation.  For example, tag SNP haplotype 0000 is 
    coded as 0, 1000 is coded as 1, 0100 is coded as 2, 1100 is coded as 
    3, etc., where the order of the tag SNPs is the same as in (10).  
    Each subfield corresponds to a tag SNP haplotype, and the haplotype 
    code must be the first entry in the subfield.  E.g. if tag SNP 
    haplotype 0000 occurs in the reference panel, then, in its subfield,
    the first entry would be 0.  Similarly, if tag SNP haplotype 1000 
    occurs in the reference panel, then, in its subfield, the first entry 
    would be 1.  If a haplotype does not appear in the reference panel, 
    then there should be no subfield for that haplotype.

    The second entry in the subfield corresponding to tag SNP haplotype H 
    is the count of haplotypes in the reference panel for which the tag 
    SNP haplotype is H and the untyped SNP allele is 1.  

    The third entry in the subfield corresponding to tag SNP haplotype H 
    is the total count of type H haplotypes in the reference panel.

    The fourth entry in each subfield is equal to (entry 2)/(entry 3), 
    which represents the estimated conditional probability of allele 1 at 
    the untyped SNP given haplotype H at the tag SNPs.
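
    The structure of column (11) can be made concrete with a small
    Python sketch (not part of ATRIUM or TUNA; the function names are 
    hypothetical).  It splits the column into subfields, decodes each 
    haplotype code into a string of 0s and 1s with the first tag SNP as 
    the lowest-order bit (so that code 1 is 1000 and code 2 is 0100, as 
    described above), and returns the three remaining entries of each 
    subfield.

```python
def decode_haplotype(code, n_tags):
    """Return the tag SNP haplotype for an integer code.

    The first tag SNP in column (10) is the lowest-order bit, so with
    four tag SNPs code 1 -> "1000", code 2 -> "0100", code 3 -> "1100".
    """
    return "".join(str((code >> i) & 1) for i in range(n_tags))


def parse_column11(text, n_tags):
    """Map each tag SNP haplotype to (count_allele1, total_count, prob)."""
    result = {}
    for subfield in text.split("_"):      # subfields are separated by "_"
        code, count1, total, prob = subfield.split(":")  # entries by ":"
        haplotype = decode_haplotype(int(code), n_tags)
        result[haplotype] = (int(count1), int(total), float(prob))
    return result
```

    Applied to the rs2379071 row of the sample file above (2 tag SNPs), 
    the first subfield 0:9:30:0.3000 becomes the entry "00": (9, 30, 0.3), 
    and 9/30 = 0.3 is the estimated conditional probability of allele 1.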

    Special note on phased versus unphased reference panel: 

    As of October 2009, the current implementation of tuna_db requires
    a phased reference panel.  ATRIUM allows an unphased reference panel,
    but we do not currently provide a routine to generate the multilocus 
    LD database input file in that case.  In order to generate such a 
    database input file yourself, you could replace entry 4 of each 
    subfield with an estimated conditional probability, in the reference 
    panel, of allele 1 at the untyped SNP given haplotype H at the tag 
    SNPs, where this estimated conditional probability could be obtained 
    as a ratio of the corresponding haplotype frequency estimates from 
    an EM algorithm approach or from one of the current imputation 
    models/methods.  Note that ATRIUM ignores entries 2 and 3 in each
    subfield of column (11) but still reads them in as integers, so for
    an unphased reference panel, entries 2 and 3 can be set to
    arbitrary integers.
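
    As a rough illustration of the procedure just described, the
    following Python sketch (not part of ATRIUM or TUNA; the function 
    name format_column11 and its arguments are hypothetical) builds the 
    text of column (11) from estimated conditional probabilities that 
    you have obtained elsewhere.  Entries 2 and 3 are filled with a 
    placeholder integer, and entry 4 is written with four decimal 
    places to match the sample file; whether ATRIUM accepts more digits 
    is not something this sketch assumes.

```python
def format_column11(cond_prob, placeholder=0):
    """Build the column (11) text for one untyped SNP.

    cond_prob maps each tag SNP haplotype code (a nonnegative integer)
    to the estimated probability of allele 1 at the untyped SNP given
    that haplotype.  Haplotypes absent from the panel are simply omitted.
    """
    subfields = []
    for code in sorted(cond_prob):
        p = cond_prob[code]
        if code < 0 or not 0.0 <= p <= 1.0:
            raise ValueError(f"invalid entry for haplotype code {code}")
        # Entries 2 and 3 are ignored by ATRIUM but must be integers.
        subfields.append(f"{code}:{placeholder}:{placeholder}:{p:.4f}")
    return "_".join(subfields)
```

    For example, format_column11({0: 0.3, 1: 1.0, 3: 0.5}) produces the 
    text 0:0:0:0.3000_1:0:0:1.0000_3:0:0:0.5000.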
  
    The default filename is "database".  To specify a different filename,
    use the command-line flag -db followed by the filename.  For 
    example, to use a multilocus LD database file called "myldfile" you 
    could type the command

	./ATRIUM -db myldfile


5.  The parameter file (default filename is "parameter")

    This file contains one number: an estimate of the population prevalence 
    of the binary trait.  This prevalence value is used in the calculation 
    of the ATRIUM statistic.  It should not be the prevalence in the 
    case-control sample, but rather the "general population" prevalence for 
    an appropriate reference population.

    The default filename is "parameter".  To specify a different filename,
    use the command-line flag -r followed by the filename.  For example, 
    to use a parameter file called "myprev" you could type the command

	./ATRIUM -r myprev



Optional Input:

6.  M_D threshold value (default value is 0.4)

    This value is the threshold (which must be a number between 0 and 1)
    for the minimum allowable amount of information on an untyped SNP, 
    based on its tag SNPs, where information is measured by M_D (Nicolae 
    2006).  M_D quantifies how much of the information on a given untyped
    SNP is captured by its tag SNPs (where 0 is no information and 1 is
    perfect information).  An untyped SNP is considered for testing only
    if its M_D value, based on its tag SNPs, is strictly greater than
    this threshold.

    The default M_D threshold is .4.  To change the M_D threshold, use 
    the command-line flag -md followed by the threshold value.  For 
    example, to use a more stringent M_D threshold of .75, you could type 
    the command

	./ATRIUM -md .75



7.  r^2 threshold value (default value is 1)

    This value is the threshold (which must be a number between 0 and 1) 
    for the maximum allowable r^2 between an untyped SNP and any of its 
    tag SNPs, where r^2 is the square of the correlation coefficient.  
    An untyped SNP is considered for testing only if its maximum r^2 with
    any tag SNP is strictly less than this threshold.

    The default r^2 threshold is 1.  To change the r^2 threshold, use the 
    command-line flag -r2 followed by the threshold value.  For example, 
    to use a more stringent threshold of .9, you could type the command

	./ATRIUM -r2 .9
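
    The two threshold rules can be summarized in a few lines of Python.
    This is only a sketch of the rules as stated above, not ATRIUM's
    actual code; the function name passes_thresholds is hypothetical.
    It also assumes that the M_D value being tested is the one in
    column (8) of the multilocus LD database file and that the r^2
    value is the one in column (7), which this document does not state
    explicitly.

```python
def passes_thresholds(md, max_r2, md_threshold=0.4, r2_threshold=1.0):
    """Return True if an untyped SNP would be considered for testing.

    md      -- M_D of the untyped SNP with its tag SNPs
               (assumed to be column (8) of the database file)
    max_r2  -- maximum pairwise r^2 with any tag SNP
               (assumed to be column (7) of the database file)
    Both comparisons are strict, as described in items 6 and 7.
    """
    return md > md_threshold and max_r2 < r2_threshold
```

    Under these assumptions, the sample row for rs2379071 (M_D 0.3699,
    r^2 0.2414) would not be tested with the default M_D threshold of 
    .4, but would be tested if ATRIUM were run with -md .3.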