The HaploPro statistical tool
by Jason A. Holt and Sierra D. Stoneberg Holt


A statistical tool for evaluating samples in intraspecific studies
HaploPro [Microsoft Excel 2000 spreadsheet, 59 kB]

Holt, Jason A., Sierra D. Stoneberg Holt & Petr Bures (2007): Experimental design in intraspecific organelle DNA sequence studies III: statistical measures of sampling success. Taxon 56(3). [fulltext in pdf]

HaploPro is a simple Microsoft® Excel 2000 spreadsheet (59 kB) that calculates the Inclusion-Exclusion function and Goodman (1965, Technometrics, 7: 247-254) confidence intervals.

The first worksheet of HaploPro, "Found Haplotypes", calculates the Inclusion-Exclusion function. The user enters the minimum proportion a haplotype must have to be considered prevalent. (This minimum proportion must be at least 1%.) Because the Inclusion-Exclusion function is designed for proportions of the form 1/k, the spreadsheet calculates the smallest integer k such that 1/k is less than or equal to the specified proportion. (If the specified proportion cannot be expressed as 1/k or if it is less than 1%, a message appears indicating that the calculation actually applies to a proportion different from the one that was entered.) When the sample size is entered, the spreadsheet displays the probability that all haplotypes of proportion 1/k and greater are represented in the sample, together with the complementary probability that at least one haplotype of that proportion has been missed. Only the proportion and the sample size can be entered, but the sample size needed to reach a specific significance level can be found quickly by varying the sample size.
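The calculation described above can be sketched in Python. This is an illustration only, not the spreadsheet's own formulas: it assumes the Inclusion-Exclusion function is the standard inclusion-exclusion sum for the worst case of k equally frequent haplotypes, P = sum over j of (-1)^j C(k, j) (1 - j/k)^n, and the function names smallest_k and prob_all_found are hypothetical.

```python
from fractions import Fraction
from math import ceil, comb


def smallest_k(min_proportion):
    """Return the smallest integer k such that 1/k <= min_proportion.

    The proportion is parsed through its decimal text so that, for example,
    0.05 gives exactly k = 20 with no floating-point rounding surprises.
    """
    p = Fraction(str(min_proportion))
    if not 0 < p <= 1:
        raise ValueError("proportion must be in the interval (0, 1]")
    return ceil(1 / p)


def prob_all_found(k, sample_size):
    """Probability that all k haplotypes, each of proportion 1/k, are sampled.

    Assumed form (standard inclusion-exclusion):
        sum_{j=0}^{k} (-1)^j * C(k, j) * ((k - j) / k)^n
    Exact integer arithmetic avoids cancellation in the alternating sum.
    """
    numerator = sum(
        (-1) ** j * comb(k, j) * (k - j) ** sample_size
        for j in range(k + 1)
    )
    return float(Fraction(numerator, k ** sample_size))


# Example: haplotypes counted as prevalent at 5% or more, 100 individuals sampled.
k = smallest_k("0.05")
p_all = prob_all_found(k, 100)
print(k, p_all, 1.0 - p_all)  # k, P(all found), P(at least one missed)
```

As with the worksheet, the sample size needed for a chosen significance level can be found by raising sample_size until 1 - prob_all_found(k, sample_size) falls below that level.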

Confidence interval calculations are performed on the second sheet of HaploPro, "Confidence Intervals". Here the user enters the number of haplotypes found, the number of individuals in the sample, the number of individuals with the haplotype of interest, and the desired confidence level. The spreadsheet calculates the proportion of the sample with the given haplotype and upper and lower bounds for the proportion of the studied group with the given haplotype. Confidence intervals must be determined for each haplotype individually. To use this worksheet to calculate a score confidence interval for a single proportion, set the number of haplotypes to 1. When calculating the sum of the lower bounds statistic, the number in the Lower Bound box should be used.
The calculations are identical to those in the Confidence Interval box except that instead of is used when calculating .
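The confidence interval calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not a copy of the worksheet: it uses the usual statement of Goodman's (1965) simultaneous intervals, in which B is the upper alpha/k quantile of the chi-square distribution with one degree of freedom, and the function name goodman_interval and its parameter names are hypothetical. With the number of haplotypes set to 1 the formula reduces to the score interval for a single proportion, as noted above. The Lower Bound variant used for the sum of the lower bounds statistic is not reproduced here.

```python
from math import sqrt
from statistics import NormalDist


def goodman_interval(num_haplotypes, sample_size, count, confidence=0.95):
    """Return (sample proportion, lower bound, upper bound) for one haplotype.

    Assumed formula (Goodman 1965, as commonly stated):
        bounds = (B + 2n -/+ sqrt(B * (B + 4n(N - n)/N))) / (2 (N + B))
    where N is the sample size, n the count for the haplotype of interest,
    and B the upper (alpha / k) chi-square quantile with 1 degree of freedom.
    """
    alpha = 1.0 - confidence
    # A chi-square(1) quantile is the square of a standard normal quantile.
    z = NormalDist().inv_cdf(1.0 - alpha / (2.0 * num_haplotypes))
    b = z * z
    half_width = sqrt(b * (b + 4.0 * count * (sample_size - count) / sample_size))
    denominator = 2.0 * (sample_size + b)
    lower = (b + 2.0 * count - half_width) / denominator
    upper = (b + 2.0 * count + half_width) / denominator
    return count / sample_size, lower, upper


# Example: 4 haplotypes found, 60 individuals sampled, 25 carry the haplotype.
proportion, lower, upper = goodman_interval(4, 60, 25, confidence=0.95)
print(proportion, lower, upper)
```

Because the intervals are simultaneous, entering a larger number of haplotypes widens the interval for each one, which is why the function must be called separately for each haplotype with the same num_haplotypes value.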

Abstract

Statistical methods are proposed for analyzing the experimental design, preliminary results, and final results of phylogenetic studies of organelle DNA sequence at low taxonomic levels. Such studies require sampling numerous individuals, many of which share identical haplotypes. The proportions of the haplotypes sampled can help answer the following questions:

  1. Is one haplotype so dominant that the particular DNA region is without meaningful variation within the scope of the study?
  2. Were all prevalent haplotypes found?
  3. What are the proportions of each haplotype within the studied group?
  4. What percentage of the studied group can be confidently asserted to belong to the haplotypes that were found?

Examples are given in which the statistical techniques are applied to data drawn from the botanical literature. Tables are included as a quick reference for the researcher who wishes to circumvent calculation. A Microsoft® Excel 2000 spreadsheet (titled "HaploPro.xls") for performing some of the more complicated calculations is offered online. Finally, the limitations of these methods and their applicability to nuclear DNA and other characters are discussed.

