The distribution of the number of pairwise differences calculated from comparisons between n haploid genomes has frequently been used as a starting point for testing the hypothesis of linkage equilibrium. For this purpose the variance of the pairwise differences, V(D), is used as a test statistic to evaluate the null hypothesis that all loci are in linkage equilibrium. The problem is to determine the critical value of the distribution of V(D). This critical value can be estimated either by Monte Carlo simulation or by assuming that V(D) is normally distributed and calculating a one-tailed 95% critical value, L = E(V(D)) + 1.645 √Var(V(D)), where E(V(D)) is the expectation of V(D) and Var(V(D)) is the variance of V(D). If the observed V(D) exceeds L, the null hypothesis of linkage equilibrium is rejected. Using Monte Carlo simulation we show that the formula currently available for Var(V(D)) is incorrect, especially for genetically highly diverse data. This has implications for hypothesis testing in bacterial populations, which are often genetically highly diverse. For this reason we derive a new, exact formula for Var(V(D)). The distribution of V(D) is examined and shown to approach normality as the sample size increases. This makes the new formula a useful tool in the investigation of large data sets, where testing for linkage using Monte Carlo simulation can be very time consuming. Application of the new formula, in conjunction with Monte Carlo simulation, to populations of Bradyrhizobium japonicum, Rhizobium leguminosarum, and Bacillus subtilis reveals linkage disequilibrium where linkage equilibrium has previously been reported.
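The test described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not stated in the abstract: genomes are represented as tuples of allele labels, V(D) is computed as the population variance over all pairs, and the null distribution is generated by permuting alleles independently within each locus. All function names are hypothetical. The exact formula for Var(V(D)) is not reproduced here, so the normal-approximation critical value below uses the mean and standard deviation estimated from the simulated null values instead.

```python
import math
import random
import statistics
from itertools import combinations


def variance_of_pairwise_differences(genomes):
    """V(D): variance of the number of differing loci over all genome pairs.

    Assumption: population variance (divide by the number of pairs).
    """
    diffs = [
        sum(a != b for a, b in zip(g1, g2))
        for g1, g2 in combinations(genomes, 2)
    ]
    return statistics.pvariance(diffs)


def shuffle_loci(genomes, rng):
    """Permute alleles independently within each locus.

    Allele frequencies at every locus are preserved, while associations
    between loci are broken (one way to simulate linkage equilibrium).
    """
    columns = [list(col) for col in zip(*genomes)]
    for col in columns:
        rng.shuffle(col)
    return [tuple(row) for row in zip(*columns)]


def null_distribution(genomes, n_rep, rng):
    """Monte Carlo sample of V(D) under the permutation null."""
    return [
        variance_of_pairwise_differences(shuffle_loci(genomes, rng))
        for _ in range(n_rep)
    ]


def critical_values(null_vd):
    """Return (normal-approximation L, empirical 95th percentile)."""
    mean = statistics.fmean(null_vd)
    sd = statistics.stdev(null_vd)
    normal_l = mean + 1.645 * sd  # L = E(V(D)) + 1.645 * sqrt(Var(V(D)))
    ordered = sorted(null_vd)
    empirical_l = ordered[math.ceil(0.95 * len(ordered)) - 1]
    return normal_l, empirical_l


# Toy data (made up for illustration): two clonal lineages, six loci,
# so the loci are strongly associated with one another.
rng = random.Random(0)
genomes = [(0,) * 6] * 10 + [(1,) * 6] * 10

observed = variance_of_pairwise_differences(genomes)
null_vd = null_distribution(genomes, n_rep=500, rng=rng)
normal_l, empirical_l = critical_values(null_vd)

print("observed V(D):", observed)
print("normal-approximation critical value:", normal_l)
print("Monte Carlo 95th percentile:", empirical_l)
print("reject linkage equilibrium:", observed > normal_l)
```

For real data sets, the simulated moments in `critical_values` would be replaced by analytical values of E(V(D)) and Var(V(D)), which is what makes the normal approximation attractive when many Monte Carlo replicates are too slow.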
|Original language||English (US)|
|Number of pages||8|
|State||Published - Dec 1 1998|