TY - JOUR
T1 - Detecting linkage disequilibrium in bacterial populations
AU - Haubold, Bernhard
AU - Travisano, Michael
AU - Rainey, Paul B.
AU - Hudson, Richard R.
PY - 1998/12/1
Y1 - 1998/12/1
N2 - The distribution of the number of pairwise differences calculated from comparisons between n haploid genomes has frequently been used as a starting point for testing the hypothesis of linkage equilibrium. For this purpose the variance of the pairwise differences, V(D), is used as a test statistic to evaluate the null hypothesis that all loci are in linkage equilibrium. The problem is to determine the critical value of the distribution of V(D). This critical value can be estimated either by Monte Carlo simulation or by assuming that V(D) is distributed normally and calculating a one-tailed 95% critical value for V(D), L, L = E(V(D)) + 1.645 √Var(V(D)), where E(V(D)) is the expectation of V(D), and Var(V(D)) is the variance of V(D). If V(D) (observed) > L, the null hypothesis of linkage equilibrium is rejected. Using Monte Carlo simulation we show that the formula currently available for Var(V(D)) is incorrect, especially for genetically highly diverse data. This has implications for hypothesis testing in bacterial populations, which are often genetically highly diverse. For this reason we derive a new, exact formula for Var(V(D)). The distribution of V(D) is examined and shown to approach normality as the sample size increases. This makes the new formula a useful tool in the investigation of large data sets, where testing for linkage using Monte Carlo simulation can be very time consuming. Application of the new formula, in conjunction with Monte Carlo simulation, to populations of Bradyrhizobium japonicum, Rhizobium leguminosarum, and Bacillus subtilis reveals linkage disequilibrium where linkage equilibrium has previously been reported.
UR - http://www.scopus.com/inward/record.url?scp=0031788831&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0031788831&partnerID=8YFLogxK
M3 - Article
C2 - 9832514
AN - SCOPUS:0031788831
SN - 0016-6731
VL - 150
SP - 1341
EP - 1348
JO - Genetics
JF - Genetics
IS - 4
ER -