TY - JOUR

T1 - Detecting linkage disequilibrium in bacterial populations

AU - Haubold, Bernhard

AU - Travisano, Michael

AU - Rainey, Paul B.

AU - Hudson, Richard R.

PY - 1998/12/1

Y1 - 1998/12/1

N2 - The distribution of the number of pairwise differences calculated from comparisons between n haploid genomes has frequently been used as a starting point for testing the hypothesis of linkage equilibrium. For this purpose the variance of the pairwise differences, V(D), is used as a test statistic to evaluate the null hypothesis that all loci are in linkage equilibrium. The problem is to determine the critical value of the distribution of V(D). This critical value can be estimated either by Monte Carlo simulation or by assuming that V(D) is distributed normally and calculating a one-tailed 95% critical value for V(D), L = E(V(D)) + 1.645 √Var(V(D)), where E(V(D)) is the expectation of V(D), and Var(V(D)) is the variance of V(D). If the observed V(D) exceeds L, the null hypothesis of linkage equilibrium is rejected. Using Monte Carlo simulation we show that the formula currently available for Var(V(D)) is incorrect, especially for genetically highly diverse data. This has implications for hypothesis testing in bacterial populations, which are often genetically highly diverse. For this reason we derive a new, exact formula for Var(V(D)). The distribution of V(D) is examined and shown to approach normality as the sample size increases. This makes the new formula a useful tool in the investigation of large data sets, where testing for linkage using Monte Carlo simulation can be very time-consuming. Application of the new formula, in conjunction with Monte Carlo simulation, to populations of Bradyrhizobium japonicum, Rhizobium leguminosarum, and Bacillus subtilis reveals linkage disequilibrium where linkage equilibrium has previously been reported.

UR - http://www.scopus.com/inward/record.url?scp=0031788831&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0031788831&partnerID=8YFLogxK

M3 - Article

C2 - 9832514

AN - SCOPUS:0031788831

SN - 0016-6731

VL - 150

SP - 1341

EP - 1348

JO - Genetics

JF - Genetics

IS - 4

ER -