In 2004, Garcia-Berthou and Alcaraz published "Incongruence between test statistics and P values in medical papers," a critique of statistical errors that received a tremendous amount of attention. One of their observations was that the final reported digits of p-values in articles published in the journal Nature departed substantially from the uniform distribution that they suggested should be expected. In 2006, Jeng critiqued that critique, observing that the statistical analysis of those terminal digits had compared the observed distribution with a continuous uniform distribution, although digits are obviously discrete. Jeng corrected the calculation and reported statistics that did not so clearly support the claim of a digit preference. However delightful it may be to read a critique of statistical errors in a critique of statistical errors, we nevertheless found several aspects of the whole exchange quite troubling, which prompted our own meta-critique of the analysis. The previous discussion emphasized statistical significance testing. But there are various reasons to expect the terminal digits of p-values to depart from a uniform distribution, so simply rejecting the null hypothesis is not terribly informative. Much more importantly, Jeng found that the original p-value of 0.043 should have been 0.086, and suggested that this was an important difference because it fell on the other side of 0.05. Among the most widely reiterated (though often ignored) tenets of modern quantitative research methods is that statistical significance should not be treated as a bright-line test of whether a phenomenon has been observed. Moreover, suggesting that a result should be dismissed because of limited statistical precision, when more data are so easy to gather, sends the wrong message about the role of statistics.
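Jeng's point is that terminal digits are categories, so the natural null model is a discrete uniform distribution with equal expected counts for every digit, not a continuous uniform. A minimal sketch of one standard test of that null, Pearson's chi-square goodness-of-fit test, is below; it is not necessarily the calculation Jeng reported. It assumes that all ten digits 0 through 9 are possible terminal digits (if reporting conventions drop trailing zeros, a nine-category alphabet may be more appropriate), the counts are hypothetical, and `chi2_sf` and `digit_uniformity_test` are illustrative names. The p-value comes from a small hand-written incomplete-gamma series so that only the standard library is needed; `scipy.stats.chisquare` performs the same test.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function
    P(a, z) with a = df/2 and z = x/2, then returns 1 - P(a, z).
    Adequate for the moderate statistics in this illustration.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of the series
    total = term
    n = 1
    while term > 1e-16 * total and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a) + math.log(total))
    return max(0.0, 1.0 - lower)


def digit_uniformity_test(counts: list[int]) -> tuple[float, int, float]:
    """Pearson chi-square test of observed digit counts against a discrete uniform.

    Every category (digit) has the same expected count, n / k.
    Returns (statistic, degrees of freedom, p-value).
    """
    k = len(counts)
    n = sum(counts)
    expected = n / k
    stat = sum((o - expected) ** 2 / expected for o in counts)
    df = k - 1
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts of terminal digits 0..9 (made up for illustration, not data from any study).
counts = [10, 14, 12, 11, 9, 13, 8, 12, 10, 11]
stat, df, p = digit_uniformity_test(counts)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.3f}")
```

Whether `p` lands just above or just below 0.05 says little on its own: for any fixed departure from uniformity in the digit proportions, the chi-square statistic grows in proportion to the number of p-values counted.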
In response to these limitations, we gathered more data to improve the statistical precision, and analyzed the actual pattern of the departure from uniformity, not just its test statistics. We found variation in digit frequencies in the additional data and describe the distinctive pattern of these results. Furthermore, we found that the combined data diverge unambiguously from a uniform distribution. The explanation for this divergence seems unlikely to be that suggested by the previous authors: errors in calculations and transcription.
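Describing the pattern of a departure, rather than only testing for it, can be as simple as pooling the counts and looking at each digit's contribution. A minimal sketch follows, assuming independent samples that use the same digit categories and that category-by-category summation is an acceptable way to pool them. The counts and the names `pool_counts` and `pearson_residuals` are hypothetical, and this is not presented as the analysis reported here; Pearson residuals are simply one common way to show which digits are over- or under-represented.

```python
import math


def pool_counts(*samples: list[int]) -> list[int]:
    """Add digit counts category by category across independent samples."""
    if len({len(s) for s in samples}) != 1:
        raise ValueError("all samples must use the same digit categories")
    return [sum(column) for column in zip(*samples)]


def pearson_residuals(counts: list[int]) -> list[float]:
    """Per-digit (observed - expected) / sqrt(expected) under a discrete uniform.

    Positive values mark digits reported more often than uniformity predicts,
    negative values digits reported less often. Their squares sum to the
    Pearson chi-square statistic.
    """
    expected = sum(counts) / len(counts)
    return [(o - expected) / math.sqrt(expected) for o in counts]


# Hypothetical counts of terminal digits 0..9 from an original and an additional sample (made up).
original = [10, 14, 12, 11, 9, 13, 8, 12, 10, 11]
additional = [20, 25, 22, 19, 24, 21, 18, 23, 20, 18]
pooled = pool_counts(original, additional)
for digit, r in enumerate(pearson_residuals(pooled)):
    print(f"digit {digit}: residual {r:+.2f}")
```

Residuals of roughly 2 or more in absolute value flag digits worth a closer look, although with ten digits a value that large can occasionally arise by chance alone.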
Funding Information:
The authors thank Erin Pollock for research assistance. This research was supported, in part, by the Intramural Research Program of the NIH, and NIEHS (MacLehose) and by an unrestricted grant from the U.S. Smokeless Tobacco Company to the University of Alberta for the support of the research of Dr. Phillips and colleagues (Pollock).