The replicability crisis refers to the apparent failures to replicate both important and typical positive experimental claims in psychological science and biomedicine, failures which have gained increasing attention in the past decade. To provide evidence that there is a replicability crisis in the first place, scientists have developed various measures of replication that help quantify or “count” whether one study replicates another. In this nontechnical essay, I critically examine five types of replication measures used in the landmark article “Estimating the reproducibility of psychological science” (Open Science Collaboration, Science, 349, aac4716, 2015) based on the following techniques: subjective assessment, null hypothesis significance testing, comparing effect sizes, comparing the original effect size with the replication confidence interval, and meta-analysis. The first four, I argue, remain unsatisfactory for a variety of conceptual or formal reasons, even taking into account various improvements. By contrast, at least one version of the meta-analytic measure does not suffer from these problems. It differs from the others in rejecting dichotomous conclusions, that is, the assumption that one study either replicates another or does not, simpliciter. I defend it from other recent criticisms, concluding however that it is not a panacea for all the multifarious problems that the crisis has highlighted.
Bibliographical note
Funding Information:
This essay was written in part with the support of a Visiting Fellowship at the University of Pittsburgh’s Center for Philosophy of Science and a Single Semester Leave from the University of Minnesota, Twin Cities.
© 2021, Springer Nature B.V.
- Confidence interval
- Effect size
- Null hypothesis significance testing
- Replicability crisis
- Reproducibility crisis