The Prevalence and Severity of Underreporting Bias in Machine- and Human-Coded Data

Benjamin E. Bagozzi, Patrick T. Brandt, John R. Freeman, Jennifer S. Holmes, Alisha Kim, Agustin Palao Mendizabal, Carly Potz-Nielsen

Research output: Contribution to journal, Article, peer-review

Abstract

Textual data are plagued by underreporting bias. For example, news sources often fail to report human rights violations. Cook et al. propose a multi-source estimator to gauge, and to account for, the underreporting of state repression events within human codings of news texts produced by the Agence France-Presse and Associated Press. We evaluate this estimator with Monte Carlo experiments, and then use it to compare the prevalence and seriousness of underreporting when comparable texts are machine coded and recorded in the World-Integrated Crisis Early Warning System dataset. We replicate Cook et al.'s investigation of human-coded state repression events with our machine-coded events, and validate both models against an external measure of human rights protections in Africa. We then use the Cook et al. estimator to gauge the seriousness and prevalence of underreporting in machine- and human-coded event data on human rights violations in Colombia. We find in both applications that machine-coded data are as valid as human-coded data.

Original language: English (US)
Pages (from-to): 641-649
Number of pages: 9
Journal: Political Science Research and Methods
Volume: 7
Issue number: 3
DOIs: 10.1017/psrm.2018.11
State: Published (Jul 1 2019)

Bibliographical note

Funding Information:
Benjamin E. Bagozzi, Department of Political Science & International Relations, University of Delaware, 405 Smith Hall, 18 Amstel Ave, Newark, DE 19716 (bagozzib@udel.edu). Patrick T. Brandt (pbrandt@utdallas.edu), Jennifer S. Holmes (jholmes@utdallas.edu), Alisha Kim (Alisha.Kim@utdallas.edu) and Agustin Palao Mendizabal (Agustin.PalaoMendizabal@utdallas.edu), School of Economic, Political and Policy Sciences, University of Texas, Dallas, 800 W. Campbell Rd, GR31 Richardson TX 75080. John R. Freeman (freeman@umn.edu) and Carly Potz-Nielsen (potzn001@umn.edu), Department of Political Science, University of Minnesota, 1414 Social Sciences, 267 19th Ave S., Minneapolis, MN 55455. An earlier version of this paper was presented as a poster at the 34th Annual Meeting of the Political Methodology Society. This research is supported by NSF Grant Number SBE-SMA-1539302. The authors thank Associate Editor Daniel Stegmueller, two anonymous reviewers, as well as Scott Cook, Mark Nieman, and Vito D’Orazio for their helpful comments and suggestions. To view supplementary material for this article, please visit https://doi.org/10.1017/psrm.2018.11

Publisher Copyright:
© 2018 The European Political Science Association.
