Abstract
Political event data are widely used in studies of political violence. Recent years have seen notable advances in the automated coding of political event data from international news sources. Yet the validity of machine-coded event data remains disputed, especially with respect to event geolocation. We analyze how often human- and machine-geocoded event data agree with an independent (ground-truth) source. The events are human rights violations in Colombia. We perform our evaluation for a key 8-year period of the Colombian conflict, for three 2-year subperiods, and for a selected set of journalistically remote and nonremote municipalities. To complement this analysis, we estimate spatial probit models based on the three datasets. These models assume Gaussian Markov random field error processes; they are constructed using a stochastic partial differential equation and estimated with the integrated nested Laplace approximation. The estimated models show whether the three datasets produce comparable predictions, underreport events in relation to the same covariates, and have similar patterns of prediction error. Together, the two analyses show that, for this subnational conflict, the machine- and human-geocoded datasets are comparable in terms of external validity but, according to the geostatistical models, produce prediction errors that differ in important respects.
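The agreement analysis described above can be illustrated with a simple cross-tabulation. This is a minimal sketch under stated assumptions, not the authors' procedure: it assumes each source's events have already been aggregated to hypothetical municipality-period cells coded 1 if at least one event was recorded and 0 otherwise. The function names `agreement_table` and `agreement_rate` and all of the toy data are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical cell key: (municipality, period).
Cell = Tuple[str, str]


def agreement_table(cells: Iterable[Cell], coded: Dict[Cell, int],
                    truth: Dict[Cell, int]) -> Counter:
    """Cross-tabulate a coded dataset against the ground-truth source.

    Each cell is 1 if the source records at least one event there, else 0;
    cells absent from a dict count as 0. Result keys are (coded, truth)
    pairs: (1, 1) both report an event, (0, 1) the coded dataset misses
    an event, (1, 0) the coded dataset reports an event the ground truth
    lacks, (0, 0) neither reports an event.
    """
    return Counter((coded.get(c, 0), truth.get(c, 0)) for c in cells)


def agreement_rate(table: Counter) -> float:
    """Share of cells on which the coded dataset matches the ground truth."""
    total = sum(table.values())
    if total == 0:
        return float("nan")
    return (table[(1, 1)] + table[(0, 0)]) / total


# Toy data (invented): four municipalities observed in one period.
cells = [(m, "P1") for m in ("A", "B", "C", "D")]
truth = {("A", "P1"): 1, ("C", "P1"): 1}
human = {("A", "P1"): 1}
machine = {("A", "P1"): 1, ("B", "P1"): 1, ("C", "P1"): 1}

for name, coded in (("human", human), ("machine", machine)):
    table = agreement_table(cells, coded, truth)
    print(name, agreement_rate(table), dict(table))
```

Restricting `cells` to a single subperiod, or to a chosen set of remote municipalities, yields the subset comparisons in the same way. In this toy example both coded datasets agree with the ground truth in 3 of 4 cells, yet one misses an event while the other adds one, which is why a single agreement rate benefits from a breakdown of error types.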
| Field | Value |
|---|---|
| Original language | English (US) |
| Pages (from-to) | 81-97 |
| Number of pages | 17 |
| Journal | Political Analysis |
| Volume | 31 |
| Issue number | 1 |
| DOIs | |
| State | Published - Jan 15 2023 |
Bibliographical note
Funding Information: This work was supported by the National Science Foundation [SBE-SMA-1539302].
Publisher Copyright:
© The Author(s) 2021. Published by Cambridge University Press on behalf of the Society for Political Methodology.
Keywords
- event data
- external validity
- geocoding
- human rights violations
- machine coding
- spatial analysis
- spatial regression