TY - JOUR
T1 - Human rights texts: Converting human rights primary source documents into data
AU - Fariss, Christopher J.
AU - Linder, Fridolin J.
AU - Jones, Zachary M.
AU - Crabtree, Charles D.
AU - Biek, Megan A.
AU - Ross, Ana Sophia M.
AU - Kaur, Taranamol
AU - Tsai, Michael
N1 - Publisher Copyright:
© 2015 Fariss et al.
PY - 2015/09/29
Y1 - 2015/09/29
AB - We introduce and make publicly available a large corpus of digitized primary source human rights documents which are published annually by monitoring agencies that include Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the United States Department of State. In addition to the digitized text, we also make available and describe document-term matrices, which are datasets that systematically organize the word counts from each unique document by each unique term within the corpus of human rights documents. To contextualize the importance of this corpus, we describe the development of coding procedures in the human rights community and several existing categorical indicators that have been created by human coding of the human rights documents contained in the corpus. We then discuss how the new human rights corpus and the existing human rights datasets can be used with a variety of statistical analyses and machine learning algorithms to help scholars understand how human rights practices and reporting have evolved over time. We close with a discussion of our plans for dataset maintenance, updating, and availability.
UR - https://www.scopus.com/pages/publications/84947968472
UR - https://www.scopus.com/pages/publications/84947968472#tab=citedBy
U2 - 10.1371/journal.pone.0138935
DO - 10.1371/journal.pone.0138935
M3 - Article
C2 - 26418817
AN - SCOPUS:84947968472
SN - 1932-6203
VL - 10
JO - PLoS One
JF - PLOS ONE
IS - 9
M1 - e0138935
ER -