Much prior research has found gender bias in peer production systems like Wikipedia and OpenStreetMap. This bias affects both women's participation in these platforms and content about women on these platforms. We investigated the gender content gap in Wikidata, where less than 22% of items that represent people are about women. We asked: what is the source of this bias? Specifically, does it originate from the actions of Wikidata editors or from external factors; that is, does it simply reflect existing real-world gender bias? We conducted a quantitative case study that found: (i) the most popular categories of people included in Wikidata represent male-dominated professions, such as American football; and (ii) within a selected set of professions for which we could obtain gender distribution data, Wikidata is no more biased than the real world: men and women are included at similar percentages, and the quality of items representing men and women is also similar. We provide possible explanations for our findings and discuss implications for addressing the Wikidata content gap.
|Original language||English (US)|
|Title of host publication||Proceedings of the 17th International Symposium on Open Collaboration, OpenSym 2021|
|Publisher||Association for Computing Machinery|
|State||Published - Sep 15 2021|
|Event||17th International Symposium on Open Collaboration, OpenSym 2021 - Virtual, Online, Spain (Duration: Sep 15 2021 → Sep 17 2021)|
|Name||ACM International Conference Proceeding Series|
|Conference||17th International Symposium on Open Collaboration, OpenSym 2021|
|Period||9/15/21 → 9/17/21|
Bibliographical note
Funding Information:
We thank the anonymous reviewers for their comments and suggestions that helped us strengthen our paper. This work was supported by the National Science Foundation (NSF) under Award No. IIS-1816348.
© 2021 ACM.
- Structured data