In just a few years, crowdsourcing markets like Mechanical Turk have become the dominant mechanism for building "gold standard" datasets in areas of computer science ranging from natural language processing to audio transcription. The assumption behind this sea change, an assumption central to the approaches taken in hundreds of research projects, is that crowdsourcing markets can accurately replicate the judgments of the general population for knowledge-oriented tasks. Focusing on the important domain of semantic relatedness algorithms and leveraging Clark's theory of common ground as a framework, we demonstrate that this assumption can be highly problematic. Using 7,921 semantic relatedness judgments from 72 scholars and 39 crowdworkers, we show that crowdworkers on Mechanical Turk produce significantly different gold standard semantic relatedness judgments than people from other communities. We also show that algorithms that perform well against Mechanical Turk gold standard datasets perform significantly worse when evaluated against other communities' gold standards. Our results call into question the broad use of Mechanical Turk for developing gold standard datasets and demonstrate the importance of understanding these datasets from a human-centered point of view. More generally, our findings problematize the notion that a universal gold standard dataset exists for all knowledge tasks.
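To make concrete what it means for one algorithm to score differently against two communities' gold standards, here is a minimal sketch that ranks an algorithm's relatedness scores against two sets of human ratings using Spearman's rank correlation, a common choice for semantic relatedness evaluation. The abstract does not state which metric the paper uses, so treat Spearman's rho as an assumption. The word pairs, scores, and the names `gold_crowd` and `gold_scholars` are invented for illustration and are not data from the study.

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when all values are tied")
    return cov / (sx * sy)


# Hypothetical word pairs and scores, invented for illustration only.
pairs = [("car", "automobile"), ("coffee", "cup"), ("river", "bank"),
         ("king", "cabbage"), ("doctor", "nurse")]
algorithm = [0.95, 0.60, 0.40, 0.05, 0.70]  # hypothetical algorithm output
gold_crowd = [3.9, 2.8, 2.0, 0.2, 3.1]      # hypothetical crowdworker ratings
gold_scholars = [3.8, 2.1, 3.0, 0.1, 2.5]   # hypothetical scholar ratings

if __name__ == "__main__":
    print(f"vs crowd gold standard:    {spearman_rho(algorithm, gold_crowd):.3f}")
    print(f"vs scholar gold standard:  {spearman_rho(algorithm, gold_scholars):.3f}")
```

With these made-up numbers the algorithm orders the pairs exactly as the hypothetical crowd does (rho of 1.000) but reaches only 0.700 against the hypothetical scholars, because the two groups disagree on how related "river" and "bank" are. Ranking-based metrics like this one depend only on the order of the ratings, so two gold standards on different scales can still be compared directly.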
|Original language||English (US)|
|Title of host publication||CSCW 2015 - Proceedings of the 2015 ACM International Conference on Computer-Supported Cooperative Work and Social Computing|
|Publisher||Association for Computing Machinery, Inc|
|Number of pages||13|
|State||Published - Feb 28 2015|
|Event||18th ACM International Conference on Computer-Supported Cooperative Work and Social Computing, CSCW 2015 - BC, Canada|
|Duration||Mar 14 2015 → Mar 18 2015|
Bibliographical note (funding information):
This research has been generously supported by Macalester College and the National Science Foundation (grants IIS-0964697 and IIS-0808692).
© 2015 ACM.
Keywords:
- Amazon Mechanical Turk
- cultural communities
- gold standard datasets
- natural language processing
- semantic relatedness
- user studies