Abstract
Crowdsourcing approaches rely on collecting input from many individuals to solve problems that require the analysis of large data sets in a timely and accurate manner. The inexperience of participants, or annotators, motivates robust techniques. Focusing on clustering setups, the data provided by all annotators is modeled here as a mixture of Gaussian components plus a uniformly distributed random variable that captures outliers. The proposed algorithm is based on the expectation-maximization (EM) algorithm and jointly provides soft assignments of data to clusters, ratings of annotators according to their performance, and an estimate of the number of Gaussian components in the non-Gaussian/Gaussian mixture model.
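The model described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, only a generic EM fit for a mixture of Gaussians plus one uniform outlier component, with the number of Gaussians chosen by the Bayesian Information Criterion. Everything below is an assumption made for illustration: the function names (`fit_mixture`, `bic`, `select_K`, `annotator_scores`), the diagonal covariances, the uniform support taken as the bounding box of the data, and the annotator score defined as one minus the annotator's mean outlier responsibility.

```python
import numpy as np


def _logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def fit_mixture(X, K, n_iter=200, seed=0):
    """EM for K diagonal-covariance Gaussians plus one uniform outlier component."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # Assumption: outliers are uniform over the bounding box of the data.
    log_u = -np.sum(np.log(X.max(axis=0) - X.min(axis=0)))
    mu = X[rng.choice(N, size=K, replace=False)]   # random data points as initial means
    var = np.tile(X.var(axis=0), (K, 1))           # broad initial variances
    w = np.full(K + 1, 1.0 / (K + 1))              # last weight belongs to the uniform
    ll_hist = []
    for _ in range(n_iter):
        # E-step: log joint of each point with each component, shape (N, K + 1).
        diff2 = (X[:, None, :] - mu[None, :, :]) ** 2
        log_gauss = -0.5 * np.sum(np.log(2 * np.pi * var)[None] + diff2 / var[None], axis=2)
        log_joint = np.concatenate([log_gauss, np.full((N, 1), log_u)], axis=1) + np.log(w)
        log_norm = _logsumexp(log_joint, axis=1)
        R = np.exp(log_joint - log_norm[:, None])  # soft assignments, rows sum to 1
        ll_hist.append(log_norm.sum())             # log-likelihood before this M-step
        # M-step: closed-form updates weighted by the responsibilities.
        Nk = R.sum(axis=0) + 1e-12
        w = Nk / Nk.sum()
        mu = (R[:, :K].T @ X) / Nk[:K, None]
        diff2 = (X[:, None, :] - mu[None, :, :]) ** 2
        var = np.maximum(np.einsum("nk,nkd->kd", R[:, :K], diff2) / Nk[:K, None], 1e-6)
    return {"weights": w, "means": mu, "vars": var, "resp": R, "loglik": ll_hist}


def bic(fit, N, d):
    """BIC from the last recorded log-likelihood; lower is better."""
    K = fit["means"].shape[0]
    n_params = 2 * K * d + K  # means, variances, and K free mixing weights
    return -2.0 * fit["loglik"][-1] + n_params * np.log(N)


def select_K(X, K_max=5):
    """Fit K = 1..K_max and return the fit with the smallest BIC plus all scores."""
    fits = [fit_mixture(X, K) for K in range(1, K_max + 1)]
    scores = [bic(f, *X.shape) for f in fits]
    return fits[int(np.argmin(scores))], scores


def annotator_scores(fit, annotator_ids):
    """Hypothetical rating: 1 minus the mean outlier responsibility per annotator."""
    outlier_resp = fit["resp"][:, -1]
    return {int(a): 1.0 - outlier_resp[annotator_ids == a].mean()
            for a in np.unique(annotator_ids)}
```

Here `X` is an array of shape `(N, d)` pooling the points from all annotators, and `annotator_ids` is an integer array of length `N`. Because EM with random initialization can settle in a local optimum, several restarts with different `seed` values would be a sensible addition before trusting the selected number of components.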
Original language | English (US) |
---|---|
Title of host publication | 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - Proceedings |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 4014-4018 |
Number of pages | 5 |
ISBN (Electronic) | 9781509041176 |
State | Published - Jun 16 2017 |
Event | 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 - New Orleans, United States Duration: Mar 5 2017 → Mar 9 2017 |
Publication series
Name | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
---|---|
ISSN (Print) | 1520-6149 |
Other
Other | 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017 |
---|---|
Country/Territory | United States |
City | New Orleans |
Period | 3/5/17 → 3/9/17 |
Bibliographical note
Funding Information: This work has been funded by the Ministerio de Economia y Competitividad of the Spanish Government, ERDF funds (TEC2013-41315-R, TEC2015-69648-REDC, TEC2016-75067-C4-2-R, TEC2013-47020-C2-1-R, TACTICA), the Catalan Government (2014 SGR 60 AGAUR), and the Galician Government (AtlantTIC, GRC2013/009, R2014/037).
Publisher Copyright:
© 2017 IEEE.
Keywords
- Bayesian Information Criterion
- Crowdsourcing
- EM algorithm
- Gaussian plus non-Gaussian Mixture
- Outlier