Abstract
This study examines the potential to use non-expert, crowd-sourced raters to score essays by comparing expert raters’ and crowd-sourced raters’ assessments of writing quality. Expert raters and crowd-sourced raters scored 400 essays using a standardised holistic rubric and comparative judgement (pairwise ratings) scoring techniques, respectively. The findings indicated that 92% of non-expert, pairwise ratings were sufficiently reliable and raters’ alignment with overall rankings was 67.9%. Additionally, the non-expert ratings were moderately correlated (r = .397) with expert ratings. Further, the linguistic features of the essays were computed to predict expert and non-expert pairwise ratings, revealing that the predictive models of essay quality for both expert and non-expert scores accounted for around 30–35% of the variance. The two models also shared similar linguistic features. The results collectively demonstrate similarities between non-expert pairwise raters and expert raters when assessing essay quality.
| Original language | English (US) |
|---|---|
| Pages (from-to) | 33-59 |
| Number of pages | 27 |
| Journal | Assessment in Education: Principles, Policy and Practice |
| Volume | 32 |
| Issue number | 1 |
| State | Published - 2025 |
Bibliographical note
Publisher Copyright: © 2025 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
Keywords
- Crowdsourcing
- corpus linguistics
- natural language processing
- pairwise comparisons
- writing assessment
Title
Assessing writing quality using crowdsourced non-expert comparative judgement ratings