Abstract
In this paper, we extracted content-based and structure-based text features to predict human annotations of claims and non-claims in argumentative essays. We trained and compared classification models using Logistic Regression, Bernoulli Naive Bayes, Gaussian Naive Bayes, Linear Support Vector Classification, Random Forest, and Neural Networks. Based on accuracy, precision, and recall, the Random Forest and Neural Network classifiers yielded the most balanced identification of claims and non-claims. The Random Forest model was then used to calculate the number, percentage, and positionality of claims and non-claims in a validation corpus that included human ratings of writing quality. Correlational and regression analyses indicated that the number of claims and the average position of non-claims in the text were significant indicators of essay quality, in the expected direction.
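The essay-level indicators named in the abstract (number, percentage, and positionality of claims and non-claims) can be illustrated with a minimal sketch. It assumes that a classifier has already labeled each sentence as claim (1) or non-claim (0), and that positionality means the sentence index normalized to the range 0 to 1; the abstract does not give the exact definition, so that choice, along with the names `ClaimStats` and `claim_stats`, is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ClaimStats:
    n_claims: int
    n_non_claims: int
    pct_claims: float                     # percentage of sentences labeled as claims
    mean_pos_claims: Optional[float]      # None if the essay has no claims
    mean_pos_non_claims: Optional[float]  # None if the essay has no non-claims


def _mean(values: List[float]) -> Optional[float]:
    """Arithmetic mean, or None for an empty list."""
    return sum(values) / len(values) if values else None


def claim_stats(labels: List[int]) -> ClaimStats:
    """Summarize per-sentence predictions for one essay.

    labels[i] is 1 if sentence i was predicted to be a claim, else 0.
    Assumption (not from the paper): position is the sentence index
    scaled so the first sentence is 0.0 and the last is 1.0.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("essay has no sentences")

    def position(i: int) -> float:
        return i / (n - 1) if n > 1 else 0.0

    claim_pos = [position(i) for i, y in enumerate(labels) if y == 1]
    non_claim_pos = [position(i) for i, y in enumerate(labels) if y == 0]

    return ClaimStats(
        n_claims=len(claim_pos),
        n_non_claims=len(non_claim_pos),
        pct_claims=100.0 * len(claim_pos) / n,
        mean_pos_claims=_mean(claim_pos),
        mean_pos_non_claims=_mean(non_claim_pos),
    )


# Toy essay of five sentences: claims at the first and fourth sentence.
stats = claim_stats([1, 0, 0, 1, 0])
print(stats)
```

Under these assumptions, the resulting per-essay values are the kind of features that could be entered into correlational or regression analyses against human quality ratings.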
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 13th International Conference on Educational Data Mining, EDM 2020 |
Editors | Anna N. Rafferty, Jacob Whitehill, Cristobal Romero, Violetta Cavalli-Sforza |
Publisher | International Educational Data Mining Society |
Pages | 691-695 |
Number of pages | 5 |
ISBN (Electronic) | 9781733673617 |
State | Published - 2020 |
Externally published | Yes |
Event | 13th International Conference on Educational Data Mining, EDM 2020 - Virtual, Online. Duration: Jul 10 2020 → Jul 13 2020 |
Publication series
Name | Proceedings of the 13th International Conference on Educational Data Mining, EDM 2020 |
---|---|
Conference
Conference | 13th International Conference on Educational Data Mining, EDM 2020 |
---|---|
City | Virtual, Online |
Period | 7/10/20 → 7/13/20 |
Bibliographical note
Publisher Copyright: © 2020 Proceedings of the 13th International Conference on Educational Data Mining, EDM 2020. All rights reserved.
Keywords
- argument mining
- automated essay evaluation
- claim detection
- essay quality
- natural language processing