Beyond linear regression: A reference for analyzing common data types in discipline based education research

Elli J. Theobald, Melissa Aikens, Sarah Eddy, Hannah Jordt

Research output: Contribution to journal › Article › peer-review

28 Scopus citations

Abstract

[This paper is part of the Focused Collection on Quantitative Methods in PER: A Critical Examination.] A common goal in discipline-based education research (DBER) is to determine how to improve student outcomes. Linear regression is a common technique used to test hypotheses about the effects of interventions on continuous outcomes (such as exam score) as well as to control for student nonequivalence in quasirandom experimental designs. (In quasirandom designs, subjects are not randomly assigned to treatments. For example, when treatment is assigned by classroom and observations are made on students, the design is quasirandom because treatment is assigned to classrooms, not to individual subjects (students).) However, many types of outcome data cannot be appropriately analyzed with linear regression. In these instances, researchers must move beyond linear regression and implement alternative regression techniques. For example, student outcomes can be measured on binary scales (e.g., pass or fail), tightly bound scales (e.g., strongly agree to strongly disagree), or nominal scales (i.e., different discrete choices, for example, multiple tracks within a physics major), each necessitating alternative regression techniques. Here, we review extensions of linear modeling, namely generalized linear models (glms), and specifically compare five glms that are useful for analyzing DBER data: logistic, binomial, proportional odds (also called ordinal; including censored regression), multinomial, and Poisson (including negative binomial, hurdle, and zero-inflated) regression. We introduce a diagnostic tool to facilitate a researcher's identification of the most appropriate glm for their own data. For each model type, we explain when, why, and how to implement the regression approach. When: we provide examples of the types of research questions and outcome data that would motivate this regression approach, including citations to articles in the DBER literature. Why: we name which linear regression assumption is violated by the data type. How: we detail implementation and interpretation of this modeling approach in R, including R syntax and code, and how to discuss the regression output in research papers. Code accompanying each analysis can be found in the online GitHub repository that is associated with this paper (https://github.com/ejtheobald/BeyondLinearRegression). This paper is not an exhaustive review of regression techniques, nor does it review nonregression-based analyses. Rather, it aims to compile and summarize regression techniques useful for the most common types of DBER data and provide examples, citations, and heavily annotated R code so that researchers can easily implement the technique in their work.
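As a rough illustration of the matching between outcome data type and model family that the abstract describes, the sketch below encodes it as a simple lookup. This is not the paper's diagnostic tool and it is not the authors' R code: the dictionary keys, the name `suggest_model`, and the choice of Python are all assumptions made for this example.

```python
# Hypothetical lookup from outcome data type to the regression family
# named in the abstract. Keys and function name are illustrative only.
GLM_FOR_OUTCOME = {
    "continuous": "linear regression",
    "binary": "logistic regression",
    "successes_out_of_trials": "binomial regression",
    "ordered_categories": "proportional odds (ordinal) regression",
    "unordered_categories": "multinomial regression",
    "count": "Poisson regression",
}


def suggest_model(outcome_type: str) -> str:
    """Return the model family usually paired with the given outcome type."""
    try:
        return GLM_FOR_OUTCOME[outcome_type]
    except KeyError:
        known = ", ".join(sorted(GLM_FOR_OUTCOME))
        raise ValueError(
            f"unknown outcome type {outcome_type!r}; expected one of: {known}"
        ) from None


if __name__ == "__main__":
    # A pass/fail outcome is binary; a Likert item is an ordered category.
    print(suggest_model("binary"))
    print(suggest_model("ordered_categories"))
```

A lookup like this only captures the first step of model choice. Count data, for instance, may instead call for the negative binomial, hurdle, or zero-inflated variants that the abstract lists under Poisson regression.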

Original language: English (US)
Article number: 020110
Journal: Physical Review Physics Education Research
Volume: 15
Issue number: 2
State: Published - Jul 3 2019
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2019 authors. Published by the American Physical Society.
