TY - JOUR
T1 - Kernelized Discriminant Analysis for Joint Modeling of Multivariate Categorical Responses
AU - Jin, Yisen
AU - Zhang, Xin
AU - Molstad, Aaron J.
N1 - Publisher Copyright:
© 2025 American Statistical Association and Institute of Mathematical Statistics.
PY - 2025
Y1 - 2025
N2 - Modeling the joint probability mass of multiple categorical variables as a function of predictors is a fundamental task in categorical data analysis. When the number of response variables, the number of categories per response, and/or the number of predictors is large, existing likelihood-based methods cannot be applied or perform poorly. In this article, we propose a novel approach that assumes a variation of the normal linear discriminant analysis model. To estimate the unknown parameters in a way that exploits dependence amongst the response variables, we propose a new penalized likelihood method based on discrete kernel regression. We propose two estimators, each of which can lead to interpretable and parsimonious fitted models. Theoretically, we establish statistical properties of our method and demonstrate a tradeoff between the statistical error and the approximation error. Through simulation studies and an application to genomic data, we demonstrate that our method yields better classification accuracy and more interpretable fitted models than existing methods. Software implementing our method, as well as code for reproducing the results in this article, is available for download at https://github.com/yjin07/kernelizedDA. Supplementary materials for this article are available online.
AB - Modeling the joint probability mass of multiple categorical variables as a function of predictors is a fundamental task in categorical data analysis. When the number of response variables, the number of categories per response, and/or the number of predictors is large, existing likelihood-based methods cannot be applied or perform poorly. In this article, we propose a novel approach that assumes a variation of the normal linear discriminant analysis model. To estimate the unknown parameters in a way that exploits dependence amongst the response variables, we propose a new penalized likelihood method based on discrete kernel regression. We propose two estimators, each of which can lead to interpretable and parsimonious fitted models. Theoretically, we establish statistical properties of our method and demonstrate a tradeoff between the statistical error and the approximation error. Through simulation studies and an application to genomic data, we demonstrate that our method yields better classification accuracy and more interpretable fitted models than existing methods. Software implementing our method, as well as code for reproducing the results in this article, is available for download at https://github.com/yjin07/kernelizedDA. Supplementary materials for this article are available online.
KW - Categorical data analysis
KW - Kernel methods
KW - Linear discriminant analysis
UR - https://www.scopus.com/pages/publications/105014156548
UR - https://www.scopus.com/pages/publications/105014156548#tab=citedBy
U2 - 10.1080/10618600.2025.2526412
DO - 10.1080/10618600.2025.2526412
M3 - Article
AN - SCOPUS:105014156548
SN - 1061-8600
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
ER -