Kernelized Discriminant Analysis for Joint Modeling of Multivariate Categorical Responses

Research output: Contribution to journal › Article › peer-review

Abstract

Modeling the joint probability mass of multiple categorical variables as a function of predictors is a fundamental task in categorical data analysis. When the number of response variables, the number of categories per response, and/or the number of predictors is large, existing likelihood-based methods cannot be applied or perform poorly. In this article, we propose a novel approach that assumes a variation of the normal linear discriminant analysis model. In order to estimate unknown parameters in a way that exploits dependence amongst the response variables, we propose a new penalized likelihood method based on discrete kernel regression. We propose two estimators, each of which can lead to interpretable and parsimonious fitted models. Theoretically, we establish statistical properties of our method and demonstrate a tradeoff between the statistical error and approximation error. Through simulation studies and an application to genomic data, we demonstrate that our method yields better classification accuracy and more interpretable fitted models than existing methods. Software implementing our method and code for reproducing the results in this article are available for download at https://github.com/yjin07/kernelizedDA. Supplementary materials for this article are available online.

Original language: English (US)
Journal: Journal of Computational and Graphical Statistics
State: Accepted/In press - 2025

Bibliographical note

Publisher Copyright:
© 2025 American Statistical Association and Institute of Mathematical Statistics.

Keywords

  • Categorical data analysis
  • Kernel methods
  • Linear discriminant analysis
