Regression with set-valued categorical predictors

Ganghua Wang, Jie Ding, Yuhong Yang

Research output: Contribution to journal, Article, peer-review

1 Scopus citation

Abstract

We address the regression problem with a new form of data that arises from data privacy applications. Instead of point values, the observed explanatory variables are subsets containing each individual’s original value. In such cases, classical regression analyses such as least squares cannot be applied, because the set-valued predictors carry only partial information about the original values. We propose a computationally efficient subset least squares method for performing regression on such data. We establish upper bounds on the prediction loss and risk in terms of the subset structure, model structure, and data dimension. The error rates are shown to be optimal in some common situations. Furthermore, we develop a model-selection method to identify the most appropriate model for prediction. Experimental results on both simulated and real-world data sets demonstrate the promising performance of the proposed method.
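The abstract does not spell out the estimator, so the following is only a minimal sketch of one natural reading, not the authors' method: treat each distinct observed subset as its own level and fit ordinary least squares on those levels, which for a single categorical predictor reduces to averaging the response within each observed subset. The function names, the four-category toy data, and the coarsening rule that reports `{0, 1}` or `{2, 3}` are all assumptions made for illustration.

```python
import numpy as np


def fit_subset_means(observed_sets, y):
    """Least squares with one indicator per distinct observed subset.

    Illustrative assumption: with a single set-valued predictor, this
    least squares fit is the mean response within each observed subset.
    """
    groups = {}
    for s, yi in zip(observed_sets, y):
        groups.setdefault(frozenset(s), []).append(yi)
    return {s: float(np.mean(v)) for s, v in groups.items()}


def predict(model, observed_sets, default=0.0):
    """Predict the fitted mean for each observed subset (default if unseen)."""
    return np.array([model.get(frozenset(s), default) for s in observed_sets])


# Toy data: the true category x is hidden; only a subset containing it is seen.
rng = np.random.default_rng(0)
true_means = {0: 1.0, 1: 2.0, 2: 5.0, 3: 6.0}  # hypothetical category effects
x = rng.integers(0, 4, size=2000)

# Hypothetical privacy mechanism: report {0, 1} or {2, 3}, whichever contains x.
observed = [{0, 1} if xi < 2 else {2, 3} for xi in x]
y = np.array([true_means[xi] for xi in x]) + rng.normal(0.0, 0.1, size=x.size)

model = fit_subset_means(observed, y)
preds = predict(model, observed)
```

In this toy setup the fitted value for an observed subset is a blend of the effects of the categories it contains, which illustrates why set-valued predictors carry only partial information about the original values.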

Original language: English (US)
Pages (from-to): 2545-2560
Number of pages: 16
Journal: Statistica Sinica
Volume: 33
Issue number: 4
State: Published - Oct 2023

Bibliographical note

Publisher Copyright:
© 2023 All rights reserved.

Keywords

  • Model selection
  • Regression
  • Set-valued data
