Abstract
We consider the linearly transformed spiked model, where the observations Y_i are noisy linear transforms of unobserved signals of interest X_i: Y_i = A_i X_i + ε_i, for i = 1, …, n. The transform matrices A_i are also observed. We model the unobserved signals (or regression coefficients) X_i as vectors lying in an unknown low-dimensional space. Given only Y_i and A_i, how should we predict or recover the signals X_i? The naive approach of performing regression for each observation separately is inaccurate due to the large noise level. Instead, we develop optimal methods for predicting X_i by “borrowing strength” across the different samples. Our linear empirical Bayes methods scale to large datasets and rely on weak moment assumptions. We show that this model has wide-ranging applications in signal processing, deconvolution, cryo-electron microscopy, and missing data with noise. For missing data, we show in simulations that our methods are more robust to noise and to unequal sampling than well-known matrix completion methods.
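To make the observation model concrete, the following is a minimal simulation sketch of Y_i = A_i X_i + ε_i with signals X_i lying in a low-dimensional subspace, together with the naive per-observation (ridge-regularized) regression baseline mentioned above. It is illustrative only: the dimensions, noise level, and regularization are hypothetical choices, and the baseline is not the paper's empirical Bayes predictor.

```python
# Sketch of the linearly transformed spiked model Y_i = A_i X_i + eps_i.
# All dimensions and the noise level are hypothetical; the per-observation
# ridge regression below is a naive baseline, not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
n, p, q, r = 500, 50, 30, 1       # samples, signal dim, observation dim, rank (assumed)
sigma = 1.0                        # noise level (assumed)

# Rank-r "spiked" structure: each X_i lies in an unknown low-dimensional subspace.
U = np.linalg.qr(rng.standard_normal((p, r)))[0]   # unknown subspace basis
Z = 3.0 * rng.standard_normal((n, r))              # latent coordinates
X = Z @ U.T                                        # signals, shape (n, p)

# Observed transform matrices A_i (here random projections) and noisy observations Y_i.
A = rng.standard_normal((n, q, p)) / np.sqrt(p)
Y = np.einsum('iqp,ip->iq', A, X) + sigma * rng.standard_normal((n, q))

# Naive approach: a separate ridge-regularized regression for each observation.
# With q < p this is ill-posed without regularization and is dominated by noise.
lam = 1e-2
X_naive = np.stack([
    np.linalg.solve(A[i].T @ A[i] + lam * np.eye(p), A[i].T @ Y[i])
    for i in range(n)
])

mse_naive = np.mean((X_naive - X) ** 2)
print(f"per-observation regression MSE: {mse_naive:.3f} (signal variance: {X.var():.3f})")
```

In this setup, the per-observation fit ignores the shared low-rank structure across the X_i, which is exactly the information the paper's "borrowing strength" estimators exploit.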
Original language | English (US) |
---|---|
Pages (from-to) | 491-513 |
Number of pages | 23 |
Journal | Annals of Statistics |
Volume | 48 |
Issue number | 1 |
DOIs | |
State | Published - 2020 |
Bibliographical note
Funding Information:
The authors are grateful to the Associate Editor and the referees for detailed comments that have led to improvements of this work. The authors thank Joakim Andén, Tejal Bhamre, Xiuyuan Cheng, David Donoho and Iain Johnstone for helpful discussions on this work. The authors are grateful to Matan Gavish for valuable suggestions on an earlier version of the manuscript. This work was supported in part by award NSF BIGDATA IIS-1837992. The first author was supported in part by NSF Grant DMS-1407813 and by an HHMI International Student Research Fellowship. The second author was supported by the Simons Collaboration on Algorithms and Geometry. The third author was supported in part by Award Number R01GM090200 from the NIGMS, FA9550-17-1-0291 from AFOSR, a Simons Foundation Investigator Award and the Simons Collaboration on Algorithms and Geometry, and the Moore Foundation Data-Driven Discovery Investigator Award.
Publisher Copyright:
© Institute of Mathematical Statistics, 2020
Keywords
- High dimensional
- Matrix completion
- Missing data
- Principal component analysis
- Random matrix theory
- Shrinkage
- Spiked model