Abstract
We assume i.i.d. data sampled from a mixture distribution with K components along fixed d-dimensional linear subspaces and an additional outlier component. For p > 0, we study the simultaneous recovery of the K fixed subspaces by minimizing, over all choices of K subspaces, the ℓp-averaged distances of the sampled data points from those subspaces. Under some conditions, we show that if 0 < p ≤ 1, then all underlying subspaces can be precisely recovered by ℓp minimization with overwhelming probability. On the other hand, if K > 1 and p > 1, then the underlying subspaces cannot be recovered, or even nearly recovered, by ℓp minimization. These results partially explain the successes and failures of the basic approach of ℓp energy minimization for modeling data by multiple subspaces.
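To make the objective concrete, here is a minimal sketch that only evaluates the ℓp energy for a given set of K candidate subspaces; it does not perform the minimization over subspaces studied in the paper. It assumes each subspace is represented by a D x d matrix with orthonormal columns and that "ℓp-averaged distance" means the mean, over data points, of the p-th power of the distance to the nearest subspace. The function names `dist_to_subspace` and `lp_energy` are hypothetical.

```python
import numpy as np

def dist_to_subspace(X, B):
    """Euclidean distances from the rows of X (N x D) to the subspace
    spanned by the orthonormal columns of B (D x d). Assumes B^T B = I."""
    # Subtract the orthogonal projection onto span(B); what remains is orthogonal to it.
    residual = X - (X @ B) @ B.T
    return np.linalg.norm(residual, axis=1)

def lp_energy(X, bases, p):
    """Hypothetical helper: mean over points of dist(x, nearest subspace) ** p.
    `bases` is a list of K orthonormal basis matrices."""
    dists = np.column_stack([dist_to_subspace(X, B) for B in bases])  # N x K
    return np.mean(dists.min(axis=1) ** p)
```

For example, with two lines in R^3 (the x- and y-axes), one point on each line and one outlier at distance 3 from both, the outlier alone contributes 3^p to the sum. The energy is therefore 1.0 for p = 1 and 3.0 for p = 2, which illustrates how larger p gives outliers more weight.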
Original language | English (US) |
---|---|
Pages (from-to) | 2686-2715 |
Number of pages | 30 |
Journal | Annals of Statistics |
Volume | 39 |
Issue number | 5 |
State | Published - Oct 2011 |
Keywords
- Clustering
- Detection
- Geometric probability
- High-dimensional data
- Hybrid linear modeling
- Multiple subspaces
- Optimization on the Grassmannian
- Robustness