Abstract
In the current era of big data, researchers routinely collect and analyze data with super-large sample sizes. Data-oriented statistical methods have been developed to extract information from super-large data. Smoothing spline ANOVA (SSANOVA) is a promising approach for extracting information from noisy data; however, the heavy computational cost of SSANOVA hinders its wide application. In this paper, we propose a new algorithm for fitting SSANOVA models to super-large sample data. In this algorithm, we introduce rounding parameters to make the computation scalable. To demonstrate the benefits of the rounding parameters, we present a simulation study and a real data example using electroencephalography data. Our results reveal that, by using the rounding parameters, a researcher can fit nonparametric regression models to very large samples within a few seconds on a standard laptop or tablet computer.
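The abstract does not spell out how the rounding parameters work, so the following is only a minimal sketch of one plausible reading: each predictor value is rounded to the nearest multiple of a rounding parameter, and observations that share a rounded value are collapsed into a count and a mean response. The function name `round_and_aggregate`, the value `r = 0.01`, the simulated data, and the cubic polynomial (a stand-in for a smoothing spline) are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def round_and_aggregate(x, y, r):
    """Round x to the nearest multiple of r and collapse duplicates.

    Hypothetical helper: returns the unique rounded values, the mean
    response at each value, and the number of observations at each value.
    """
    # Integer grid index of each observation (avoids float comparisons).
    k = np.rint(x / r).astype(np.int64)
    keys, inverse, counts = np.unique(k, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    # Sum of responses per unique rounded value, then convert to means.
    y_sum = np.bincount(inverse, weights=y, minlength=keys.size)
    return keys * r, y_sum / counts, counts, inverse

# Simulated data (assumed for illustration only).
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(0.0, 1.0, size=n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.5, size=n)

r = 0.01  # assumed rounding parameter
u, ybar, w, inverse = round_and_aggregate(x, y, r)

# Stand-in model: a cubic polynomial fitted by least squares.
# Fit on all n rounded observations ...
coef_full = np.polyfit(u[inverse], y, deg=3)
# ... and on the aggregated data, weighting each mean by its count.
# np.polyfit applies weights to unsquared residuals, hence the square root.
coef_agg = np.polyfit(u, ybar, deg=3, w=np.sqrt(w))

print(f"{n} observations reduced to {u.size} unique rounded values")
print("coefficients agree:", np.allclose(coef_full, coef_agg, atol=1e-6))
```

Under these assumptions, the cost of the fit depends on the number of unique rounded values rather than on the sample size, because the weighted least-squares fit on the aggregated data has the same minimizer as the fit on the full rounded data.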
Original language | English (US) |
---|---|
Pages (from-to) | 433-444 |
Number of pages | 12 |
Journal | Statistics and its Interface |
Volume | 9 |
Issue number | 4 |
State | Published - 2016 |
Bibliographical note
Funding Information: This research was partially supported by NSF grants DMS 1440037 and DMS 1438957, and start-up funds from the University of Minnesota.
Keywords
- Rounding parameter
- Scalable algorithm
- Smoothing spline ANOVA