Abstract
Bootstrapping is a simple technique typically used to assess the accuracy of estimates of model parameters. It relies on simple plug-in principles and replaces sometimes unwieldy theory with computer simulation. Common uses include variance estimation and confidence-interval construction for model parameters. It also provides a way to estimate the prediction accuracy of regression models with continuous or class-valued outcomes. In this paper we review some of these applications of the bootstrap, focusing on bootstrap estimates of prediction error. We also explore how the bootstrap can improve the prediction accuracy of unstable models, such as tree-structured classifiers, through aggregation. The improvements can typically be attributed to variance reduction in the classical regression setting and, more generally, to a smoothing of decision boundaries in the classification setting. These advances have important implications for how atmospheric prediction models can be improved, and illustrations of this are shown. For class-valued outcomes, an interesting graphic known as the CAT scan can be constructed to help understand the aggregated decision boundary; this is illustrated using simulated data.
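The plug-in idea and the aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation. All function names (`bootstrap_replicates`, `bootstrap_se`, `percentile_ci`, `fit_stump`, `bagged_vote`) are hypothetical. The classifier is a one-dimensional decision stump, used here as a simple stand-in for a tree-structured classifier such as CART. The first three functions show bootstrap variance estimation and a percentile confidence interval. The last two show bagging: the stump is refit on bootstrap samples and the votes are averaged.

```python
import random
import statistics


def bootstrap_replicates(data, stat, n_boot=2000, seed=0):
    """Apply `stat` to n_boot resamples of `data`, each drawn with replacement.

    Plug-in principle: the empirical distribution of `data` stands in for
    the unknown population distribution.
    """
    rng = random.Random(seed)
    n = len(data)
    return [stat([data[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_boot)]


def bootstrap_se(data, stat, n_boot=2000, seed=0):
    """Bootstrap standard error: spread of the statistic across resamples."""
    return statistics.stdev(bootstrap_replicates(data, stat, n_boot, seed))


def percentile_ci(data, stat, alpha=0.05, n_boot=2000, seed=0):
    """Percentile bootstrap interval with nominal coverage 1 - alpha."""
    reps = sorted(bootstrap_replicates(data, stat, n_boot, seed))
    k = int(round(alpha / 2 * n_boot))  # replicates trimmed from each tail
    return reps[k], reps[n_boot - 1 - k]


def fit_stump(xs, ys):
    """Hypothetical stand-in for a tree: fit the 1-D rule 'predict class 1
    when x > t', choosing t among the observed x values to minimise
    training errors (ties go to the smallest such t)."""
    best_t, best_err = None, None
    for t in sorted(set(xs)):
        err = sum((x > t) != (y == 1) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def bagged_vote(xs, ys, x_new, n_bags=200, seed=0):
    """Bagging: refit the stump on bootstrap samples and return the
    fraction of fitted stumps that vote class 1 at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    votes = 0
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        t = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        votes += int(x_new > t)
    return votes / n_bags


if __name__ == "__main__":
    data = list(range(100))
    print("bootstrap SE of the mean:", bootstrap_se(data, statistics.mean))
    print("95% percentile CI:", percentile_ci(data, statistics.mean))
    xs = [float(i) for i in range(10)]
    ys = [0] * 5 + [1] * 5
    for x in (2.0, 3.0, 4.0, 5.0):
        print(x, bagged_vote(xs, ys, x))
```

A single stump switches abruptly from class 0 to class 1 at one threshold. The bagged vote fraction instead rises gradually across the region where the bootstrap thresholds disagree. This is a one-dimensional analogue of the smoothed decision boundary mentioned in the abstract. Classifying by majority vote means predicting class 1 when the fraction exceeds 0.5.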
Original language | English (US) |
---|---|
Pages (from-to) | 29-41 |
Number of pages | 13 |
Journal | Data Mining and Knowledge Discovery |
Volume | 4 |
Issue number | 1 |
State | Published - 2000 |
Externally published | Yes |
Keywords
- Bootstrap
- CART
- Classification
- Hurricanes
- Instability
- Supervised learning
- Weather data