Abstract
This paper addresses the selection of the loss function for regression problems with finite data. It is well known that, under the standard regression formulation, an optimal loss function exists for a known noise density in the asymptotic setting (large number of samples); for example, squared loss is optimal for Gaussian noise. However, in real-life applications the noise density is unknown and the number of training samples is finite. For such practical situations, we suggest using Vapnik's ε-insensitive loss function. We use a practical method for setting the value of ε as a function of the known number of samples and the (known or estimated) noise variance [1,2]. We consider commonly used noise densities (Gaussian, Uniform and Laplacian). Empirical comparisons on several representative linear regression problems indicate that Vapnik's ε-insensitive loss yields more robust performance and better prediction accuracy than squared loss and least-modulus loss, especially for noisy high-dimensional data sets.
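As a minimal illustration of the three loss functions compared in the abstract, the sketch below writes each one as a function of the residual (observed minus predicted value). The function names and the `empirical_risk` helper are hypothetical, chosen only for this example. The value of ε is passed in as a plain parameter, because the rule from [1,2] that sets it from the sample size and noise variance is not reproduced here.

```python
from typing import Callable, Sequence


def squared_loss(residual: float) -> float:
    """Squared loss: optimal asymptotically for Gaussian noise."""
    return residual ** 2


def least_modulus_loss(residual: float) -> float:
    """Least-modulus (absolute) loss."""
    return abs(residual)


def eps_insensitive_loss(residual: float, eps: float) -> float:
    """Vapnik's epsilon-insensitive loss: zero inside the eps-tube,
    linear outside it. eps is assumed to be supplied by the caller."""
    return max(0.0, abs(residual) - eps)


def empirical_risk(residuals: Sequence[float],
                   loss: Callable[[float], float]) -> float:
    """Average loss over a finite sample of residuals (hypothetical helper)."""
    return sum(loss(r) for r in residuals) / len(residuals)


# Illustrative residuals only; these are not data from the paper.
residuals = [0.05, -0.5, 2.0, -0.02]
eps = 0.1  # assumed value, for illustration
risk_eps = empirical_risk(residuals, lambda r: eps_insensitive_loss(r, eps))
risk_abs = empirical_risk(residuals, least_modulus_loss)
risk_sq = empirical_risk(residuals, squared_loss)
```

With ε = 0 the ε-insensitive loss reduces to the least-modulus loss. Residuals smaller in magnitude than ε contribute nothing to the risk.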
Original language | English (US) |
---|---|
Pages (from-to) | 395-400 |
Number of pages | 6 |
Journal | IEEE International Conference on Neural Networks - Conference Proceedings |
Volume | 1 |
State | Published - 2004 |
Event | 2004 IEEE International Joint Conference on Neural Networks - Proceedings - Budapest, Hungary. Duration: Jul 25 2004 → Jul 29 2004 |