Abstract
Four regression methods are used to fit QSAR models for a large set of juvenile hormone mimetic compounds, using a diverse set of descriptors. The proper application of cross-validation is summarized and applied to both the model selection and model verification steps, and compared with the use of a holdout sample. Particular emphasis is placed on evaluating a model's predictive ability at the correct point in the procedure. A recent regression methodology, the elastic net, is shown to produce a reduced set of predictors while retaining predictive ability.
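The abstract's two methodological points, keeping model selection separate from verification and the elastic net's ability to drop predictors, can be illustrated with a small pure-Python sketch. It is not the authors' procedure: the abstract does not state the elastic net parametrization, the fold counts, or how selection and verification were nested. The sketch assumes the glmnet-style objective (1/2n)||y - b0 - Xb||^2 + lam(alpha||b||_1 + (1 - alpha)/2 ||b||_2^2), fitted by cyclic coordinate descent with a soft-threshold operator, and a generic nested 5-fold cross-validation in which the penalty `lam` is chosen only from each outer training set. All function names and the synthetic data are hypothetical.

```python
import random


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0); this is what sets small coefficients exactly to 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net(X, y, lam, alpha=0.5, max_iter=1000, tol=1e-8):
    """Cyclic coordinate descent for the ASSUMED glmnet-style objective
    (1/2n)||y - b0 - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centering removes the unpenalized intercept b0 from the coordinate updates.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual y - Xb while b = 0
    z = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(max_iter):
        max_delta = 0.0
        for j in range(p):
            if z[j] == 0.0:  # constant column: leave its coefficient at 0
                continue
            # rho = (1/n) x_j' (partial residual that excludes predictor j)
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam * alpha) / (z[j] + lam * (1 - alpha))
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                b[j] = new_bj
                max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    b0 = y_mean - sum(m * bj for m, bj in zip(x_mean, b))
    return b0, b


def predict(model, X):
    b0, b = model
    return [b0 + sum(bj * xj for bj, xj in zip(b, row)) for row in X]


def sse(y_true, y_pred):
    return sum((a - f) ** 2 for a, f in zip(y_true, y_pred))


def kfold(n, k, seed):
    """Shuffle 0..n-1 into k folds; return a (train, test) index pair per fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(t)), t) for t in folds]


def take(X, y, idx):
    return [X[i] for i in idx], [y[i] for i in idx]


def cv_mse(X, y, lam, alpha, k, seed):
    """Inner k-fold CV error for one lam; used for model SELECTION only."""
    total = 0.0
    for train, test in kfold(len(y), k, seed):
        Xtr, ytr = take(X, y, train)
        Xte, yte = take(X, y, test)
        total += sse(yte, predict(fit_elastic_net(Xtr, ytr, lam, alpha), Xte))
    return total / len(y)


def nested_cv(X, y, lambdas, alpha=0.5, k_outer=5, k_inner=5, seed=0):
    """Outer loop VERIFIES: lam is chosen inside each outer training set only,
    so an outer test fold never influences the model it is used to assess."""
    total, chosen = 0.0, []
    for f, (train, test) in enumerate(kfold(len(y), k_outer, seed)):
        Xtr, ytr = take(X, y, train)
        Xte, yte = take(X, y, test)
        best = min(lambdas, key=lambda lam: cv_mse(Xtr, ytr, lam, alpha, k_inner, seed + f + 1))
        chosen.append(best)
        total += sse(yte, predict(fit_elastic_net(Xtr, ytr, best, alpha), Xte))
    return total / len(y), chosen


if __name__ == "__main__":
    # Synthetic data (not from the paper): 6 descriptors, only the first 2 matter.
    rng = random.Random(1)
    X = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(40)]
    y = [3 * r[0] - 2 * r[1] + rng.gauss(0, 0.1) for r in X]
    err, lams = nested_cv(X, y, [0.001, 0.01, 0.1, 1.0])
    print("outer-loop MSE:", err, "lam chosen per outer fold:", lams)
```

The design choice that matters is where the error estimate is taken: selecting `lam` by cross-validation on all 40 rows and then quoting that same cross-validated error would reuse the data that drove the selection. In the nested layout the reported figure comes only from outer folds that played no part in choosing `lam`, which plays the same role a holdout sample would.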
Field | Value |
---|---|
Original language | English (US) |
Pages (from-to) | 33-42 |
Number of pages | 10 |
Journal | Chemometrics and Intelligent Laboratory Systems |
Volume | 87 |
Issue number | 1 |
DOIs | |
State | Published - May 15, 2007 |
Bibliographical note
Funding information: Research reported in this paper was supported, in part, by Grant F49620-02-1-0138 from the United States Air Force and Cooperative Agreement Number 572112 from the Agency for Toxic Substances and Disease Registry. This paper represents contribution number 403 from the Center for Water and the Environment of the Natural Resources Research Institute. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Office of Scientific Research or the U.S. Government.
Keywords
- Elastic net
- Marginal soft threshold
- Model validation
- Modified Gram-Schmidt orthogonalization
- Ridge regression