Abstract
The aqueous solubility (log S) of xenobiotic chemicals has been identified as a key characteristic in determining their bioaccessibility/bioavailability and their fate and transport in aquatic environments. Here we explore and evaluate the use of a state-of-the-art data analysis technique, projection to latent structures (PLS), to estimate log S of environmentally relevant chemicals. A large number (n = 624) of molecular descriptors were computed for over 1400 organic chemicals and then refined by a feature selection technique. Candidate predictor descriptors were fitted to the data by means of PLS, which was optimized by internal leave-one-out cross-validation and validated against an external data set. The final (best) PLS model, with only four variables (AlogP, X1sol, Mv, and E), exhibited noteworthy stability and good predictive power. It explained 91% of the variance in the data (n = 1400) with an average absolute error of 0.5 log units, even though the solubilities span more than 12 orders of magnitude. The newly proposed model is transparent, easily portable from one user to another, and robust enough to accurately estimate log S of a wide range of emerging contaminants.
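The workflow summarized in the abstract (fit a PLS model, choose its complexity by leave-one-out cross-validation, then check it on an external set) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic, the four columns are hypothetical stand-ins for the selected descriptors, and the helper names `pls1_fit` and `loo_q2` are assumptions introduced here. It uses a standard single-response PLS (NIPALS-style) algorithm on autoscaled descriptors.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS; return (coef, intercept) in original units."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean = y.mean()
    Xk = (X - x_mean) / x_std          # autoscale descriptors
    yk = y - y_mean                    # center the response
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))   # weight vectors
    P = np.zeros((n_features, n_components))   # X loadings
    q = np.zeros(n_components)                 # y loadings
    for k in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w                     # latent score vector
        tt = t @ t
        p = Xk.T @ t / tt
        q[k] = yk @ t / tt
        Xk = Xk - np.outer(t, p)       # deflate X
        yk = yk - q[k] * t             # deflate y
        W[:, k], P[:, k] = w, p
    coef_scaled = W @ np.linalg.solve(P.T @ W, q)
    coef = coef_scaled / x_std         # back to original descriptor units
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def loo_q2(X, y, n_components):
    """Leave-one-out cross-validated Q^2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef, intercept = pls1_fit(X[keep], y[keep], n_components)
        press += (y[i] - (X[i] @ coef + intercept)) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic stand-in data: 60 "chemicals", 4 hypothetical descriptors.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X @ np.array([-1.0, 0.5, -0.3, 0.2]) + 0.1 * rng.normal(size=60)

# Internal training set and external validation set.
X_train, y_train = X[:45], y[:45]
X_ext, y_ext = X[45:], y[45:]

# Pick the number of latent variables by leave-one-out Q^2.
q2_by_k = {k: loo_q2(X_train, y_train, k) for k in range(1, 5)}
best_k = max(q2_by_k, key=q2_by_k.get)

# Refit on the full training set and score the external set.
coef, intercept = pls1_fit(X_train, y_train, best_k)
mae_ext = np.mean(np.abs(y_ext - (X_ext @ coef + intercept)))
print("latent variables:", best_k)
print("LOO Q^2:", q2_by_k[best_k])
print("external mean absolute error:", mae_ext)
```

Selecting the number of latent variables on the internal set only, and reserving the external set for a single final check, keeps the external error an honest estimate of predictive power.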
Original language | English (US) |
---|---|
Pages (from-to) | 5362-5370 |
Number of pages | 9 |
Journal | Water Research |
Volume | 47 |
Issue number | 14 |
DOIs | |
State | Published - Sep 5 2013 |
Bibliographical note
Funding Information: This work was supported by the Postdoctoral Fellowship Program of the St. Anthony Falls Laboratory. The authors thank the anonymous reviewers for their constructive comments.
Keywords
- Aqueous solubility
- Environmental contaminants
- Environmental mobility
- Partial least-squares regression
- QSPRs
- Water quality