Abstract
Identifying meaningful features that drive a phenomenon (response) of interest in complex systems of interconnected factors is a challenging problem. Causal discovery methods have previously been applied to estimate bounds on the causal strengths of factors on a response, or to identify meaningful interactions between factors in complex systems, but these approaches have been used only for inferential purposes. In contrast, we posit that interactions between factors with a potential causal association with a given response could be viable candidates not only for hypothesis generation but also for predictive modeling. In this work, we propose a causality-guided feature selection methodology that identifies factors having a potential cause-effect relationship in complex systems and selects features by clustering them based on their causal strength with respect to the response. To this end, we estimate the statistically significant causal effects on the response of the factors taking part in potential causal relationships, while addressing associated technical challenges such as multicollinearity in the data. We validate the proposed methodology by predicting the response in five real-world datasets from the domains of climate science and biology. The selected features show predictive skill and consistent performance across different domains.
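The abstract does not spell out the estimation procedure, so the following is only a hypothetical illustration of the final selection step it describes: keep features whose estimated effect on the response is statistically distinguishable from zero, then cluster those features by effect magnitude and retain the high-strength cluster. It swaps in simple stand-ins that are assumptions, not the authors' method: ridge regression as a guard against multicollinearity, bootstrap percentile intervals for significance, and 1-D 2-means for clustering. It performs no causal discovery, and the names `select_features`, `alpha`, `n_boot`, and `level` are hypothetical.

```python
import numpy as np


def ridge_coefficients(X, y, alpha):
    """Closed-form ridge regression (no intercept after centering)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # The ridge penalty keeps the solve stable when features are collinear.
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    return np.linalg.solve(A, Xc.T @ yc)


def two_means_1d(values, n_iter=100):
    """Split 1-D values into low/high groups; True marks the high group."""
    lo, hi = values.min(), values.max()
    if lo == hi:
        return np.ones(len(values), dtype=bool)
    labels = values > (lo + hi) / 2
    for _ in range(n_iter):
        lo_c, hi_c = values[~labels].mean(), values[labels].mean()
        new_labels = np.abs(values - hi_c) < np.abs(values - lo_c)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def select_features(X, y, alpha=1.0, n_boot=500, level=0.05, seed=0):
    """Hypothetical sketch: significance filter, then keep the strong cluster."""
    rng = np.random.default_rng(seed)
    # Standardize so effect magnitudes are comparable (assumes no constant columns).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    n = len(y)

    # Bootstrap the ridge coefficients to get percentile intervals.
    boot = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        boot[b] = ridge_coefficients(Xs[idx], y[idx], alpha)
    lower = np.quantile(boot, level / 2, axis=0)
    upper = np.quantile(boot, 1 - level / 2, axis=0)
    significant = (lower > 0) | (upper < 0)  # interval excludes zero

    strength = np.abs(ridge_coefficients(Xs, y, alpha))
    selected = significant.copy()
    if significant.sum() >= 2:
        # Among significant features, keep only the high-strength cluster.
        high = two_means_1d(strength[significant])
        selected[:] = False
        selected[np.flatnonzero(significant)[high]] = True
    return np.flatnonzero(selected), strength
```

For example, on synthetic data where two features have large effects and a third has a small but real effect, the sketch is expected to keep only the two strong features, because the weak one falls into the low-strength cluster even though it passes the significance filter.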
Original language | English (US) |
---|---|
Title of host publication | Advanced Data Mining and Applications - 12th International Conference, ADMA 2016, Proceedings |
Editors | Jinyan Li, Xue Li, Shuliang Wang, Jianxin Li, Quan Z. Sheng |
Publisher | Springer |
Pages | 391-405 |
Number of pages | 15 |
ISBN (Print) | 9783319495859 |
DOIs | |
State | Published - 2016 |
Event | 12th International Conference on Advanced Data Mining and Applications, ADMA 2016 - Gold Coast, Australia Duration: Dec 12 2016 → Dec 15 2016 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 10086 LNAI |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Other
Other | 12th International Conference on Advanced Data Mining and Applications, ADMA 2016 |
---|---|
Country/Territory | Australia |
City | Gold Coast |
Period | 12/12/16 → 12/15/16 |
Bibliographical note
Funding Information: This material is based upon work supported in part by the Laboratory for Analytic Sciences (LAS), the Department of Energy National Nuclear Security Administration under Award Number(s) DE-NA0002576 and NSF grant 1029711. Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd. and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
PMD has received research grants and/or advisory fees from several government agencies, advocacy groups and pharmaceutical/imaging companies, and received a grant from ADNI to support data collection for this study. He also owns stock in several companies whose products are not discussed here.
Publisher Copyright:
© Springer International Publishing AG 2016.