Reverse-engineering of causal pathways that implicate diseases and vital cellular functions is a fundamental problem in biomedicine. Discovery of the local causal pathway of a target variable (that consists of its direct causes and direct effects) is essential for effective intervention and can facilitate accurate diagnosis and prognosis. Recent research has provided several active learning methods that can leverage passively observed high-throughput data to draft causal pathways and then refine the inferred relations with a limited number of experiments. The current study provides a comprehensive evaluation of the performance of active learning methods for local causal pathway discovery in real biological data. Specifically, 54 active learning methods/variants from 3 families of algorithms were applied for local causal pathways reconstruction of gene regulation for 5 transcription factors in S. cerevisiae. Four aspects of the methodsâ €™ performance were assessed, including adjacency discovery quality, edge orientation accuracy, complete pathway discovery quality, and experimental cost. The results of this study show that some methods provide significant performance benefits over others and therefore should be routinely used for local causal pathway discovery tasks. This study also demonstrates the feasibility of local causal pathway reconstruction in real biological systems with significant quality and low experimental cost.
Bibliographical noteFunding Information:
This research was supported in part by the NIH grant R01 LM011179-01A1 from the National Library of Medicine (A.S., C.F.A. and S.M.). The authors acknowledge Frank C.P. Holstege for providing targeted perturbation data that enables construction of the gold-standard networks. The authors are also grateful to Efstratios Efstathiadis and Eric Peskin for the help with providing access and running experiments on the high performance computing facility at New York University Langone Medical Center.