Time and sample efficient discovery of Markov blankets and direct causal relations

Ioannis Tsamardinos, Constantin F. Aliferis, Alexander Statnikov

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

166 Scopus citations

Abstract

Data Mining with Bayesian Network (BN) learning has two important characteristics: first, under certain conditions, learned edges between variables correspond to causal influences; second, for every variable T in the network, a special subset (the Markov Blanket), identifiable from the network, is the minimal variable set required to predict T. However, all known algorithms that learn a complete BN do not scale beyond a few hundred variables. On the other hand, all known sound algorithms that learn a local region of the network require a number of training instances that is exponential in the size of the learned region. The contribution of this paper is two-fold. We introduce a novel local algorithm that returns all variables with direct edges to and from a target variable T, as well as a local algorithm that returns the Markov Blanket of T. Both algorithms (i) are sound, (ii) can be run efficiently on datasets with thousands of variables, and (iii) significantly outperform previous state-of-the-art algorithms at approximating the true neighborhood, while using only a fraction of the training sample required by the existing methods. A fundamental difference between our approach and existing ones is that the required sample size depends on the connectivity of the generating graph and not on the size of the local region; this yields up to exponential savings in sample size relative to previously known algorithms. The results presented here are promising not only for the discovery of local causal structure and for variable selection for classification, but also for the induction of complete BNs.
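The two sets named in the abstract can be illustrated on a graph whose structure is already known. The following is a minimal sketch, not the paper's algorithms, which learn these sets from data without access to the graph. It assumes the standard definition of the Markov Blanket in a Bayesian Network (parents, children, and the other parents of the children); the function names and the toy graph are hypothetical.

```python
from typing import Dict, Set

# A DAG is represented as a mapping from each node to the set of its children.
Dag = Dict[str, Set[str]]


def parents_children(dag: Dag, target: str) -> Set[str]:
    """Variables with a direct edge to or from the target."""
    children = set(dag.get(target, set()))
    parents = {node for node, kids in dag.items() if target in kids}
    return parents | children


def markov_blanket(dag: Dag, target: str) -> Set[str]:
    """Parents, children, and other parents of the target's children."""
    children = set(dag.get(target, set()))
    parents = {node for node, kids in dag.items() if target in kids}
    # A spouse shares at least one child with the target.
    spouses = {node for node, kids in dag.items() if kids & children} - {target}
    return parents | children | spouses


# Hypothetical toy graph: A -> T -> C <- S, and C -> D.
toy_dag: Dag = {"A": {"T"}, "T": {"C"}, "S": {"C"}, "C": {"D"}, "D": set()}

pc = parents_children(toy_dag, "T")  # direct neighbors of T
mb = markov_blanket(toy_dag, "T")    # adds the spouse S, excludes D
```

In this toy graph the direct neighbors of T are A and C, while the Markov Blanket also contains S, the other parent of C. D is a descendant of T but belongs to neither set.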

Original language: English (US)
Title of host publication: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03
Pages: 673-678
Number of pages: 6
State: Published - Dec 1 2003
Event: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03 - Washington, DC, United States
Duration: Aug 24 2003 → Aug 27 2003

Other

Other: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03
Country: United States
City: Washington, DC
Period: 8/24/03 → 8/27/03

Keywords

  • Bayesian Networks
  • Novel data mining algorithms
  • Robust and scalable statistical methods
