Discriminative pattern mining looks for association patterns that occur more frequently in one class than another and has important applications in many areas including finding biomarkers in biomedical data. However, finding such patterns is challenging because higher order combinations of variables may show high discrimination even when single variables or lower-order combinations show little or no discrimination. Thus, generating such patterns is important for evaluating discriminative pattern mining algorithms and better understanding the nature of discriminative patterns. To that end, we describe how such patterns can be defined using mathematical constraints which are then solved with widely available software that generates solutions for the resulting optimization problem. We present a basic formulation of the problem obtained from a straightforward translation of the desired pattern characteristics into mathematical constraints, and then show how the pattern generation problem can be reformulated in terms of the selection of rows from a truth table. This formulation is more efficient and provides deeper insight into the process of creating higher order patterns. It also makes it easy to define patterns other than just those based on the conjunctive logic used by traditional association and discriminant pattern analysis.
|Title of host publication
|Advances in Knowledge Discovery and Data Mining - 15th Pacific-Asia Conference, PAKDD 2011, Proceedings
|Number of pages
|Published - 2011
|Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Bibliographical noteFunding Information:
This work was supported by NSF grant IIS-0916439. Computing resources were provided by the Minnesota Supercomputing Institute.
- association analysis
- discriminative pattern mining
- frequent patterns
- linear programming
- synthetic data generation