TY - GEN
T1 - Frequent sub-structure-based approaches for classifying chemical compounds
AU - Deshpande, Mukund
AU - Kuramochi, Michihiro
AU - Karypis, George
PY - 2003
Y1 - 2003
AB - In this paper we study the problem of classifying chemical compound datasets. We present a sub-structure-based classification algorithm that decouples the sub-structure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric sub-structures present in the dataset. The advantage of our approach is that all relevant sub-structures are available during classification model construction, allowing the classifier to intelligently select the most discriminating ones. Computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Our experimental evaluation on eight different classification problems shows that our approach is computationally scalable and, on average, outperforms existing schemes by 10% to 35%.
UR - http://www.scopus.com/inward/record.url?scp=34547984408&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34547984408&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:34547984408
SN - 0769519784
SN - 9780769519784
T3 - Proceedings - IEEE International Conference on Data Mining, ICDM
SP - 35
EP - 42
BT - Proceedings - 3rd IEEE International Conference on Data Mining, ICDM 2003
T2 - 3rd IEEE International Conference on Data Mining, ICDM '03
Y2 - 19 November 2003 through 22 November 2003
ER -