Evaluation of techniques for classifying biological sequences

Mukund Deshpande, George Karypis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

39 Scopus citations

Abstract

In recent years we have witnessed an exponential increase in the amount of biological information, either DNA or protein sequences, that has become available in public databases. This has been followed by an increased interest in developing computational techniques to automatically classify these large volumes of sequence data into categories corresponding to their role in the chromosomes, their structure, and/or their function. In this paper we evaluate some of the widely used sequence classification algorithms and develop a framework for modeling sequences so that traditional machine learning algorithms, such as support vector machines (SVMs), can be applied easily. Our detailed experimental evaluation shows that the SVM-based approaches achieve higher classification accuracy than the more traditional sequence classification algorithms, such as Markov-model-based techniques and k-nearest-neighbor-based approaches.
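The abstract does not say how sequences are modeled for the support vector machine, so the following is only a minimal sketch of one common way to do it, not the authors' framework: each sequence is mapped to a fixed-length vector of overlapping k-mer counts, which any vector-based learner could then consume. The function names `kmer_vocabulary` and `kmer_features`, the DNA alphabet, and the choice k = 2 are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_vocabulary(alphabet, k):
    """Return every k-mer over the alphabet in a fixed lexicographic order."""
    return ["".join(p) for p in product(sorted(alphabet), repeat=k)]


def kmer_features(sequence, vocabulary, k):
    """Count overlapping k-mers and return one count per vocabulary entry."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Counter returns 0 for k-mers that never occur, so the vector
    # always has the same length regardless of the sequence.
    return [counts[kmer] for kmer in vocabulary]


# Hypothetical example: a short DNA fragment and 2-mers (assumed values).
vocab = kmer_vocabulary("ACGT", 2)
features = kmer_features("ACGTAC", vocab, 2)
```

Because every sequence yields a vector of the same length (16 for DNA 2-mers), sequences of different lengths become directly comparable, which is the property a classifier such as an SVM needs.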

Original language: English (US)
Title of host publication: Advances in Knowledge Discovery and Data Mining - 6th Pacific-Asia Conference, PAKDD 2002, Proceedings
Editors: Ming-Syan Chen, Philip S. Yu, Bing Liu
Publisher: Springer Verlag
Pages: 417-431
Number of pages: 15
ISBN (Print): 9783540437048
DOI: https://doi.org/10.1007/3-540-47887-6_41
State: Published - 2002
Event: 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2002 - Taipei, Taiwan, Province of China
Duration: May 6 2002 to May 8 2002

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2336
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 6th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2002
Country: Taiwan, Province of China
City: Taipei
Period: 5/6/02 to 5/8/02

Cite this

Deshpande, M., & Karypis, G. (2002). Evaluation of techniques for classifying biological sequences. In M-S. Chen, P. S. Yu, & B. Liu (Eds.), Advances in Knowledge Discovery and Data Mining - 6th Pacific-Asia Conference, PAKDD 2002, Proceedings (pp. 417-431). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2336). Springer Verlag. https://doi.org/10.1007/3-540-47887-6_41