Similar_Join: Extending DBMS with a bio-specific operator

Jake Yue Chen, John V. Carlis

Research output: Contribution to conference (paper, peer-reviewed)

5 Scopus citations


Existing sequence comparison software applications lack adequate automation, abstraction, performance, and flexibility. Users need a new way of studying and applying sequence comparisons in the post-genomics era. We invented and developed a new bio-specific Database Management System (DBMS) operator, Similar_Join, to abstract the labor-intensive batch sequence similarity search task into a syntactically concise database operation. We implemented the Similar_Join operator as part of a relational operator package. This implementation enabled us to write simple PL/SQL scripts within the DBMS to accomplish routine sequence similarity searches conveniently, for example, a "batch BLAST" that compares 7,000 human genes against 500,000 human Expressed Sequence Tags (ESTs) in a few hours. We also implemented a simple version of Similar_Join as a database operator in the extended data cartridge of the Oracle 8i object-relational DBMS. We demonstrated that, when fully integrated into SQL language extensions, this operator could enable biology users to pose complex biological queries that were previously impossible inside the DBMS.
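To make the idea of a similarity-based join concrete, the following is a minimal illustrative sketch in Python, not the authors' PL/SQL or Oracle data cartridge implementation. It assumes two in-memory "tables" mapping identifiers to sequences and uses a toy k-mer Jaccard score as a stand-in for BLAST; the names `kmers`, `similarity`, `similar_join`, and the sample data are all hypothetical.

```python
from typing import Dict, Iterator, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Toy score: Jaccard index of k-mer sets (an assumed stand-in for BLAST)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def similar_join(
    left: Dict[str, str],
    right: Dict[str, str],
    threshold: float,
    k: int = 3,
) -> Iterator[Tuple[str, str, float]]:
    """Yield (left_id, right_id, score) for every pair scoring >= threshold.

    This is a nested-loop join whose join predicate is sequence similarity
    rather than equality; a real system would use an index or a search tool.
    """
    for lid, lseq in left.items():
        for rid, rseq in right.items():
            score = similarity(lseq, rseq, k)
            if score >= threshold:
                yield lid, rid, score


# Hypothetical sample data: two tiny "tables" of sequences.
genes = {"g1": "ATGCGTACGT", "g2": "TTTTTTTTTT"}
ests = {"e1": "ATGCGTACGA", "e2": "CCCCCCCCCC"}

hits = list(similar_join(genes, ests, threshold=0.5))
for gene_id, est_id, score in hits:
    print(gene_id, est_id, round(score, 3))
```

The sketch only shows the shape of the operation: one call pairs every row of one table with the similar rows of another, which is the batch task the abstract describes as being folded into a single database operator.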

Original language: English (US)
Number of pages: 6
State: Published - 2003
Externally published: Yes
Event: Proceedings of the 2003 ACM Symposium on Applied Computing - Melbourne, FL, United States
Duration: Mar 9 2003 to Mar 12 2003




  • Database Management System (DBMS)
  • Genomic DBMS Extension
  • Relational Operator
  • Similar_Join Operator
  • Similarity Search

