Clustering of Zika viruses originating from different geographical regions using computational sequence descriptors

Marjan Vračko, Subhash C. Basak, Dwaipayan Sen, Ashesh Nandy

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


Background: This report considers a data set of 310 Zika virus genome sequences from three continents: Africa, Asia, and South America. The sequences, compiled from GenBank, were derived from different hosts, both primates and mosquitoes (Simiiformes, Aedes opok, Aedes africanus, Aedes luteocephalus, Aedes dalzieli, Aedes aegypti, and Homo sapiens). Methods: For chemometrical treatment, the sequences were represented by sequence descriptors derived from their graphs or neighborhood matrices. The set was analyzed with three chemometrical methods: Mahalanobis distances, principal component analysis (PCA), and self-organizing maps (SOM). All three methods gave a good separation of samples with respect to the region of origin. Results: The study covered 310 Zika virus genome sequences from different continents, with the aim of characterizing and comparing Zika virus sequences from around the world using alignment-free sequence comparison and chemometrical methods. Conclusion: Mahalanobis distance analysis, self-organizing maps, and principal component analysis were used to carry out the chemometrical analyses of the Zika sequence data. The genome sequences cluster with respect to the region of origin (continent, country). African samples are well separated from Asian and South American ones.
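The chemometrical steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simple stand-in descriptor (the normalized 4x4 matrix of adjacent-base pair frequencies), computes PCA by singular value decomposition, and measures the Mahalanobis distance of one sample to a group centroid. The function names and the toy sequences are hypothetical and chosen only for illustration. The self-organizing map step is omitted.

```python
import numpy as np

BASES = "ACGT"


def neighborhood_descriptor(seq):
    """Flattened 4x4 matrix of adjacent-base pair frequencies (16 values).

    Assumption: a simple stand-in for the graph/neighborhood-matrix
    descriptors mentioned in the abstract, not the authors' descriptor.
    """
    idx = {b: i for i, b in enumerate(BASES)}
    counts = np.zeros((4, 4))
    for a, b in zip(seq, seq[1:]):
        if a in idx and b in idx:  # skip ambiguous symbols such as N
            counts[idx[a], idx[b]] += 1
    total = counts.sum()
    return (counts / total).ravel() if total else counts.ravel()


def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)  # center each descriptor column
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def mahalanobis(x, group):
    """Mahalanobis distance from vector x to the centroid of `group` rows."""
    mu = group.mean(axis=0)
    cov = np.cov(group, rowvar=False)
    diff = x - mu
    # Frequencies sum to 1, so the covariance is singular: use a pseudo-inverse.
    d2 = diff @ np.linalg.pinv(cov) @ diff
    return float(np.sqrt(max(d2, 0.0)))


# Hypothetical toy sequences, not data from the study.
toy = ["ACGTACGTAC", "ACGTTCGTAC", "GGGCCCGGGC", "GGGCCAGGGC"]
X = np.array([neighborhood_descriptor(s) for s in toy])
scores = pca_scores(X)                  # one 2-D point per sequence
dist = mahalanobis(X[2], X[:2])         # third sequence vs. the first group
```

In practice each real genome would replace a toy string, and the PCA scores or distances would be compared against the known region of origin of each sample.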

Original language: English (US)
Pages (from-to): 314-322
Number of pages: 9
Journal: Current Computer-Aided Drug Design
Issue number: 2
State: Published - 2021

Bibliographical note

Funding Information:
MV thanks the Slovenian Research Agency (ARRS) for supporting this research under contract P1-0017.

Publisher Copyright:
© 2021 Bentham Science Publishers.


Keywords

  • Alignment-free descriptor
  • Clustering
  • Geographical distribution
  • Mahalanobis distance
  • Principal component analysis
  • Self-organizing map
  • Zika virus

PubMed: MeSH publication types

  • Journal Article

