Catching old influenza virus with a new Markov model

Research output: Contribution to conference › Paper › peer-review

Abstract

We have developed a novel Markov model of the genetic distance between viruses, based on the Hemagglutinin (HA) gene, a major surface antigen of the avian influenza virus. Using this model, we estimate the probability of finding highly similar virus sequences separated by long time gaps. Our biological assumption is based on neutral evolutionary theory, which has been applied previously to study this virus [Gojobori, Moriyama, and Kimura. PNAS Vol 87. 1990]. Our working hypothesis is that, given a long enough time gap and the high mutation rate usually found in RNA viruses, many site mutations should accumulate, leading to distinct modern variants. We obtained 3439 HA protein sequences isolated between 1918 and 2006 from around the globe, aligned them to a consensus sequence using the NCBI alignment tool, and applied a Hamming distance metric to the aligned sequences. We tested our hypothesis by combining a standard Poisson process with a Markov model. The Poisson process models the occurrence of mutations in a given time interval, and the Markov model estimates the probabilities of changes in genetic distance due to those mutations. By coalescing all sequences at a given genetic distance into a single state, we obtain a tractable Markov chain with a number of states equal to the length of the base peptide sequence. The model predicts that the probability of finding highly similar viruses after several decades is extremely small. The existence of recent viruses that are very similar to much older viruses therefore suggests that there may be a reservoir that preserves viruses over long periods.
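The combination of a Poisson process and a distance-state Markov chain described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the transition probabilities, so the sketch assumes that each mutation hits a uniformly chosen site and replaces the residue with one of the 19 other amino acids uniformly at random. The function names, the state space 0..L, and the rate and time values in the usage lines are all hypothetical.

```python
import math
import numpy as np


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def transition_matrix(length: int, alphabet: int = 20) -> np.ndarray:
    """One-mutation transition matrix over Hamming distances 0..length.

    Assumption (not from the paper): the mutated site is uniform over the
    sequence and the new residue is uniform over the other alphabet-1 residues.
    """
    P = np.zeros((length + 1, length + 1))
    for d in range(length + 1):
        p_mismatch = d / length               # mutation hits an already-differing site
        p_back = p_mismatch / (alphabet - 1)  # ...and restores the reference residue
        if d > 0:
            P[d, d - 1] = p_back
        if d < length:
            P[d, d + 1] = 1.0 - p_mismatch    # mutation hits a matching site
        P[d, d] = p_mismatch - p_back         # differing site changes, still differs
    return P


def distance_distribution(length: int, rate: float, years: float,
                          alphabet: int = 20) -> np.ndarray:
    """Distribution of the Hamming distance from the ancestor after `years`.

    The number of mutations is Poisson with mean rate * years; each mutation
    applies one step of the Markov chain. The Poisson sum is truncated far in
    the tail. Assumes rate * years is modest (a few hundred at most), since
    exp(-mean) underflows for very large means.
    """
    P = transition_matrix(length, alphabet)
    mean = rate * years
    k_max = int(mean + 10.0 * math.sqrt(mean) + 20)
    state = np.zeros(length + 1)
    state[0] = 1.0                            # start identical to the ancestor
    weight = math.exp(-mean)                  # Poisson probability of 0 mutations
    result = weight * state
    for k in range(1, k_max + 1):
        state = state @ P                     # distance distribution after k mutations
        weight *= mean / k                    # Poisson probability of k mutations
        result += weight * state
    return result


# Hypothetical numbers, chosen only to show the shape of the calculation.
dist = distance_distribution(length=100, rate=1.0, years=50)
print("P(distance <= 2 after 50 years):", dist[:3].sum())
```

Under these assumptions the probability mass at small distances shrinks rapidly as the expected number of mutations grows, which is the qualitative behaviour the abstract reports.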

Original language: English (US)
Pages: 38-43
Number of pages: 6
State: Published - 2008
Externally published: Yes
Event: 8th International Workshop on Data Mining in Bioinformatics, BIOKDD 2008 - Held in conjunction with 14th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2008 - Las Vegas, United States
Duration: Aug 24 2008 → Aug 24 2008

Conference

Conference: 8th International Workshop on Data Mining in Bioinformatics, BIOKDD 2008 - Held in conjunction with 14th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2008
Country/Territory: United States
City: Las Vegas
Period: 8/24/08 → 8/24/08

Bibliographical note

Funding Information:
This work was partially supported by NSF grant 0534286.

Publisher Copyright:
© 2008 ACM

Keywords

  • Influenza virus
  • Markov Model
  • Poisson process
