TY - JOUR
T1 - Analyzing influenza virus sequences using binary encoding approach
AU - Lam, Ham Ching
AU - Sreevatsan, Srinand
AU - Boley, Daniel
PY - 2012
Y1 - 2012
N2 - Capturing the mutation patterns of individual influenza virus sequences is often challenging. In this paper, we demonstrate that a binary encoding scheme coupled with a dimension reduction technique can capture the intrinsic mutation pattern of the virus. Our approach looks at the variance between sequences instead of the commonly used p-distance or Hamming distance. We first convert the influenza genetic sequences to binary strings, form a binary sequence alignment matrix, and then apply Principal Component Analysis (PCA) to this matrix. PCA also makes it possible to identify reassortant viruses by using a data projection technique. Due to the sparsity of the binary strings, we were able to analyze a large volume of influenza sequence data in a very short time. For protein sequences, our scheme also allows the incorporation of the biophysical properties of each amino acid. Here, we present various encouraging results from analyzing influenza nucleotide, protein and genome sequences using the proposed approach.
AB - Capturing the mutation patterns of individual influenza virus sequences is often challenging. In this paper, we demonstrate that a binary encoding scheme coupled with a dimension reduction technique can capture the intrinsic mutation pattern of the virus. Our approach looks at the variance between sequences instead of the commonly used p-distance or Hamming distance. We first convert the influenza genetic sequences to binary strings, form a binary sequence alignment matrix, and then apply Principal Component Analysis (PCA) to this matrix. PCA also makes it possible to identify reassortant viruses by using a data projection technique. Due to the sparsity of the binary strings, we were able to analyze a large volume of influenza sequence data in a very short time. For protein sequences, our scheme also allows the incorporation of the biophysical properties of each amino acid. Here, we present various encouraging results from analyzing influenza nucleotide, protein and genome sequences using the proposed approach.
KW - Influenza virus
KW - binary encoding
KW - evolution
KW - principal component analysis
UR - http://www.scopus.com/inward/record.url?scp=84859193390&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84859193390&partnerID=8YFLogxK
U2 - 10.3233/SPR-2012-334
DO - 10.3233/SPR-2012-334
M3 - Article
AN - SCOPUS:84859193390
SN - 1058-9244
VL - 20
SP - 3
EP - 13
JO - Scientific Programming
JF - Scientific Programming
IS - 1
ER -