We outline a numerical characterization of DNA primary sequences based on the average distance between pairs of nucleic acid bases. This leads to a representation of DNA by a condensed 4 × 4 symmetric matrix whose elements give the average separation between the bases X and Y in the sequence (X, Y = A, C, G, T). As the invariant of choice we consider the leading eigenvalue of this 4 × 4 matrix. Additional structurally related invariants are obtained by constructing "higher order" 4 × 4 matrices, derived from the initial matrix by raising its elements to higher powers. The suitably normalized leading eigenvalues of these matrices offer a novel characterization of DNA primary sequences, referred to as "DNA profiles". The approach is illustrated on exon 1 of the human β-globin gene.
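The construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions, because the abstract does not fix them: the "average distance" between X and Y is taken here as the mean of |i - j| over all pairs of distinct positions i, j holding X and Y; the leading eigenvalue is estimated by a shifted power iteration; and the normalization of the higher-order eigenvalues is left out, so the profile is returned unnormalized. The function names (`average_distance_matrix`, `leading_eigenvalue`, `dna_profile`) are hypothetical.

```python
from itertools import product

BASES = "ACGT"


def average_distance_matrix(seq):
    """Return a 4 x 4 symmetric matrix of mean separations between bases.

    Assumption: entry (X, Y) is the mean of |i - j| over all pairs of
    distinct positions i (holding X) and j (holding Y); it is 0.0 when
    no such pair exists.
    """
    positions = {b: [i for i, c in enumerate(seq) if c == b] for b in BASES}
    matrix = [[0.0] * 4 for _ in range(4)]
    for r, x in enumerate(BASES):
        for c, y in enumerate(BASES):
            dists = [abs(i - j)
                     for i, j in product(positions[x], positions[y])
                     if i != j]
            matrix[r][c] = sum(dists) / len(dists) if dists else 0.0
    return matrix


def leading_eigenvalue(matrix, iterations=1000, shift=1.0):
    """Estimate the largest eigenvalue of a symmetric non-negative matrix.

    Power iteration is run on (matrix + shift * I) so that a negative
    eigenvalue of equal magnitude cannot cause oscillation; the result is
    the Rayleigh quotient of the original matrix at the final vector.
    """
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [shift * v[r] + sum(matrix[r][c] * v[c] for c in range(n))
             for r in range(n)]
        norm = max(abs(x) for x in w)
        v = [x / norm for x in w]
    mv = [sum(matrix[r][c] * v[c] for c in range(n)) for r in range(n)]
    return sum(a * b for a, b in zip(v, mv)) / sum(a * a for a in v)


def dna_profile(seq, max_order=3):
    """Leading eigenvalues of the element-wise powers k = 1..max_order.

    Assumption: no normalization is applied, since the abstract does not
    say which one is used.
    """
    base = average_distance_matrix(seq)
    profile = []
    for k in range(1, max_order + 1):
        powered = [[entry ** k for entry in row] for row in base]
        profile.append(leading_eigenvalue(powered))
    return profile


if __name__ == "__main__":
    # Arbitrary toy sequence, not the beta-globin exon discussed above.
    print(dna_profile("ACGTTGCA"))
```

Power iteration is used only to keep the sketch free of third-party dependencies; for a 4 × 4 matrix any standard symmetric eigenvalue routine would serve equally well.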
|Original language||English (US)|
|Number of pages||8|
|Journal||Journal of Chemical Information and Computer Sciences|
|State||Published - Dec 1 2001|