TY - GEN
T1 - Identification of Ancient Greek Papyrus Fragments Using Genetic Sequence Alignment Algorithms
AU - Williams, Alex C.
AU - Carroll, Hyrum D.
AU - Wallin, John F.
AU - Brusuelas, James
AU - Fortson, Lucy F.
AU - Lamblin, Anne Francoise
AU - Yu, Haoyu
N1 - Publisher Copyright: © 2014 IEEE. Copyright 2015 Elsevier B.V., All rights reserved.
PY - 2014/12/2
Y1 - 2014/12/2
N2 - Papyrologists analyze, transcribe, and edit papyrus fragments in order to enrich modern lives by better understanding the linguistics, culture, and literature of the ancient world. One of their common tasks is to match an unknown fragment to a known manuscript. This is especially challenging when the fragments are damaged and contain only limited information (e.g., due to deterioration). In the last 100 years, only about 10% of the more than 500,000 fragments recovered from the Egyptian village of Oxyrhynchus have been edited. We do not know what new ancient texts might be found and what can be learned from them, but using current methods of identification this process will take in excess of 1000 years. The identification of an anonymous string of characters with a collection of known text sequences is ubiquitous in computational biology. Genes are often represented by a sequence of continuous characters, each of which denotes an amino acid. Relationships are inferred by finding multi-letter patterns shared between the anonymous sequence and a known sequence. This process is commonly referred to as genetic sequence alignment. In this paper, we introduce a novel methodology that uses modern genetic sequence alignment algorithms as a method for identifying Ancient Greek text fragments. This application will offer papyrologists and other professionals in the humanities the ability to rapidly identify severely damaged texts. This approach leverages a new form of non-contextual, multi-line text identification for the Greek language that can greatly accelerate the tedious task of transcription and identification.
KW - Ancient Greek
KW - genetic sequence alignment
KW - identification
KW - papyrus
UR - http://www.scopus.com/inward/record.url?scp=84919657825&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84919657825&partnerID=8YFLogxK
U2 - 10.1109/eScience.2014.14
DO - 10.1109/eScience.2014.14
M3 - Conference contribution
AN - SCOPUS:84919657825
T3 - Proceedings - 2014 IEEE 10th International Conference on eScience, eScience 2014
SP - 5
EP - 10
BT - Proceedings - 2014 IEEE 10th International Conference on eScience, eScience 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th IEEE International Conference on eScience, eScience 2014
Y2 - 20 October 2014 through 24 October 2014
ER -