Exploring learning approaches for ancient Greek character recognition with citizen science data

Matthew I. Swindall, Gregory Croisdale, Chase C. Hunter, Ben Keener, Alex C. Williams, James H. Brusuelas, Nita Krevans, Melissa Sellew, Lucy Fortson, John F. Wallin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The central dogma of handwritten character recognition remains inextricably linked to optical character recognition methods for print media. Alongside their reliance on proprietary data and lack of open-access software, the applicability of these optical character recognition methods to handwritten characters from low-quality documents (e.g., those that are damaged) remains unknown. In this paper, we compare and contrast the performance of state-of-the-art optical character recognition tools for print and learning models engineered with state-of-the-art machine learning toolkits trained on handwritten inputs. Using Tesseract OCR as a baseline, we build, optimize, and evaluate three types of convolutional neural networks that are trained on the AL-ALL and AL-PUB datasets, collections of images of handwritten ancient Greek characters that were labeled by volunteers through the Ancient Lives online citizen science project. We find our best-performing machine learning model to be 92.57% accurate compared to Tesseract OCR's 11.15%. Following our analysis, we present a brief examination of our models' shortcomings, introduce the publicly available AL-PUB dataset, and describe Theia, a web-based tool that democratizes our machine learning models for public use. We conclude by discussing the promise of our findings for advancing research at the intersection of machine learning, manuscript transcription, and the digital humanities.

Original language: English (US)
Title of host publication: Proceedings - IEEE 17th International Conference on eScience, eScience 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 128-137
Number of pages: 10
ISBN (Electronic): 9781665403610
State: Published - Sep 2021
Event: 17th IEEE International Conference on eScience, eScience 2021 - Virtual, Online, Austria
Duration: Sep 20 2021 → Sep 23 2021

Publication series

Name: 2021 IEEE 17th International Conference on eScience (eScience)

Conference

Conference: 17th IEEE International Conference on eScience, eScience 2021
Country/Territory: Austria
City: Virtual, Online
Period: 9/20/21 → 9/23/21

Bibliographical note

Funding Information:
Acknowledgment: This research is made possible by the thousands of Zooniverse volunteers who participated in the Ancient Lives project over the past decade. We recognize these volunteers and thank them for their efforts in spurring advances not only across the humanities, but also, now, the sciences. We also thank the Imaging Papyri Project at the University of Oxford for providing access to the digitized manuscript images as well as the Egypt Exploration Society for providing access to the Oxyrhynchus Papyri. This research was partially funded by the Andrew W. Mellon Foundation and The Chellgren Center for Undergraduate Excellence.

Publisher Copyright:
© 2021 IEEE.

Keywords

  • Ancient Greek
  • Character transcription
  • Citizen science
  • Crowdsourcing
  • Dataset
  • Machine learning
  • Papyrology
