Forced-alignment and edit-distance scoring for vocabulary tutoring applications

Serguei Pakhomov, Jayson Richardson, Matt Finholt-Daniel, Gregory Sales

    Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

    1 Scopus citation

    Abstract

    We demonstrate an application of Automatic Speech Recognition (ASR) technology to the assessment of young children's basic English vocabulary. We use a test set of 2,935 speech samples manually rated by three reviewers to compare several approaches to measuring and classifying the accuracy of children's pronunciation of words, including acoustic confidence scores obtained by forced alignment and the edit distance between the expected and the actual ASR output. We show that phoneme-level language modeling can be used to obtain good classification results even with a relatively small amount of acoustic training data. The ASR-based classifier that uses a bi-phone language model interpolated with a general English bi-phone model achieves an area under the ROC curve of 0.80 (95% CI 0.78-0.82). At the operating point that jointly maximizes sensitivity and specificity, sensitivity is 0.74 and specificity is 0.80, with a harmonic mean of 0.77, which is comparable to human performance (ICC = 0.75; absolute agreement = 81%).
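    Two of the quantities named in the abstract are easy to make concrete: the edit distance between an expected phoneme sequence and the phonemes returned by the recognizer, and the harmonic mean of sensitivity and specificity. The following Python sketch illustrates both. It is a minimal illustration under stated assumptions, not the authors' implementation: the ARPAbet-style phoneme strings, the normalization by expected length, and the `threshold` decision rule in `is_accepted` are hypothetical. The forced-alignment confidence scores and the interpolated bi-phone language model described above are not modeled here.

```python
"""Sketch: phoneme-level edit distance and harmonic-mean scoring.

Assumptions (not from the paper): the ARPAbet-style phoneme symbols,
the length normalization, and the threshold decision rule are hypothetical.
"""


def edit_distance(expected, recognized):
    """Levenshtein distance between two sequences, all edit costs = 1."""
    m, n = len(expected), len(recognized)
    # prev[j] = distance between the first i-1 expected symbols
    # and the first j recognized symbols
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if expected[i - 1] == recognized[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of an expected phoneme
                curr[j - 1] + 1,     # insertion of an extra phoneme
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[n]


def normalized_distance(expected, recognized):
    """Edit distance divided by the length of the expected pronunciation."""
    return edit_distance(expected, recognized) / max(len(expected), 1)


def is_accepted(expected, recognized, threshold):
    """Hypothetical rule: accept the word if the normalized distance <= threshold."""
    return normalized_distance(expected, recognized) <= threshold


def harmonic_mean(sensitivity, specificity):
    """Harmonic mean of two rates, as used to summarize an operating point."""
    return 2 * sensitivity * specificity / (sensitivity + specificity)


if __name__ == "__main__":
    cat = ["K", "AE", "T"]
    print(edit_distance(cat, ["K", "AE", "T"]))  # 0: exact match
    print(edit_distance(cat, ["K", "AH", "T"]))  # 1: one substitution
    print(round(harmonic_mean(0.74, 0.80), 2))   # 0.77, matching the abstract
```

    Sweeping `threshold` over a range of values and comparing the accept/reject decisions with the human ratings at each setting is one way to trace an ROC curve of the kind summarized in the abstract.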

    Original language: English (US)
    Title of host publication: Text, Speech and Dialogue - 11th International Conference, TSD 2008, Proceedings
    Pages: 443-450
    Number of pages: 8
    DOI: 10.1007/978-3-540-87391-4_57
    State: Published - 2008
    Event: 11th International Conference on Text, Speech and Dialogue, TSD 2008 - Brno, Czech Republic
    Duration: Sep 8, 2008 to Sep 12, 2008

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 5246 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Other

    Conference: 11th International Conference on Text, Speech and Dialogue, TSD 2008
    Country: Czech Republic
    City: Brno
    Period: 9/8/08 to 9/12/08

    Keywords

    • Automatic speech recognition
    • Sub-word language modeling
    • Vocabulary tutor


    Cite this

    Pakhomov, S., Richardson, J., Finholt-Daniel, M., & Sales, G. (2008). Forced-alignment and edit-distance scoring for vocabulary tutoring applications. In Text, Speech and Dialogue - 11th International Conference, TSD 2008, Proceedings (pp. 443-450). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5246 LNAI). https://doi.org/10.1007/978-3-540-87391-4_57