Abstract
This paper describes an experiment that used the Principal Direction Divisive Partitioning algorithm (Boley, 1998) to extract linguistic regularities in word errors from several sets of medical dictation data. For each of six physicians, two hundred finished medical dictations, aligned with their corresponding automatic speech recognition output, were clustered, and the results were analyzed for linguistic regularities between and within clusters. Sparsity measures indicated a good fit between the algorithm and the input data. Linguistic analysis of the output clusters showed evidence of systematic word recognition errors for short words, function words, and words with destressed vowels, as well as phonological confusion errors due to telephony (recording) bandwidth interference. No qualitatively significant distinctions between clusters could be made by examining word errors alone, but the results confirmed several informally held hypotheses and suggested several avenues for further investigation, such as the examination of word error contexts.
Original language | English (US) |
---|---|
Title of host publication | Machine Learning |
Subtitle of host publication | ECML 2000 - 11th European Conference on Machine Learning, Proceedings |
Editors | Ramon Lopez de Mantaras, Enric Plaza |
Publisher | Springer Verlag |
Pages | 263-270 |
Number of pages | 8 |
ISBN (Print) | 9783540451648 |
State | Published - 2000 |
Event | 11th European Conference on Machine Learning, ECML 2000 - Barcelona, Catalonia, Spain; Duration: May 31 2000 → Jun 2 2000 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 1810 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Bibliographical note
Publisher Copyright: © Springer-Verlag Berlin Heidelberg 2000.