Abstract
Purpose: The purpose of this study was to assess the relationship and comparability of cepstral and spectral measures of voice obtained from a high-cost “flat” microphone and precision sound level meter (SLM) vs. high-end and entry level models of commonly and currently used smartphones (iPhone i12 and iSE; Samsung s21 and s9 smartphones). Device comparisons were also conducted in different settings (sound-treated booth vs. typical “quiet” office room) and at different mouth-to-microphone distances (15 and 30 cm).

Methods: The SLM and smartphone devices were used to record a series of speech and vowel samples from a prerecorded diverse set of 24 speakers representing a wide range of sex, age, fundamental frequency (F0), and voice quality types. Recordings were analyzed for the following measures: smoothed cepstral peak prominence (CPP in dB); the low vs. high spectral ratio (L/H Ratio in dB); and the Cepstral Spectral Index of Dysphonia (CSID).

Results: A strong device effect was observed for L/H Ratio (dB) in both vowel and sentence contexts and for CSID in the sentence context. In contrast, device had a weak effect on CPP (dB), regardless of context. Recording distance was observed to have a small-to-moderate effect on measures of CPP and CSID but had a negligible effect on L/H Ratio. With the exception of L/H Ratio in the vowel context, setting was observed to have a strong effect on all three measures. While these aforementioned effects resulted in significant differences between measures obtained with SLM vs. smartphone devices, the intercorrelations of the measurements were extremely strong (r's > 0.90), indicating that all devices were able to capture the range of voice characteristics represented in the voice sample corpus. Regression modeling showed that acoustic measurements obtained from smartphone recordings could be successfully converted to comparable measurements obtained by a “gold standard” (precision SLM recordings conducted in a sound-treated booth at 15 cm) with small degrees of error.

Conclusions: These findings indicate that a variety of commonly available modern smartphones can be used to collect high quality voice recordings usable for informative acoustic analysis. While device, setting, and distance can have significant effects on acoustic measurements, these effects are predictable and can be accounted for using regression modeling.
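The regression-based conversion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the model form, so this assumes a simple one-predictor linear regression fitted by ordinary least squares. The function names (`fit_line`, `pearson_r`, `rmse`) are hypothetical, and the numbers are synthetic values generated for illustration only; they are not data or results from the study.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least squares fit of y ≈ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def rmse(predicted, target):
    """Root-mean-square error between predictions and target values."""
    n = len(target)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / n)


# Synthetic stand-in data (ASSUMPTION, not study data): 24 "reference" CPP
# values in dB and a smartphone measurement that is offset, scaled, and noisy.
random.seed(0)
reference_cpp = [random.uniform(2.0, 12.0) for _ in range(24)]
smartphone_cpp = [0.9 * v - 0.5 + random.gauss(0.0, 0.2) for v in reference_cpp]

# Fit the conversion from smartphone values to the reference scale,
# then apply it and summarize agreement.
slope, intercept = fit_line(smartphone_cpp, reference_cpp)
converted_cpp = [slope * v + intercept for v in smartphone_cpp]

r = pearson_r(smartphone_cpp, reference_cpp)
error = rmse(converted_cpp, reference_cpp)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f} rmse={error:.3f} dB")
```

In this sketch a high correlation means the smartphone values track the reference values even when their absolute levels differ, which is the condition under which a fitted line can map one scale onto the other. A fuller model might add setting and distance as predictors, but that goes beyond what the abstract specifies.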
Original language | English (US) |
---|---|
Journal | Journal of Voice |
State | Accepted/In press - 2023 |
Bibliographical note
Funding Information: Disclosures: Dr. S. N. Awan licenses the algorithms that form the basis of the Analysis of Dysphonia in Speech and Voice (ADSV) program to PENTAX Medical (Montvale, NJ). Dr. Awan is supported by funding from the National Institute on Deafness and Other Communication Disorders (2R01DC009029-13A1). Dr. S. Misono is supported by funding from the National Institutes of Health (UL1TR002494 and K23DC016335), American College of Surgeons, and the Triological Society. Dr. J. A. Awan is supported by NSF award number SES-2150615 to Purdue University.
Publisher Copyright:
© 2023 The Voice Foundation
Keywords
- cepstral analysis
- frequency response
- smartphones
- spectral analysis
- voice evaluation
PubMed: MeSH publication types
- Journal Article