TY - JOUR
T1 - Perplexity and proximity
T2 - Large language model perplexity complements semantic distance metrics for the detection of incoherent speech
AU - Xu, Weizhe
AU - Pakhomov, Serguei
AU - Heagerty, Patrick
AU - Horvitz, Eric
AU - Bradley, Ellen R.
AU - Woolley, Josh
AU - Campbell, Andrew
AU - Cohen, Alex
AU - Ben-Zeev, Dror
AU - Cohen, Trevor
N1 - Publisher Copyright: © 2025 The Authors
PY - 2025/10
Y1 - 2025/10
AB - Objective: Semantic coherence in speech is characterized by a logical, connected flow of ideas. A lack of coherence in speech may reflect disorganized thinking, a core feature of psychosis in schizophrenia spectrum disorders (SSDs). Tools for automated assessment of semantic coherence in language could facilitate early detection of SSDs and improved monitoring of symptoms, enabling more timely intervention. Large language models (LLMs) have demonstrated strong capabilities on numerous language-centric tasks and have shown promise for analyzing semantic coherence due to the natural fit between their innate measures of language perplexity and the surprising turns that incoherent narrative often takes. This study aims to develop a novel representation and associated measure of semantic coherence using LLM-based perplexity metrics and to compare this measure with traditional vector distance-based coherence metrics. Method: We evaluated “bag” and “chain” models based on LLM perplexities as measures of semantic coherence. Regression models were trained using single features and paired combinations of perplexity- and proximity-based features to predict human ratings of semantic coherence obtained with standardized instruments. Performance was evaluated on held-out examples from a training set of speeches from individuals experiencing psychotic symptoms and on a test set of clinical interviews with patients diagnosed with SSDs, both labeled with human assessments of disorganized thinking severity. Results: The best performance was achieved using a combination of perplexity and proximity features, yielding a Spearman correlation with human ratings of 0.61 (vs. 0.56 with proximity features alone) on leave-one-out cross-validation in the training set, and 0.54 (vs. 0.52 with proximity features alone) on the test set.
Conclusion: We developed novel methods for assessing semantic coherence using LLM perplexities and found them complementary to proximity-based methods. Combined, these methods showed improved performance across two datasets, highlighting the potential of LLMs to enhance automated diagnosis and monitoring of SSDs.
UR - https://www.scopus.com/pages/publications/105014169182
U2 - 10.1016/j.jbi.2025.104899
DO - 10.1016/j.jbi.2025.104899
M3 - Article
C2 - 40849054
AN - SCOPUS:105014169182
SN - 1532-0464
VL - 170
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
M1 - 104899
ER -