Abstract
Identifying self-disclosed health diagnoses in social media data using regular expressions (e.g. "I’ve been diagnosed with <Disease X>") is a well-established approach for creating ad hoc cohorts of individuals with specific health conditions. However, there is evidence that this method is unreliable when creating cohorts for some mental health and neurodegenerative conditions. In the case of dementia, the focus of this paper, diagnostic disclosures are frequently whimsical or sardonic rather than indicative of an authentic diagnosis or underlying disease state (e.g. "I forgot my keys again. I’ve got dementia!"). In this work, using an annotated corpus of 14,025 dementia diagnostic self-disclosure posts derived from Twitter, we leveraged large language models (LLMs) to distinguish between "authentic" and "inauthentic" dementia self-disclosures. Specifically, we implemented a genetic algorithm that evolves prompts using various state-of-the-art prompt engineering techniques, including chain of thought, self-critique, generated knowledge, and expert prompting. Our results showed that, of the methods tested, the evolved self-critique prompt engineering method achieved the best result, with an F1-score of 0.8.
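The limitation of regular-expression matching described above can be illustrated with a minimal sketch. The pattern, function name, and sample posts below are assumptions made for illustration only; they are not the expressions or data used in the paper.

```python
import re

# Hypothetical disclosure pattern (an assumption, not the paper's expression).
# It accepts "I've" / "I’ve" / "I have", followed by "been diagnosed with" or "got".
DISCLOSURE_PATTERN = re.compile(
    r"\bI(?:['’]ve| have)\s+(?:been\s+diagnosed\s+with|got)\s+dementia\b",
    re.IGNORECASE,
)


def find_candidate_disclosures(posts):
    """Return the posts that contain a surface-level dementia self-disclosure."""
    return [post for post in posts if DISCLOSURE_PATTERN.search(post)]


# Invented sample posts: one plausible disclosure, one joke, one third-person mention.
sample_posts = [
    "I’ve been diagnosed with dementia and started treatment last month.",
    "I forgot my keys again. I’ve got dementia!",
    "My grandmother has dementia.",
]

if __name__ == "__main__":
    for post in find_candidate_disclosures(sample_posts):
        print(post)
```

In this sketch the pattern selects both the plausible disclosure and the joking one, since the two share the same surface form. Separating them requires a second classification step, which is the role the LLM prompts play in this work.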
Original language | English (US) |
---|---|
Pages | 189-196 |
Number of pages | 8 |
State | Published - 2024 |
Event | 22nd Annual Workshop of the Australasian Language Technology Association, ALTA 2024 - Canberra, Australia; Duration: Dec 2 2024 → Dec 4 2024 |
Conference
Conference | 22nd Annual Workshop of the Australasian Language Technology Association, ALTA 2024 |
---|---|
Country/Territory | Australia |
City | Canberra |
Period | 12/2/24 → 12/4/24 |
Bibliographical note
Publisher Copyright: © 2024 Association for Computational Linguistics.