TY - JOUR
T1 - Backdoor Adjustment of Confounding by Provenance for Robust Text Classification of Multi-institutional Clinical Notes
AU - Ding, Xiruo
AU - Sheng, Zhecheng
AU - Yetişgen, Meliha
AU - Pakhomov, Serguei
AU - Cohen, Trevor
N1 - Publisher Copyright: ©2023 AMIA - All rights reserved.
PY - 2023
Y1 - 2023
N2 - Natural Language Processing (NLP) methods have been broadly applied to clinical tasks. Machine learning and deep learning approaches have been used to improve the performance of clinical NLP. However, these approaches require sufficiently large datasets for training, and trained models have been shown to transfer poorly across sites. These issues have led to the promotion of data collection and integration across different institutions for accurate and portable models. However, this can introduce a form of bias called confounding by provenance. When source-specific data distributions differ at deployment, this may harm model performance. To address this issue, we evaluate the utility of backdoor adjustment for text classification in a multi-site dataset of clinical notes annotated for mentions of substance abuse. Using an evaluation framework devised to measure robustness to distributional shifts, we assess the utility of backdoor adjustment. Our results indicate that backdoor adjustment can effectively mitigate confounding shift.
UR - http://www.scopus.com/inward/record.url?scp=85182544872&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85182544872&partnerID=8YFLogxK
M3 - Article
C2 - 38222433
AN - SCOPUS:85182544872
SN - 1559-4076
VL - 2023
SP - 923
EP - 932
JO - AMIA ... Annual Symposium proceedings. AMIA Symposium
JF - AMIA ... Annual Symposium proceedings. AMIA Symposium
ER -