TY - JOUR
T1 - A fast, resource efficient, and reliable rule-based system for COVID-19 symptom identification
AU - Sahoo, Himanshu S
AU - Silverman, Greg M
AU - Ingraham, Nicholas E
AU - Lupei, Monica I
AU - Puskarich, Michael A
AU - Finzel, Raymond L
AU - Sartori, John
AU - Zhang, Rui
AU - Knoll, Benjamin C
AU - Liu, Sijia
AU - Liu, Hongfang
AU - Melton, Genevieve B
AU - Tignanelli, Christopher J
AU - Pakhomov, Serguei V S
N1 - Publisher Copyright: © The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2021/7/1
Y1 - 2021/7/1
N2 - Objective: With COVID-19, there was a need for a rapidly scalable annotation system that facilitated real-time integration with clinical decision support systems (CDS). Current annotation systems suffer from high resource utilization and poor scalability, limiting real-world integration with CDS. A potential solution to mitigate these issues is to use the rule-based gazetteer developed at our institution. Materials and Methods: Performance, resource utilization, and runtime of the rule-based gazetteer were compared with five annotation systems: BioMedICUS, cTAKES, MetaMap, CLAMP, and MedTagger. Results: This rule-based gazetteer was the fastest, had a low resource footprint, and showed similar performance on weighted microaverage and macroaverage measures of precision, recall, and F1-score compared to the other annotation systems. Discussion: Opportunities to increase its performance include fine-tuning lexical rules for symptom identification. Additionally, it could run on multiple compute nodes for faster runtime. Conclusion: This rule-based gazetteer overcame key technical limitations, facilitating real-time symptomatology identification for COVID-19 and integration of unstructured data elements into our CDS. It is ideal for large-scale deployment across a wide variety of healthcare settings for surveillance of acute COVID-19 symptoms for integration into prognostic modeling. Such a system is currently being leveraged for monitoring of postacute sequelae of COVID-19 (PASC) progression in COVID-19 survivors. This study conducted the first in-depth analysis and developed a rule-based gazetteer for COVID-19 symptom extraction with the following key features: low processor and memory utilization, faster runtime, and similar weighted microaverage and macroaverage measures for precision, recall, and F1-score compared to industry-standard annotation systems.
KW - artificial intelligence
KW - clinical decision support systems
KW - follow-up studies
KW - information extraction
KW - natural language processing
KW - signs and symptoms
UR - http://www.scopus.com/inward/record.url?scp=85118120384&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85118120384&partnerID=8YFLogxK
U2 - 10.1093/jamiaopen/ooab070
DO - 10.1093/jamiaopen/ooab070
M3 - Article
C2 - 34423261
SN - 2574-2531
VL - 4
SP - ooab070
JO - JAMIA Open
JF - JAMIA Open
IS - 3
M1 - ooab070
ER -