Diabetes mellitus is a rapidly increasing and costly public health problem. Large studies are needed to understand the complex gene-environment interactions that lead to diabetes and its complications. The Marshfield Clinic Personalized Medicine Research Project (PMRP) represents one of the largest population-based DNA biobanks in the United States. As part of an effort to begin phenotyping common diseases within the PMRP, we now report on the construction of a diabetes case-finding algorithm using electronic medical record data from adult subjects aged a50 years living in one of the target PMRP ZIP codes. Based upon diabetic diagnostic codes alone, we observed a false positive case rate ranging from 3.0% (in subjects with the highest glycosylated hemoglobin values) to 44.4% (in subjects with the lowest glycosylated hemoglobin values). We therefore developed an improved case finding algorithm that utilizes diabetic diagnostic codes in combination with clinical laboratory data and medication history. This algorithm yielded an estimated prevalence of 24.2% for diabetes mellitus in adult subjects aged ≥50 years.
- Natural language processing