NEW YORK — Electronic health record data could help identify individuals who may benefit from genetic testing for diseases, a new analysis has found.
Rare genetic diseases affect about 5 percent of the world's population and can be tricky to diagnose, especially as many conditions are unknown and others can present with a range of symptoms or phenotypes. Patients often undergo long diagnostic odysseys.
Researchers from Vanderbilt University Medical Center suspected that longitudinal clinical data housed within EHRs could be used to identify the patterns of rare phenotypes seen across rare diseases and uncover individuals who might be affected. As they reported in Nature Medicine this week, the researchers developed a machine learning-based prediction model built on that idea and found it was highly accurate in identifying patients who had undergone chromosomal microarray analysis. That result suggested to the researchers that the model could, in turn, be used to more systematically identify individuals who may benefit from genetic testing.
"Patients with rare genetic diseases often face years of diagnostic odyssey before getting a genetic test, if they get one at all," lead author Douglas Ruderfer, an associate professor of medicine at Vanderbilt, said in a statement. "Our work could contribute to a more systematic and timely approach, alerting providers of patients that might benefit from a genetic test."
The researchers trained several models using data from a cohort of 2,286 patients who underwent chromosomal microarray testing and 9,144 matched controls who did not. They examined whether they could predict who had undergone genetic testing based on differences in the phecodes, or diagnostic billing codes representing different phenotypes, in the patients' EHRs. The best-performing model was a random forest that took phecode counts as input.
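To make "phecode counts as input" concrete, the sketch below shows one way such a feature table could be built: one row per patient, one column per phecode, each cell holding how often that code appears in the record. It is an illustration only, not the researchers' pipeline. The function name, the record layout, and the placeholder codes "A", "B", and "C" are all assumptions made for this example.

```python
from collections import Counter


def phecode_count_matrix(records):
    """Convert {patient_id: [phecode, ...]} into a count table.

    Returns (patient_ids, vocabulary, rows), where rows[i][j] is the number
    of times vocabulary[j] appears in the record of patient_ids[i].
    """
    # Column order: every distinct code seen in any record, sorted.
    vocabulary = sorted({code for codes in records.values() for code in codes})
    patient_ids = sorted(records)
    rows = []
    for pid in patient_ids:
        counts = Counter(records[pid])  # missing codes count as 0
        rows.append([counts[code] for code in vocabulary])
    return patient_ids, vocabulary, rows


# Made-up records; the codes are placeholders, not real phecodes or patients.
records = {
    "case_1": ["A", "A", "B", "C"],
    "control_1": ["B"],
}
patient_ids, vocabulary, rows = phecode_count_matrix(records)
```

A table in this shape, paired with a label for whether each patient was tested, is the kind of input a random forest classifier could be trained on.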
That model continued to perform well when the researchers removed any phecodes that might have been recorded after the patients underwent genetic testing: It correctly classified 87 percent of cases and 96 percent of controls.
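The share of cases classified correctly is commonly called sensitivity, and the share of controls classified correctly is called specificity. The short sketch below shows how the two figures are computed from true labels and predictions. The toy labels are invented for illustration and are unrelated to the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = case, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    true_neg = sum(1 for t, p in pairs if t == 0 and p == 0)
    n_cases = sum(1 for t in y_true if t == 1)
    n_controls = sum(1 for t in y_true if t == 0)
    # Fraction of cases flagged, and fraction of controls left unflagged.
    return true_pos / n_cases, true_neg / n_controls


# Invented example: 4 cases and 4 controls, with one error in each group.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```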
While chromosomal microarray testing is often a first-line genetic test, the researchers noted that it is not the only type of genetic testing offered. They tested their model on a broader sample of 172,265 people, 10,074 of whom had visited a genetics clinic and 107,263 controls with no suspicion of genetic disease in their medical record. In this cohort, the model also classified patients accurately. The researchers further validated their model at Massachusetts General Brigham and found that it maintained high accuracy at an external site.
The researchers also tested their model on a set of 16 genetic diseases not in their training set, including Down syndrome and cystic fibrosis. They found it could identify patients with these more common genetic diseases, suggesting that the pattern of many rare phenotypes may hold across genetic diseases.
"After extensive validation demonstrated high predictive performance, we were really interested in assessing how an implementation of our model might compare to the current status quo for who receives a test, and what the results of those tests are," Ruderfer said.
He and his colleagues examined whether their model might be able to pick out patients for genetic testing before clinicians do. They estimated that their model would have suggested patients undergo testing months before they actually did. Depending on the threshold set, the model would have suggested testing between 122 days and 315 days earlier, on average.
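The article does not spell out how the lead time was estimated. One plausible reading is sketched below: for each patient, find the first day on which the model's score crosses the chosen threshold, subtract that from the day the test was actually ordered, and average the result. The data layout, the function name, and the scores are hypothetical and chosen only to show why a stricter threshold yields a shorter lead time.

```python
def mean_lead_days(patients, threshold):
    """Average days between the first above-threshold score and the actual test.

    Each patient is a dict with "scores" (a list of (day, score) pairs in
    chronological order) and "test_day". Patients never flagged before their
    test are skipped. Returns None if no patient is flagged.
    """
    leads = []
    for patient in patients:
        for day, score in patient["scores"]:
            if score >= threshold and day < patient["test_day"]:
                leads.append(patient["test_day"] - day)
                break  # only the first crossing counts
    return sum(leads) / len(leads) if leads else None


# Hypothetical score histories, not study data.
patients = [
    {"scores": [(0, 0.2), (100, 0.6), (200, 0.9)], "test_day": 300},
    {"scores": [(0, 0.1), (50, 0.55), (150, 0.8)], "test_day": 250},
]
lenient = mean_lead_days(patients, 0.5)
strict = mean_lead_days(patients, 0.75)
```

In this toy data the lenient threshold flags both patients 200 days before their tests, while the stricter one flags them 100 days before, mirroring the threshold-dependent range reported above.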
The findings suggested to the researchers that their model could automate and systematize the process of deciding which patients are suspected of having a genetic disease and should undergo genetic testing. They noted that their goal is to improve the identification of patients with genetic disease, not necessarily through wider testing but by making testing access more consistent and equitable.