CHOP Automates Phenotyping of Epilepsy Patients, Heralding 'EMR Genomics' Era


CHICAGO – Neurogeneticists and bioinformaticians at Children's Hospital of Philadelphia (CHOP) have developed a technique to automate phenotyping of pediatric epilepsy patients to create gene-specific "footprints" in electronic medical records. They hope this computational technique will help researchers expand outcomes studies and eventually lead to better clinical decision support at the point of care.

They described their work in a paper published in Genetics in Medicine last week.

This study involved a cohort of 658 patients with confirmed or presumed genetic epilepsy. Assembling the cohort was straightforward because the patients' genetic tests clearly showed a mutation. The picture got muddled when the researchers tried matching this genetic information with each patient's clinical records.

The authors noted that more than 30 percent of developmental and epileptic encephalopathies are associated with genetic mutations, but it can be difficult to identify causative genes because specific variants usually are present in less than 1 percent of all patients. Having a complete phenotype improves accuracy and allows physicians to provide families with better estimates of how and when their children might develop epileptic symptoms.

Organizations such as CHOP can analyze thousands of exomes at once, limited only by computational capacity, but including clinical data in research studies usually involves slow, labor-intensive manual chart review. Lead investigator Ingo Helbig, director of the genomic and data science core of the CHOP Epilepsy Neurogenetics Initiative (ENGIN), described this as a "phenotype gap" that has developed over the last 10 years or so.

"This means that our understanding of clinical data and our ability to process clinical data is far behind what we can do with genetics," he said.

"Our goal with this technology is to find complementary approaches to look at natural history, medication response, and outcomes," something that is missing with many rare diseases, Helbig said. "We just simply do not have the capacity to manually review a thousand patient charts to rebuild a patient's history for every rare disease."

He noted that analysis of phenotypic data is not scalable the way genome analysis is. "What we have seen over the last decade is that our understanding of phenotypes [has fallen] behind more and more, especially when we look at information about outcomes and we look at longitudinal disease history. We are way behind where we are with gene discovery," Helbig said.

Helbig's lab has been working for the past four years to address this gap.

This new article builds on an earlier study of phenotypes, in which a team led by Helbig built computer algorithms that uncovered a de novo gene variant that appears to cause a developmental and epileptic encephalopathy. That study, published in 2019 in the American Journal of Human Genetics, described a series of functional analyses that showed how an alteration in the AP2M1 gene affects clathrin-mediated endocytosis and synaptic vesicle recycling, through which the variant could influence disease.

Helbig's lab at CHOP historically has focused on gene discovery in epilepsy. The American Journal of Human Genetics article was Helbig's first to combine gene discovery with clinical data. The new work is among the first studies to "rebuild" pediatric epilepsy patient histories over time to look for new phenotypic insights in longitudinal records.

"The era of EMR genomics has begun," Helbig proclaimed.

The longitudinal dataset of 658 individuals in the Genetics in Medicine paper contains records of 62,104 patient encounters, representing 3,251 patient-observation years and covering mutations to genes including SCN1A, SCN2A, and STXBP1. "This is a relatively large dataset that we can now use to come to conclusions about how certain epilepsies actually present over time," Helbig said.

He said that this is almost like a genome-wide association study because the researchers looked for significant correlations between known genetic patterns and phenotypes, such as the clinical expression of Dravet syndrome, a rare, severe childhood epilepsy associated with a variant in the SCN1A gene. The alleged misclassification of this variant in the death of a two-year-old is at the heart of a long-running lawsuit against Quest Diagnostics.

"Many of these conditions have known patterns," Helbig noted. "The first question we asked was: Do we see the patterns that we would expect when we transform our phenotypic data into this computational format?"
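As a rough illustration of what testing for such expected patterns can look like computationally, the sketch below checks whether a phenotype term is overrepresented among carriers of a gene variant. It is a minimal example under stated assumptions: the one-sided Fisher's exact test is a stand-in chosen for illustration, not a method the article attributes to the CHOP team, and the function name and all counts are hypothetical.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: carriers with the phenotype term     b: carriers without it
    c: non-carriers with the term           d: non-carriers without it
    Returns the probability of seeing at least `a` carriers with the term
    if the term were unrelated to carrier status.
    """
    carriers = a + b
    with_term = a + c
    n = a + b + c + d
    upper = min(carriers, with_term)
    # Sum hypergeometric probabilities for outcomes as extreme as observed.
    favorable = sum(
        comb(carriers, k) * comb(n - carriers, with_term - k)
        for k in range(a, upper + 1)
    )
    return favorable / comb(n, with_term)

# Hypothetical counts, invented for illustration only.
p_value = fisher_one_sided(8, 2, 10, 80)
print(f"one-sided p = {p_value:.2e}")
```

In a cohort-wide analysis, a test like this would be repeated for many gene and phenotype pairs, so some correction for multiple comparisons would be needed before calling any single result significant.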

The researchers looked at how patients present as well as their disease history in hopes of uncovering disease "footprints," or knowledge about the progression of rare epilepsies. The CHOP bioinformatics team had to standardize and harmonize phenotypes by mapping each patient's clinical features to Human Phenotype Ontology terminology.

"You're looking at two years of work, how we conceptualize the translation of phenotypes onto a standardized format, the Human Phenotype Ontology, and then mapping this on certain time increments so we can compare two patients at any point of time," Helbig explained.
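The idea Helbig describes, standardized phenotype terms placed on fixed time increments so that two patients can be compared at the same age, can be sketched in a few lines. This is a simplified illustration, not the CHOP pipeline: the three-term hierarchy is a toy stand-in for the Human Phenotype Ontology, the three-month bin width is an assumption, and every function name and patient record here is hypothetical.

```python
from collections import defaultdict

# Toy child -> parent links standing in for the real HPO hierarchy (assumption).
PARENTS = {
    "Febrile seizure": ["Seizure"],
    "Focal seizure": ["Seizure"],
    "Seizure": [],
}

def with_ancestors(term):
    """Return the term together with all of its ancestors in the toy hierarchy."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen

def bin_encounters(encounters, bin_months=3):
    """Group (age_in_months, term) pairs into fixed-width age bins.

    Each recorded term is expanded to include its ancestors, so two patients
    with different specific terms can still match on a shared broader term.
    """
    bins = defaultdict(set)
    for age_months, term in encounters:
        bins[age_months // bin_months] |= with_ancestors(term)
    return dict(bins)

def shared_terms(patient_a, patient_b, bin_index):
    """Terms that both patients carry in the same age bin."""
    return patient_a.get(bin_index, set()) & patient_b.get(bin_index, set())

# Hypothetical encounter records: (age in months, phenotype term).
patient_a = bin_encounters([(7, "Febrile seizure"), (14, "Focal seizure")])
patient_b = bin_encounters([(8, "Focal seizure")])

# Ages 7 and 8 months both fall in bin 2 (months 6 through 8).
overlap = shared_terms(patient_a, patient_b, 2)
print(overlap)
```

Expanding each term to its ancestors is the design choice that makes the comparison useful: the two hypothetical patients have different specific seizure types at that age, yet they still overlap on the broader term.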

This study represented the fruit of that labor. Now, the researchers can, for example, drill down on the individuals in the cohort who have burst suppression, a severe expression of epileptic encephalopathy, associated with the SCN2A gene. "We actually replicate natural history or associated features with these conditions through this automated analysis," Helbig said, a process that would be next to impossible with manual chart review.

"The beauty of this approach is that this is scalable," Helbig said. "We look at natural histories in a very large cohort and we have shown here that we can replicate the patterns that we would typically expect these conditions to have."

CHOP has Epic Systems electronic medical records, but does not yet use the genomics module that Epic made available to its customers in the first half of 2019. Helbig said that CHOP instead merges deidentified patient data exported from Epic into its genomics pipeline.

The hospital made this decision for several reasons, including the facts that its phenotype matching predates the Epic genomics module and that the EMR add-on is designed more for clinical practice than research.

"The functionality of medical records by themselves to analyze data is quite limited," Helbig said. "And since we're a bioinformatics group, we use our own data processing pipelines for genomic data."

CHOP does have a long-term goal of having this predictive capability inform clinical decision support at the point of care. At the moment, the technology is purely for research, but Helbig's laboratory is starting to look at genotypic-phenotypic patterns that might be able to predict diagnoses, leading to early detection of rare ailments.

"We're transforming clinical data into a format that we can use for calculations, that we can use for informatics approaches, that we can use in the future for machine learning approaches," Helbig explained.

"Eventually, we want to do this to improve patient care. We would like to use the same rules to apply to medication response and to use this as a framework to slowly but surely put this together into a thought-out methodological framework to improve the care of the children who will get epilepsy," he continued.

Helbig believes that this technology is transferable to other institutions because it relies on the publicly available HPO. "This is a generalizable concept on how medical record data at any institution or with any kind of global framework can be used to reconstruct previous histories," he said.

"We didn't want to create anything that only works at our institution. We wanted to create a generalizable framework on how this information can be used," Helbig said. "What we have done here is taken that first step into this era of EMR genomics, where we can meaningfully combine genetic information that we know how to handle with electronic medical record data, which is new to us."