CHARLOTTE, NC (GenomeWeb) – A Stanford University-led team has developed a computational strategy for systematically scouring the primary literature to speed up and simplify monogenic disease diagnoses in individuals who have had their exomes sequenced.
The method, called "Automatic Mendelian Literature Evaluation," or AMELIE, automates the often tedious and burdensome literature searches required to link candidate genes to individual Mendelian disease cases, Stanford computer science, developmental biology, and pediatric medical genetics researcher Gill Bejerano explained during his presentation at the American College of Medical Genetics and Genomics annual meeting here yesterday.
During a platform presentation on molecular genomics and exome sequencing, Bejerano provided an overview of AMELIE and presented findings from an analysis of more than 200 singleton Mendelian disease cases.
The prevalence of monogenic diseases, which are believed to affect more than 5 percent of the global population, combined with the tide of new genomes and exomes likely to be sequenced in the coming years, poses significant analytical challenges, Bejerano noted.
Add the ongoing reanalysis that sequenced but undiagnosed cases require as new variant and gene associations are uncovered, and the time, cost, and labor involved in making monogenic disease diagnoses from exome or genome sequence data become substantial, he added.
Bejerano argued that computers, drawing on artificial intelligence, expert training, and human verification, have the memory and speed to support such diagnoses through intensive, automated literature searches that would make even the most self-sacrificing graduate student or postdoc blanch.
To that end, he and his colleagues trained AMELIE on tens of millions of PubMed abstracts, covering studies both with and without a monogenic disease focus, before setting it loose on thousands of full-text articles to narrow in on causal variants among the candidate genes found in individuals' exome sequences.
When they used AMELIE to assess candidate genes from the exome sequences of 215 individuals already diagnosed with monogenic conditions through the Deciphering Developmental Disorders study, the researchers found that AMELIE ranked the true causal gene first in roughly 60 percent of cases. And in some 90 percent of cases, the causal gene fell among the top five genes ranked by AMELIE, Bejerano said.
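The two figures Bejerano cited are what is usually called top-1 and top-5 accuracy: the share of cases in which the known causal gene appears within the first k positions of the tool's ranked candidate list. The minimal sketch below shows that calculation under stated assumptions. It is not AMELIE's code or the team's evaluation pipeline; the function name `top_k_accuracy` is hypothetical, and the gene rankings are invented toy data, not results from the DDD cases.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_lists: Sequence[Sequence[str]],
    causal_genes: Sequence[str],
    k: int,
) -> float:
    """Hypothetical helper: fraction of cases whose known causal gene
    appears among the first k genes of that case's ranked candidate list."""
    if len(ranked_lists) != len(causal_genes):
        raise ValueError("need exactly one ranked list per case")
    if not ranked_lists:
        return 0.0
    # A case counts as a hit if its causal gene is in the top k of its ranking.
    hits = sum(
        1 for ranking, gene in zip(ranked_lists, causal_genes) if gene in ranking[:k]
    )
    return hits / len(ranked_lists)


# Toy data invented for illustration: ranked candidate genes for four cases.
rankings = [
    ["SCN1A", "KCNQ2", "STXBP1"],
    ["KCNQ2", "ARID1B", "SCN1A"],
    ["ANKRD11", "DDX3X", "ARID1B"],
    ["MECP2", "CDKL5", "FOXG1"],
]
truth = ["SCN1A", "ARID1B", "KMT2D", "MECP2"]

print(top_k_accuracy(rankings, truth, 1))  # 0.5
print(top_k_accuracy(rankings, truth, 5))  # 0.75
```

In this toy run, two of four causal genes are ranked first and a third sits second, so widening the window from one to five raises accuracy from 0.5 to 0.75, the same kind of jump as the reported 60 percent versus 90 percent.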
"AMELIE parses hundreds of thousands of full-text articles to find an underlying diagnosis to explain a patient's phenotypes given the patient's exome," he and his colleagues from Stanford, UC Santa Cruz, and Cardiff University wrote in a related preprint paper appearing in bioRxiv last year, noting that the tool "prioritizes patient candidate genes for their likelihood of causing the patient's phenotypes."
For the 215 DDD cases, the AMELIE-assisted approach for prioritizing Mendelian candidate genes "significantly outperforms existing gene ranking methods," they wrote.
During his presentation here yesterday, Bejerano said the tool is freely available for non-commercial use. As for that acronym? "She'll change your life," Bejerano quipped, borrowing from the tagline for the French movie that shares the tool's name.