NEW YORK (GenomeWeb) – Researchers at Stanford University School of Medicine have developed a machine-learning framework that integrates genomic and electronic health record data to predict an individual's risk of developing abdominal aortic aneurysm (AAA) and to improve scientists' understanding of the biological architecture of the disease.
The analytical framework, called HEAL (hierarchical estimate from agnostic learning), could be used as an early screening test for AAA and as a personal health management tool to help lower disease risk through lifestyle adjustments. In addition, it may be applicable to other complex diseases.
A study describing HEAL and its application to AAA, led by Michael Snyder of the Center for Genomics and Personalized Medicine at Stanford and Philip Tsao of the VA Palo Alto Health Care System, was published in Cell today. "As envisioned for precision medicine, for the first time we provide proof of principle for a general analytical framework that has simultaneously achieved clinical prognosis and disease gene identification from personal genomes," the authors wrote.
Snyder told GenomeWeb that his team is planning to commercialize HEAL but is still discussing how to do so. "I expect the test to be widely used," he said in an email.
HEAL could also be useful for predicting the risk of other diseases. "The HEAL framework is potentially applicable to many complex diseases, and our overall approach is expected to be valuable in developing clinical tests that incorporate personal genome sequences into disease-risk prediction," the researchers noted. Snyder's group is currently applying HEAL to genomic studies of preterm birth and autism.
AAA leads to an irreversible dilation of the abdominal aorta and affects more than 5 percent of individuals over the age of 65. It is usually asymptomatic and diagnosed at a late stage. Its most common complication is rupture of the aorta, which is lethal in 90 percent of cases, making AAA the tenth most common cause of death in Western countries. While AAA is estimated to be 70 percent hereditary, little is known about the genetics of the disease, which is mutationally heterogeneous.
To characterize the mutational landscape of AAA, the Stanford team sequenced the genomes of AAA patients and controls and used HEAL to correlate genomic variants with disease traits.
They found that HEAL was able to predict disease status from genome data alone, as well as from lifestyle and physiological data alone, but the predictive power increased when the two were combined.
For their study, they selected 313 AAA patients and 161 controls from the VA Palo Alto Health Care System, Stanford University, and Kaiser Permanente and performed whole-genome sequencing to an average coverage of 50X on blood samples from the study subjects.
They then performed a genome-wide association study, an analysis that looks for disease associations with common variants, on the data, but not a single genomic locus reached statistical significance. Instead, they focused on rare variants, analyzing them with HEAL.
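To illustrate why a cohort of this size can come up empty in a genome-wide scan, here is a minimal sketch of a single-variant association test. It is not the study's pipeline: the allelic chi-square test, the conventional genome-wide threshold of 5e-8, and the allele counts are all assumptions chosen for illustration. Only the cohort sizes (313 cases, 161 controls) come from the article.

```python
import math

# Conventional genome-wide significance threshold; an assumption, not taken from the study.
GENOME_WIDE_ALPHA = 5e-8


def allelic_chi2_pvalue(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts for one variant: 313 cases carry 626 alleles, 161 controls carry 322.
chi2, p = allelic_chi2_pvalue(case_alt=188, case_ref=438, ctrl_alt=64, ctrl_ref=258)
print(f"chi2 = {chi2:.2f}, p = {p:.2e}, genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

In this toy example the variant is about 10 percentage points more frequent in cases than in controls, yet its p-value stays well above the genome-wide threshold. This is the kind of power limitation that makes per-variant testing difficult in a few hundred genomes.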
HEAL starts by annotating each variant. It then agnostically identifies a subset of genes with distinct mutational patterns in cases compared to controls, which can be used to predict clinical outcomes. To learn more about the disease etiology, it also maps the genes onto biological networks. From the study data, HEAL identified a minimal set of 60 genes whose mutational burden was increased in cases versus controls. These genes predicted AAA status moderately well, with an AUROC (area under the receiver operating characteristic curve) of 0.69.
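For readers unfamiliar with the metric, an AUROC can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control, with 0.5 meaning no better than chance and 1.0 meaning perfect separation. The sketch below computes it directly from that definition. The function name and the burden scores are hypothetical and are not drawn from the study.

```python
def auroc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical burden scores: how many of the selected genes carry a rare variant.
case_burden = [5, 7, 3, 6, 4]
control_burden = [2, 4, 3, 1, 5]
print(auroc(case_burden, control_burden))
```

The pairwise loop is quadratic in the number of samples, which is fine for a cohort of a few hundred people. Larger datasets would normally use a rank-based formula or a library routine.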
Next, the researchers analyzed electronic health record (EHR) data from each study participant, including lifestyle surveys and physiological measurements taken during their last clinical visit, such as sex, age, smoking status, heart rate, waist-to-hip ratio, insulin level, fasting glucose level, and lipid profiles. Using a similar machine-learning model, they were able to distinguish between AAA patients and controls with an AUROC of 0.775. This was expected, they noted, because AAA is known to be associated with these traits.
When they integrated the genomic data with the EHR records, the predictive power of their model increased further, to an AUROC of 0.8, "demonstrating the complementarity of personal genomes and individual lifestyles in predicting disease outcomes."
"Taken together, the genome-based model in HEAL identifies the genome baseline for an individual to develop AAA, and the combined genomics and EHR model in HEAL more accurately predicted the risk," they concluded.
The genome-based model had a similar or lower false-negative rate than the EHR model, they found, along with a higher false-positive rate. Still, the model might be useful as an early assessment tool, which "is lacking and strongly desired in clinical practice," the researchers wrote. "Given its low false-negative and relatively higher false-positive rate, the genome-based model has the potential to be deployed as an early screening tool for AAA, and the false-positive calls can be easily complemented by the inexpensive and noninvasive ultrasound follow-ups."
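The trade-off the researchers describe can be sketched with a simple threshold on a risk score: calling more people positive lowers the false-negative rate at the cost of more false positives. The scores, labels, and thresholds below are made up for illustration, and the article does not say how the study set its cutoffs.

```python
def error_rates(scores, labels, threshold):
    """Return (false_negative_rate, false_positive_rate).

    A sample is called positive when its score is >= threshold.
    Labels: 1 = AAA case, 0 = control.
    """
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return fn / positives, fp / negatives


# Made-up risk scores for four cases followed by four controls.
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for threshold in (0.65, 0.35):
    fnr, fpr = error_rates(scores, labels, threshold)
    print(f"threshold={threshold}: FNR={fnr:.2f}, FPR={fpr:.2f}")
```

With the lower threshold no case is missed, but half of the controls are flagged. For a screening test followed by an inexpensive ultrasound, that is the more tolerable kind of error.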
HEAL could also be used as a personalized health management tool by integrating the genomic and lifestyle or physiological measurements, they wrote. For example, a patient's AAA risk could be lowered by raising his or her plasma high-density lipoprotein levels, which could have more or less of an effect depending on the patient's genomic profile.
Besides predicting AAA risk, HEAL was able to elucidate the molecular basis of AAA. The 60 genes identified by the framework were enriched in immune-related functions, such as interferon-gamma-mediated signaling, MHC class II receptor activity, and T cell co-stimulation. Mapping them onto protein-protein interaction network data also revealed that they are involved in several biological pathways and fall into 40 functional modules. Additional studies in human tissues and mouse models demonstrated the involvement of these modules in AAA.
The researchers noted that their analysis considered only single-nucleotide variants, not indels, copy number variants, or non-coding variants. Including those in the future could likely increase the predictive power of HEAL.