NEW YORK (GenomeWeb) – A large-scale analysis of almost 51,000 exomes of patients and their electronic health records by the Regeneron Genetics Center and the Geisinger Health System has revealed clinically actionable variants in 3.5 percent of individuals as well as a number of known and potential drug targets.
The study also demonstrated that exome sequencing can identify individuals at risk of familial hypercholesterolemia (FH), providing an example of the potential clinical benefit of similar large-scale sequencing studies now being planned, such as the Precision Medicine Initiative.
The data set from the so-called DiscovEHR study "provides a blueprint for large-scale precision medicine initiatives and genomics-guided therapy discovery," corresponding authors Frederick Dewey from Regeneron and David Carey from Geisinger and their colleagues wrote in a publication of the first study results in Science today. A separate Science paper, led by corresponding author Michael Murray from Geisinger, focuses on the analysis of individuals at risk of FH.
In early 2014, Regeneron and Geisinger announced their collaboration, under which they intend to sequence and analyze 100,000 Geisinger patients. Sequencing for the project takes place at the Regeneron Genetics Center, a wholly owned subsidiary of Regeneron Pharmaceuticals.
For their study, the researchers sequenced the exons of 18,852 genes in 50,726 patients who had consented to participate in the Geisinger MyCode Community Health Initiative. Almost half of them had at least one first- or second-degree relative who also participated in the sequencing study, and the average age of the study subjects was 61. Participants had clinical phenotypes recorded in their EHR over a median of 14 years, with a median of 87 clinical visits, 658 laboratory tests, and seven procedures per individual.
The sequencing covered more than 85 percent of the target regions with at least 20X haploid read depth in the overwhelming majority of samples. The analysis identified a median of 21,409 single-nucleotide variants and 1,031 indels per person, of which 887 were novel. In total, the researchers uncovered more than 4 million unique SNVs and 224,100 unique indels, almost all of them rare, with allele frequencies of less than 1 percent.
Each participant had a median of 21 rare variants that are predicted to result in a loss of function, and, overall, study subjects had 176,000 such loss-of-function variants.
To assess their clinical impact, the researchers conducted gene-based burden tests of association between loss-of-function variants and 80 laboratory traits that were documented in the EHRs. This uncovered a number of previously unidentified associations, for example, between the gene CSF2RB and basophil and eosinophil counts.
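As a rough illustration of what a gene-based burden test does, the sketch below collapses all predicted loss-of-function variants in one gene into a single carrier flag per person and then compares a laboratory trait between carriers and non-carriers. It is a minimal sketch, not the method used in the DiscovEHR study: the function names, variant IDs, participant IDs, and trait values are all hypothetical, and Welch's t statistic stands in for the regression models, typically adjusted for covariates such as age, sex, and ancestry, that studies of this kind generally rely on.

```python
from math import sqrt
from statistics import mean, variance


def burden_carriers(genotypes, lof_variants):
    """Collapse per-variant genotypes into one carrier flag per person.

    genotypes: dict mapping person ID -> set of variant IDs that person carries
    lof_variants: set of predicted loss-of-function variant IDs in one gene
    """
    return {person: bool(variants & lof_variants)
            for person, variants in genotypes.items()}


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means (unequal variances)."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def burden_test(genotypes, lof_variants, trait):
    """Compare a quantitative trait between LoF carriers and non-carriers."""
    carriers = burden_carriers(genotypes, lof_variants)
    with_lof = [trait[p] for p, is_carrier in carriers.items() if is_carrier]
    without_lof = [trait[p] for p, is_carrier in carriers.items() if not is_carrier]
    return welch_t(with_lof, without_lof)


# Hypothetical toy data: six people, one gene with two LoF variants (v1, v2).
# v9 is a variant outside the LoF set and does not count toward the burden.
genotypes = {
    "p1": {"v1"}, "p2": {"v2", "v9"}, "p3": set(),
    "p4": {"v9"}, "p5": set(), "p6": {"v1"},
}
lof_variants = {"v1", "v2"}
trait = {"p1": 1.0, "p2": 1.2, "p3": 2.0, "p4": 2.2, "p5": 1.9, "p6": 0.9}

t_stat = burden_test(genotypes, lof_variants, trait)
print(f"t = {t_stat:.2f}")  # negative here: carriers have lower trait values
```

Pooling variants this way is what makes rare alleles testable at all: any single loss-of-function variant may appear in only a handful of people, but the gene-level carrier group can be large enough to show a difference.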
They also performed an exome-wide association study between fasting lipid levels — cholesterol, HDL-C, LDL-C, and triglycerides, which are risk factors for heart disease and stroke — and rare variants, and found a number of associated variants at various loci, including APOC3 and APOB.
In addition, gene-based burden tests of association between loss-of-function variants and lipid levels uncovered novel rare alleles in known lipid-associated gene loci, including CD36, as well as a new association with G6PC, which has been implicated in glycogen storage disease type Ia in the past.
The researchers also looked for associations between lipid levels and loss-of-function variants in nine genes that encode drug targets for lipid modification. Carriers of such variants are 'human knock-outs' and are predicted to mimic the effects of antagonistic drugs. Indeed, the researchers could validate the actions of several lipid-lowering drugs and anticipate their on-target side effects. The approach could "potentially reveal previously unidentified targets for therapeutic development," they wrote.
Finally, the team looked at the prevalence of clinically actionable variants in the exome data, considering 76 genes: the 56 genes recommended by the American College of Medical Genetics and Genomics plus 20 additional genes. In total, 13 percent of participants, or almost 7,000, harbored pathogenic or likely pathogenic variants in these genes. Of those patients, the researchers selected a pilot set of 1,415 for CLIA validation and review of their results. They found 43 reportable variants in 49 individuals, or 3.5 percent of the pilot cohort.
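The 3.5 percent figure can be reproduced from the two counts reported for the pilot set. The short check below uses only those numbers from the article; the variable names are illustrative.

```python
pilot_size = 1415          # participants selected for CLIA validation and review
reportable_carriers = 49   # individuals found to carry a reportable variant

share = reportable_carriers / pilot_size
print(f"{share:.1%}")  # prints 3.5%
```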
Of those, 65 percent had clinical features in their EHR that were consistent with the predicted disease, but only 14 percent had a formal diagnosis, and another 53 percent had diagnoses consistent with clinical features of the disease. Thus, the results "demonstrate the potential for genomics-guided clinical care in a largely unselected clinical population," the authors wrote.
In their second publication, the researchers homed in on one particular genetic condition, familial hypercholesterolemia, which remains underdiagnosed and is associated with coronary artery disease and stroke. FH is mostly caused by mutations in the genes LDLR, APOB, and PCSK9.
Overall, they found 35 known and predicted pathogenic variants in those three genes in 229 individuals, translating to a disease prevalence of 1:256 in this cohort. Prior to genetic testing, only 15 percent of the FH variant carriers had a diagnosis for hypercholesterolemia or had been seen in a lipid clinic, and only 24 percent of carriers would have qualified for such a diagnosis based on their clinical EHR data. A total of 58 percent of FH variant carriers were active statin users, the researchers found, but fewer than half of them had sufficiently lowered LDL-C levels as a result, suggesting undertreatment.
The risk for coronary artery disease was significantly increased in FH variant carriers, in particular those with loss-of-function mutations in LDLR. However, as the researchers pointed out, there are currently no genotype-based guidelines for the treatment of hyperlipidemia.
The study results "demonstrate a potential clinical benefit for the large-scale sequencing planned by the national Precision Medicine Initiative," the authors wrote, noting that "as a highly modifiable genetic condition, FH is an ideal starting point for implementation of a return of results program."