NIH Team Looks for Best Ways to Reanalyze Exome Data


NEW YORK (GenomeWeb) – While clinical exome sequencing has become a useful tool to diagnose rare, unknown diseases, the majority of patients who receive the test still do not get an answer.

As part of the National Institutes of Health's Undiagnosed Disease Program (UDP), researchers have sought to determine which analysis methods could identify causative variants in patients with undiagnosed disease, including those who have already received a clinical exome test.

David Adams, deputy director of clinical genomics at the NIH's National Human Genome Research Institute, told GenomeWeb that the group has been "trying to figure out in what places data may be missing from clinical exomes." In a study published last month in BMC Medical Research, the researchers re-evaluated exome data from 54 cases that had been generated as part of the UDP, but had come back negative.

The researchers evaluated several methods for extending the analysis of exome sequencing data to determine which ones yielded the greatest benefit. For instance, in cases where only the proband had clinical exome sequencing, the researchers evaluated the impact of extending the analysis to parents and siblings. In addition, they evaluated tools to identify copy number variants, looked at the impact of extending the analysis from the exome to the whole genome, and looked at the impact of reduced coverage.

Adams said that the study could help inform best practices for conducting research-based exome analysis. While clinical exome sequencing rightly focuses on trying to identify known disease-causing variants, Adams said that it is important to go beyond that for research purposes, both to enable new discoveries and to identify known variants that are difficult to detect in a traditional clinical exome. In addition, "the whole idea of a research analysis of an exome has not been standardized by any means," he said. Even the laboratories that are part of the NIH's Undiagnosed Disease Network have slightly different methods for conducting research-based exome analysis, he said.

This study could "help make decisions about which additional analyses to do after a clinical analysis has proven negative." Because the additional analyses all add significant cost, time, and labor, he said, it is important to figure out which techniques yield the most value in terms of identifying causal variants. For instance, the researchers found in the study that a method to identify medium-sized deletions could be an effective add-on following a negative clinical exome. Adams said that the group is now looking at how to incorporate that analysis into its standard research pipeline.

Identifying medium-sized copy number variants is important because exome sequencing protocols can typically only pick out small indels, Adams said. The group also runs SNP chips to capture large structural variants, but together the methods still miss variants that are larger than around 50 bases but smaller than whole chromosomes, he said.

To identify these structural variants from the exome data, the researchers used a program called Pindel, which takes the reads that do not map to the reference and analyzes them for evidence of structural variants.

Using Pindel to analyze a cohort of 54 probands and their parents, the researchers identified on average 33,000 structural variants per proband. They then filtered these variants, excluding those that were present at high frequency in the population, that were also found in the parents, or that overlapped with variants called by either the exome analysis or SNP chip. That narrowed down the list of potential variants to an average of 1,200 per proband.
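
The three exclusion steps described above can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not the UDP's actual pipeline: the `Variant` record, the `filter_candidates` function, the exact-match comparison against parental calls, and the 1 percent population frequency cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one structural variant call."""
    chrom: str
    start: int
    end: int
    kind: str        # e.g. "DEL" for a deletion
    pop_freq: float  # population frequency (assumed annotation)


def overlaps(a, b):
    """True if two variants share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def filter_candidates(proband, parents, known_calls, max_pop_freq=0.01):
    """Keep proband variants that survive all three exclusions.

    max_pop_freq is an assumed cutoff; the article only says
    "high frequency in the population".
    """
    # Parental calls are matched on exact coordinates and type (an assumption).
    parent_keys = {(v.chrom, v.start, v.end, v.kind) for v in parents}
    kept = []
    for v in proband:
        if v.pop_freq > max_pop_freq:
            continue  # common in the population
        if (v.chrom, v.start, v.end, v.kind) in parent_keys:
            continue  # also found in a parent
        if any(overlaps(v, k) for k in known_calls):
            continue  # already called by the exome analysis or SNP chip
        kept.append(v)
    return kept
```

In practice, matching parental calls would likely need some tolerance around breakpoints rather than exact coordinates, which is one reason this should be read as a sketch only.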

Including additional family members beyond the proband also "adds significant power" and "helps reduce noise," Adams said. The UDP is already working to incorporate additional family members, which tends to be relatively easy when the patient is a child, since parents and any siblings tend to be easy to get a hold of, but is more difficult for adult patients when the family members may live far apart from each other.

The researchers looked at the effects of incorporating family members in the analysis by studying 45 families that included the proband, both unaffected parents, and at least two additional siblings. They found that doing a trio versus just the proband alone made the most difference, reducing the potential variants called for further analysis from 1,126 to 117. Including additional members further reduced the number of variants, although to a lesser degree, to 88, 69, and 54 for quartets, quintets, and sextets, respectively.
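
To make the diminishing returns concrete, the figures reported above can be turned into step-by-step reductions. The short Python sketch below uses only the counts quoted in this article; the `stepwise_reduction` helper and the labels are made up for the illustration.

```python
# Candidate variant counts as reported in the article, by family size.
counts = {
    "proband only": 1126,
    "trio": 117,
    "quartet": 88,
    "quintet": 69,
    "sextet": 54,
}


def stepwise_reduction(counts):
    """For each added family member, return (label, variants removed, percent removed)."""
    labels = list(counts)
    rows = []
    for prev, cur in zip(labels, labels[1:]):
        removed = counts[prev] - counts[cur]
        pct = 100 * removed / counts[prev]
        rows.append((cur, removed, round(pct, 1)))
    return rows


for label, removed, pct in stepwise_reduction(counts):
    print(f"{label}: {removed} fewer candidates ({pct}% of the previous step)")
```

By this arithmetic, moving from the proband alone to a trio removes roughly 90 percent of the candidates, while each additional sibling removes between about a fifth and a quarter of what remains.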

Adams said that as the researchers look to include more structural variant analysis for the patients enrolled in the UDP, they would likely try to make use of the existing exome or whole-genome data, rather than run a separate test specific for structural variants. Aside from Pindel, he said the group is evaluating other tools that analyze exome data for structural variants.

In addition, he said, the UDP is already conducting whole-genome sequencing for some patients. Patients who are enrolled in the UDP either have exome sequencing performed by Baylor College of Medicine or whole-genome sequencing conducted by the HudsonAlpha Institute for Biotechnology or Illumina. Whole-genome sequencing has the advantage that it can detect some structural variants, Adams said, although because overall coverage is lower, it is a "bit less sensitive" for detecting structural variants.

In the BMC Medical Research study, the researchers also looked at the impact of doing whole-genome sequencing instead of exome sequencing. Adams said that he thinks whole-genome sequencing will eventually replace exome sequencing because it enables the detection of variants in non-coding regions. For clinical purposes, however, there is already a large infrastructure around exome sequencing, and much of the additional information that whole-genome sequencing can add is currently not well understood.

Other groups are also looking at methods to boost the diagnostic rate of exome sequencing, including re-analyzing data and using matchmaking tools to find cases. In addition, Personalis, as well as some academic groups, such as Emory and Baylor, have looked to improve on clinical exome sequencing by adding in coverage of medically relevant regions of the exome that may be missed by off-the-shelf capture kits.

Adams said that these so-called "medical exomes" have been "quite successful." Those approaches, however, are still focused on a clinical analysis, while the UDP group wants to figure out additional techniques to extend the analysis for research when a clinical test turns up negative. "There's a growing distinction between clinical analysis versus research analysis," Adams said. Clinical tests will not add analyses that take away from the efficiency of the test or increase the number of false positives, but for research, "that's reasonable to do when the clinical analysis doesn't find an answer," Adams said.