Despite an increasing number of clinical laboratories now offering diagnostic exome sequencing, commercial exome kits still have a number of holes in them and miss important regions even in medically relevant genes.
To address this problem, researchers from Emory Genetics Laboratory, Children's Hospital of Philadelphia, and Harvard's Laboratory of Molecular Medicine are developing a custom-designed kit to ensure complete coverage of around 5,000 medically relevant genes, Clinical Sequencing News has learned.
The group, which includes Madhuri Hegde, executive director of Emory Genetics Laboratory; Avni Santani, scientific director of the Molecular Genetics Laboratory at CHOP; and Birgit Funke, director of clinical research and development at Harvard's Laboratory of Molecular Medicine, plans to make its gene list and the methods for the test design available this fall so that laboratories that are just beginning to offer diagnostic exome tests can opt to use the group's protocol. While the researchers won't be marketing a test, the hope is that the vendor of whichever technology they ultimately decide on could make customized kits.
The group is currently working with various vendors to design a cost-effective, quick approach for boosting coverage of medically important genes. It recently settled on an initial gene list, which it is now curating. The list includes known and suspected Mendelian disease genes, some cancer-related genes and gene fusions, genes that are strongly associated with common complex diseases, and pharmacogenetically important genes.
Exome Holes
Currently, clinical laboratories that offer diagnostic exome tests use off-the-shelf capture kits from vendors such as Agilent and Nimblegen, but these kits only cover up to around 92 percent of the exome, Hegde told CSN.
Additionally, even after sequencing that 92 percent, between 11 percent and 20 percent of the exons in the Human Gene Mutation Database are either poorly covered or not covered at all, and of those exons at least half are in medically relevant genes, she added.
"When reporting cases, it's highly possible that [there is] a negative report because the kit did not cover the necessary exons," Hegde said. "We really wanted to design something that is clinically relevant to be used in laboratories as a clinically validated test."
The poorly covered exons tend to be in regions that are GC-rich or have some other sequence complexity that makes them difficult to capture.
The group has been testing Agilent and Nimblegen exome kits to figure out where the holes are and why the kits miss specific exons. Some exons are missed completely because of the initial dataset that the kit is designed around. For instance, some companies design their kits based on the RefSeq dataset, while others prefer the CCDS dataset, CHOP's Santani told CSN.
"There are other regions that are targeted by baits but the baits are not efficient," she added. "And lastly, genomic architecture plays a significant role." For instance, designing baits for pseudogenes, repetitive regions, or GC-rich regions is "not easy to begin with," Santani said.
The first problem — having no baits to target the region in the first place — is an easy one to solve, Santani said. The other issues are trickier to overcome and Santani said the three researchers have been working with the vendors to test various strategies, such as designing multiple baits for certain regions, to help enhance coverage in the problematic regions.
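The distinction Santani draws between exons with no baits and exons with baits that perform poorly can be illustrated with a short sketch. This is not the group's method: the interval layout, the exon names, the example depths, and the 20x depth cutoff are all assumptions made up for illustration.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_exon(exon, baits, mean_depth, min_depth=20.0):
    """Sort an exon into one of three illustrative categories.

    min_depth is an assumed cutoff, not a value from the article.
    """
    if not any(overlaps(exon, bait) for bait in baits):
        return "untargeted"              # no bait was designed for this exon
    if mean_depth < min_depth:
        return "targeted_low_coverage"   # baited, but capture underperforms
    return "covered"


# Hypothetical bait design and per-exon mean sequencing depth.
baits = [Interval("chr1", 100, 220), Interval("chr1", 500, 620)]
exons = {
    "GENE_A:exon1": (Interval("chr1", 120, 200), 85.0),
    "GENE_A:exon2": (Interval("chr1", 510, 600), 6.5),
    "GENE_B:exon1": (Interval("chr2", 300, 400), 0.0),
}

report = {
    name: classify_exon(interval, baits, depth)
    for name, (interval, depth) in exons.items()
}

for name, status in report.items():
    print(name, status)
```

In this toy classification, untargeted exons correspond to the problem Santani calls easy to solve, since adding baits addresses it. Exons that are targeted but poorly covered are the ones that would need strategies such as multiple baits or an alternative technology.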
The group is also testing alternative technologies, such as a biotinylated oligo approach by Integrated DNA Technologies or RainDance's amplicon sequencing approach, as well as other custom amplicon sequencing designs, she said.
"We're looking at cost, ease of use, performance will be very important, and time frame," Santani said. Based on these metrics, the team will decide on either one or potentially two approaches.
Another consideration will be how these alternative technologies integrate with the current kits. For instance, Santani said that Foundation Medicine has presented posters demonstrating that the oligo spike-in approach from IDT is compatible with Agilent's SureSelect technology.
If the group decides to go with a technology like RainDance, however, which uses custom amplicon designs, the final product would be "two complementary approaches that would have to work together," Santani said.
"What plays into this is cost and how easy this would be to bring into labs," she added. "We're trying to go for a simple, elegant approach rather than make it clunky and difficult to adopt."
Medically Relevant Genes
Aside from increasing coverage of important genes, the group is also removing genes that have no clinical relevance, Hegde said, in order to keep costs down. The final product will include enhanced coverage of the 5,000 or so genes the group has identified, plus additional genes as cost allows, she said.
"At the end of the day, the important thing is to define what these genes are," said Harvard's Funke. "I think this is our biggest concern — what are the genes that we have to worry about today?"
She added that in conversations with physicians who order exome tests, she has found that many do not realize that the current exome kits do not capture every medically relevant exon.
Even highly informed physicians think that everything is tested for in an exome test, she said. Funke said her worry is that when physicians receive a negative report, they think they are "at the end of the road" and that there's nothing further to be done. But, "meanwhile, an obvious gene was only half-covered."
To choose the list of 5,000 medically relevant genes, the trio searched a wide variety of databases, including large, well-known databases such as HGMD and OMIM; copy number databases that keep track of genes with frequent copy number changes; and individual laboratories' own databases. "Any database that claimed to have useful information associated with genes, we downloaded," Funke said.
The team even mined the Jackson Laboratory's mouse genome database, since it contains many mouse models associated with human disease.
They then took that initial "data dump" and, through extensive curation, narrowed it down to around 5,000 genes. The next step, said Hegde, is to "take that curated gene list and see what is not getting enhanced" by the current exome kits.
Additionally, they plan to open up the process to the community in order to add to or subtract from the gene list as necessary. "We fully expect that it will be evolving," Funke said.
By this fall, the group plans to release the final list and its design approach for targeting those genes. While the group does not plan to commercialize a kit based on the design, Hegde anticipates that it will follow a model similar to microarrays developed by the International Standards for Cytogenomic Arrays Consortium, where the different vendors market specific ISCA arrays.
Hegde said the group expects the gene list to evolve over time as more research is done, with genes being added to or removed from the list. Despite the flaws in the current exome tests, she said that they are still valid tests. "We had to start somewhere," she said. "This is part of the improvement process," and "we are now where we need to improve on it and make sure we have a test that's relevant for a clinical laboratory to use in a clinical setting."