This story has been updated to include comments from Emory's Madhuri Hegde and UT Southwestern's Jason Park.
NEW YORK (GenomeWeb) – Although diagnostic exome sequencing is becoming an increasingly common method for solving rare, undiagnosed diseases, most exome capture kits do not capture the entire exome and may miss or poorly cover medically relevant genes.
Personalis, seeking to get around this problem, launched an augmented exome test, called Accuracy and Content Enhanced (ACE) Clinical Exome, in 2013. The test boosts coverage of genes that contain medically relevant variants or that have characteristics that make them notoriously difficult to sequence, such as high GC content.
In a study published this week in Genome Medicine, the company compared its test with commercial exome kits from Agilent, NimbleGen, and Illumina, as well as with whole-genome sequencing. It found that its test had improved coverage in "medically interpretable genes," in the set of 56 genes for which the American College of Medical Genetics and Genomics recommends returning secondary findings, and in a subset of known disease-associated variants.
"This is the first comparison of an augmented exome sequencing approach to other recent exomes and whole genome, and evaluation of their ability to reach a 'clinical' standard of coverage of key medically important genes," Richard Chen, chief scientific officer of Personalis, told GenomeWeb. The study illustrates "key gains for our ACE augmented exome approach in finishing" genes and exons so that variants are not missed, he added.
Madhuri Hegde, executive director of Emory's Genetics Laboratory, who was not involved in the study, told GenomeWeb that the study "clearly shows that standard exome designs are not sufficient for clinical use" and that "a dedicated effort to improve the exome design for clinical use can lead to a significantly higher sensitivity and specificity."
Personalis uses a proprietary sample prep, enrichment, and sequencing protocol in its ACE Clinical Exome test to supplement targeting in 8,020 genes with known or potential clinical impact.
Personalis compared its strategy with Agilent's SureSelect Human version 5 and SureSelect Clinical Research Exome, NimbleGen SeqCap EZ Human Exome Library v3.0, and Illumina's Nextera Rapid Capture Exome, as well as with whole-genome sequencing using Illumina's TruSeq PCR-free protocol.
As its test sample, Personalis used a well-characterized cell-line reference sample, NA12878.
The team first compared coverage performance in what it described as the "medically interpretable genome," a list of 5,419 genes in which mutations are known to cause disease or disease-related drug response. The list included genes that are part of existing clinical tests, are documented as pharmacogenes, or have a known causal association with a Mendelian disease, inherited disease, or cancer.
To compare the different technologies, the researchers normalized each set of sequence data after sequencing, both to 12 Gb of sequence and to 100X mean target coverage. When all platforms were normalized to 100X mean target coverage, the mean coverage depth of the medically interpretable genome was highest for Agilent's clinical research exome, at 207X, while ACE came in second at 138X.
Nevertheless, ACE covered more than 99 percent of bases in protein-coding regions and more than 93 percent of bases in noncoding regions at 20X or greater coverage. The other exome sequencing platforms covered between 93 percent and 97 percent of protein-coding bases and between 50 percent and 73 percent of noncoding bases at that depth. Lower coverage in noncoding regions was expected for the commercial kits, since those regions are not included in their target designs.
Whole-genome sequencing covered 97 percent of protein-coding bases and 95 percent of noncoding bases at 20X or greater coverage.
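Comparisons like these are only fair once every data set has been scaled to the same amount of sequence or the same mean depth. The article does not say exactly how the researchers normalized their data, so the Python sketch below assumes simple random subsampling of reads, which is one common approach. The function names `keep_fraction` and `subsample` are hypothetical, and the 18 Gb run in the example is invented.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def keep_fraction(observed: float, target: float) -> float:
    """Fraction of reads to keep to scale a data set from `observed` to `target`.

    Assumes the quantity (gigabases of sequence, or approximately mean target
    coverage) scales with the number of reads kept. Capped at 1.0 because
    subsampling can only remove data, never add it.
    """
    if observed <= 0 or target <= 0:
        raise ValueError("observed and target must be positive")
    return min(1.0, target / observed)


def subsample(reads: Sequence[T], fraction: float, seed: int = 0) -> List[T]:
    """Keep each item independently with probability `fraction`.

    Each item stands for one read (or one read pair, since mates should be
    kept or dropped together). The seed makes the result reproducible.
    """
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


if __name__ == "__main__":
    # Hypothetical run that produced 18 Gb; normalizing to 12 Gb keeps about 2/3 of reads.
    frac = keep_fraction(observed=18.0, target=12.0)
    reads = [f"read_{i}" for i in range(30_000)]
    kept = subsample(reads, frac)
    print(f"keep fraction {frac:.3f}, kept {len(kept)} of {len(reads)} reads")
```

The cap at 1.0 reflects a practical limit of this approach: a run that produced less than the target amount of sequence cannot be brought up to it by subsampling.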
The researchers also found that ACE "finished" more genes than the other exome capture kits — covering 100 percent of bases at 20X or greater coverage in around 90 percent of the genes in the medically interpretable genome. Whole-genome sequencing finished 10 percent of the genes, while the exome kits finished between 30 percent and 65 percent.
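The difference between the share of bases covered at 20X or greater and a "finished" gene (every base at 20X or greater) can be made concrete with a short Python sketch. It assumes per-base read depths for each gene are already available as lists of integers; the helper names, gene names, and depths are hypothetical and invented for illustration.

```python
from typing import Dict, List

MIN_DEPTH = 20  # the 20X threshold used in the study


def fraction_covered(depths: List[int], min_depth: int = MIN_DEPTH) -> float:
    """Fraction of a gene's bases with read depth >= min_depth."""
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)


def is_finished(depths: List[int], min_depth: int = MIN_DEPTH) -> bool:
    """A gene counts as 'finished' only if every base reaches min_depth."""
    return len(depths) > 0 and all(d >= min_depth for d in depths)


def percent_finished(genes: Dict[str, List[int]], min_depth: int = MIN_DEPTH) -> float:
    """Percentage of genes in a panel that are finished."""
    if not genes:
        return 0.0
    finished = sum(is_finished(d, min_depth) for d in genes.values())
    return 100.0 * finished / len(genes)


if __name__ == "__main__":
    # Invented per-base depths for two hypothetical genes.
    toy = {
        "GENE_A": [35, 40, 22, 20],  # mean 29.25X, every base >= 20X
        "GENE_B": [50, 60, 19, 45],  # mean 43.5X, one base at 19X
    }
    for name, depths in toy.items():
        status = "finished" if is_finished(depths) else "not finished"
        print(f"{name}: {100 * fraction_covered(depths):.0f}% of bases at >=20X, {status}")
    print(f"{percent_finished(toy):.0f}% of genes finished")
```

In this toy example, GENE_B has the higher mean depth but is not finished, because a single base falls below 20X. The same effect explains how a platform can lead on mean depth while finishing fewer genes.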
Jason Park, director of the Advanced Diagnostics Laboratory at Children's Medical Center in Dallas, told GenomeWeb that a finished gene is an "important concept" and that "especially in the clinical realm, this should be part of the test validation criteria and the description of the test offered."
The Personalis team looked at performance in the 56 ACMG genes and found that no platform was perfect. By the predefined criterion of covering every base of a gene at 20X or greater, ACE met that criterion for 51 genes, Agilent's clinical research exome for 39, Illumina's Nextera for 36, Agilent's SureSelect for 15, NimbleGen's platform for 12, and whole-genome sequencing for just two.
Looking specifically at over 2,134 disease-associated SNVs in the 56 ACMG genes, the platforms' performance varied widely. When normalized to 12 Gb of sequence data, Personalis' ACE had adequate coverage for 2,105 SNVs, followed by 2,022 for Agilent's clinical research exome. Whole-genome sequencing covered 1,240 of the disease-associated SNVs adequately, while NimbleGen covered just 1,193.
Park noted, however, that for whole-genome sequencing, coverage does not typically need to be as high as for exome or targeted sequencing — 10X to 15X coverage is usually sufficient for whole-genome sequencing.
Improved coverage in medically important regions translated into improved sensitivity for ACE relative to the other methods, although its margin over Agilent's clinical research exome was small. The sensitivity of ACE for SNVs in the medically interpretable genome was 98.9 percent, slightly higher than the 98.5 percent for Agilent's clinical research exome. For indels, the two methods had equal sensitivity, at 94.4 percent.
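Sensitivity here is the fraction of known true variants in a region that an assay detects. The Python sketch below assumes calls are compared against a truth set for a reference sample such as NA12878, with variants matched as exact (chromosome, position, ref, alt) tuples. Real benchmarking tools also normalize variant representation and restrict comparisons to high-confidence regions, which this sketch omits. The variants in the example are invented.

```python
from typing import Set, Tuple

# A variant keyed as (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def sensitivity(truth: Set[Variant], calls: Set[Variant]) -> float:
    """Sensitivity = true positives / (true positives + false negatives).

    Every truth variant is either detected (a true positive) or missed
    (a false negative), so the denominator is the size of the truth set.
    """
    if not truth:
        raise ValueError("truth set is empty")
    true_positives = len(truth & calls)
    return true_positives / len(truth)


if __name__ == "__main__":
    truth = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"),
             ("chr2", 500, "G", "A"), ("chr2", 900, "T", "C")}
    # Three of the four true variants are called, plus one false positive,
    # which would lower precision but does not affect sensitivity.
    calls = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"),
             ("chr2", 500, "G", "A"), ("chr3", 42, "A", "T")}
    print(f"sensitivity = {sensitivity(truth, calls):.2f}")  # sensitivity = 0.75
```

Because false positives do not enter this formula, sensitivity is usually reported alongside a measure such as specificity or precision.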
In GC-rich regions, ACE performed significantly better than the other platforms, with SNV and indel sensitivity of 97 percent and 94.7 percent, respectively. The next-best platform was Agilent's clinical research exome, with 94.4 percent and 89.5 percent sensitivity for SNVs and indels, respectively. Illumina's Nextera had 100 percent sensitivity for indels, but its sensitivity for SNVs was only 87.1 percent.
In the study, the authors highlight two clinically important genes: RPGR, in which more than 300 mutations are associated with retinitis pigmentosa, and CFTR, in which more than 1,000 mutations are associated with cystic fibrosis. RPGR includes a region of high GC content. The conventional exome capture kits covered between 71 percent and 87 percent of coding bases in RPGR, compared with 100 percent for ACE and 88 percent for whole-genome sequencing. And while the exome kits captured between 90 percent and 99 percent of coding mutations in CFTR, all platforms except whole-genome sequencing and ACE failed to adequately cover a noncoding mutation that is recommended for carrier screening.
In fact, said Chen, "We have had clinical cases where a causative variant was in the RPGR gap region that was found with our augmented exome but would have been missed with other exomes."
"The variation in coverage and accuracy among platforms highlights the need for clinicians to consider analytical performance when making clinical assessments, given the risk of overinterpreting negative results," the authors concluded in the study.
Other groups have also noted that exome sequencing can miss pathogenic variants. Researchers from the University of Texas Southwestern Medical Center and Thomas Jefferson University reported last year that in an analysis of 57 exome datasets, more than 50 percent of HGMD variants in seven ACMG genes had inadequate coverage.
Some clinical laboratories offering diagnostic exome sequencing tests have also realized the shortcomings of conventional capture kits and have taken their own steps to get around the problem. Emory Genetics Laboratory, the Children's Hospital of Philadelphia, and Harvard's Laboratory of Molecular Medicine have been collaborating on the Medical Exome Project, an effort to curate information on medically relevant genes and design a better method for targeting those genes. Based on that project, Emory is now offering its Medical EmExome, which bolsters coverage in around 4,600 genes.
Baylor College of Medicine is also offering its own version of a medical exome with enhanced coverage in over 3,600 genes.
Emory's Hegde said that such efforts will be a "significant advantage in the clinical setting." In addition, she said, "an enhanced exome design also alleviates the pressure from the labs to do genome sequencing, which though is getting cheaper, the cost to store and time to analyze data can be enormous without a large increase in clinical yield."
Park said it will be interesting to see whether, as whole-genome sequencing costs drop, particularly as laboratories get their Illumina HiSeq X Ten systems up and running, it becomes more cost-effective to switch to whole-genome sequencing. "Sequencing costs are dropping faster than oligonucleotide synthesis," he said. "A key problem for innovation with exome methods is that they exist in a narrow window for pricing which is capped by the dropping price of whole genome sequencing."
Chen said that Personalis' customers now include healthcare systems, universities, clinicians, and governments in 12 countries. He said that Personalis plans to continue to focus on improving both the accuracy of its assays and the interpretation of results. "Our focus will continue to be in cancer and Mendelian clinical testing and research applications," he said.
In addition, Chen said that US Food and Drug Administration regulation is "important and necessary given the uneven quality of NGS testing currently."
One effort that Personalis supports is the National Institute of Standards and Technology's Genome in a Bottle Consortium, which has been working to develop reference material for human genome sequencing so that labs have a metric by which to compare the performance of their own tests. In May, NIST released the first set of DNA reference materials.
Having reference materials for sequencing is "vitally important so that platforms can be rigorously evaluated and compared," Chen added.