NEW YORK (GenomeWeb) – A head-to-head comparison published in BMC Genomics has delineated some of the coverage and sequence bias differences associated with exome capture kits from NimbleGen, Agilent, and Illumina.
The kits included in the analysis represent the main solution-based capture methods available when the study began around two years ago, senior author Leonardo Meza-Zepeda, a researcher affiliated with Oslo University Hospital and the Norwegian Cancer Genomics Consortium, told In Sequence.
Though new and improved versions of the methods have made their way onto the market since then, he and his team are optimistic that the general patterns they detected will prove useful for helping others select appropriate exome capture approaches for the application at hand.
The researchers reported that all four kits performed well overall. Still, they described differences in the size of the sequence set targeted by each kit, the efficiency with which these sequences were captured, and the sequence-related biases associated with that capture.
As such, they noted that those designing exome sequencing experiments may want to consider the protein-coding sequences they're most interested in analyzing, along with the amount of input DNA available and the cost required to generate sufficient sequence depth across that target region.
Meza-Zepeda and his colleagues are currently doing their own exome sequencing experiments with a newer version of the Agilent SureSelect kit than that tested in the BMC Genomics analysis. In contrast to capture kits from the other two companies, the Agilent approach uses RNA rather than DNA to probe for protein-coding sequences and targets a somewhat smaller stretch of sequences.
Since the time of the team's analysis, updates have been made to exome capture kits from other companies as well, Meza-Zepeda noted, including tweaks to the precise sequences targeted and changes intended to decrease sequence biases.
In the group's comparison, for example, it found that the Illumina Nextera kit tended to favor sequences with high guanine and cytosine, or "GC," content — an issue that has reportedly been addressed in updated versions of that kit.
The comparison was done in preparation for a large-scale study by the Norwegian Cancer Genomics Consortium, Meza-Zepeda said, which is interested in evaluating genome-based cancer diagnostics and treatment targeting in Norway.
Members of the consortium recently undertook an exome sequencing effort focused on nine different cancer types, using paired samples from between 100 and 150 individuals for each cancer type.
Prior to starting that work, Meza-Zepeda and his colleagues decided to evaluate the various exome capture alternatives. When the comparison kicked off around two years ago, the primary kits on the market for capturing protein-coding portions of the genome were NimbleGen's SeqCap EZ v3.0, Agilent's SureSelect v4.0, and the TruSeq Exome and Nextera Exome kits, both from Illumina.
Using DNA from a human osteosarcoma tumor sample and the Illumina HiSeq 2000 instrument, the Norwegian researchers generated between roughly 96 million and 185 million reads per sample. They used the resulting data to compare the sequences targeted by each kit, as well as each kit's capture performance.
While the RNA probe-based Agilent SureSelect v4.0 kit included in the study targeted 51.1 million bases of sequence, for example, the study's authors noted that the gapped DNA probe-based Illumina kits each aimed to capture 62.08 million bases of sequence.
The NimbleGen SeqCap EZ v3.0 kit, which uses an overlapping DNA probe design, targeted the largest swath of sequences, at 64.1 million bases.
When they looked closely at the genome sequences targeted by all four capture kits, the researchers saw a relatively small set — just 26.2 million bases — of overlapping sequences.
"That was actually one of the biggest surprises when we started comparing the kits," Meza-Zepeda said.
He noted that the team also saw differences in how much of each kit's targeted sequence overlapped with protein-coding sequence databases such as the Consensus Coding Sequence Database (CCDS), RefSeq, and Ensembl.
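Overlap figures of this kind, whether among kits or between a kit and a reference annotation, come down to intersecting sets of genomic intervals. The following is a minimal sketch of that calculation, not the study's actual pipeline: the function names and the toy coordinates are hypothetical, and it handles a single chromosome with half-open intervals for simplicity.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two merged, sorted interval lists with a two-pointer sweep."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def shared_bases(kits: List[List[Interval]]) -> int:
    """Count bases targeted by every kit in the list."""
    common = merge(kits[0])
    for kit in kits[1:]:
        common = intersect(common, merge(kit))
    return sum(end - start for start, end in common)


# Hypothetical toy targets, not real kit coordinates.
kit_a = [(100, 200), (300, 400)]
kit_b = [(150, 250), (350, 450)]
kit_c = [(120, 180), (360, 500)]
print(shared_bases([kit_a, kit_b, kit_c]))  # 70
```

In practice the intervals would come from each vendor's target file and would be grouped by chromosome before intersecting.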
While the protein-coding sequences should theoretically be the same regardless of the method used to nab them from the genome, different kits have been designed to correspond to slightly different interpretations of the human exome, he explained.
For example, the Ensembl database is home to a relatively broad set of suspected protein-coding sequences, while the exome sequences described in the RefSeq resource are somewhat more conservative.
And while the Illumina capture kits targeted some 10 million to 12 million more bases than the SureSelect kit, Meza-Zepeda noted, the new analysis suggests that these approaches don't provide a huge boost in coverage of sequences described in the CCDS database, which the Norwegian team is especially interested in.
On the capture efficiency side, the Agilent kit targeted the fewest bases overall but topped the heap in terms of its ability to consistently capture the intended sequences.
The researchers reportedly captured some 99.8 percent of the sequences targeted when using the Agilent kit, compared with 98.2 percent capture using the NimbleGen approach.
The TruSeq and Nextera kits came in at 96.9 percent and 96.5 percent capture, respectively — slightly lower capture efficiencies that may be due to differences in probe design in the Illumina kits, Meza-Zepeda said.
In general, the researchers detected more SNPs and small insertions or deletions in exome sequences produced with kits targeting broader swathes of protein-coding sequences. Across the stretches of sequence targeted by all four kits, though, they picked up a comparable number of variants regardless of the approach used.
Even so, the Illumina approaches won out in terms of detecting variants in untranslated regions neighboring protein-coding genes. Between the two Illumina kits tested, the TruSeq approach provided more uniform coverage, the team reported.
That was partly due to an over-representation of sequences with high GC content in samples prepared with the Nextera kit, which uses so-called transposomes to fragment DNA during library prep instead of sonication. In contrast, the other kits tended to show slightly lower capture across regions with high or low GC content.
"If you look at the TruSeq data, what you see is less efficient capture for [sequences with] high GC content," Meza-Zepeda said. "But the Nextera, in a way, sort of over-corrects for high GC content."
Illumina has since released a newer version of the Nextera kit that included changes aimed at reducing that GC bias, he noted, though he and his colleagues have not yet tested that kit.
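GC bias of the sort described here is typically assessed by grouping targets by GC content and comparing the sequencing depth in each group. The sketch below illustrates the idea under stated assumptions: the helper names, the ten-bin scheme, and the toy sequences and depths are all hypothetical, and it is not the method used in the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of called bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    called = gc + seq.count("A") + seq.count("T")
    return gc / called if called else 0.0


def gc_bin(seq: str, n_bins: int = 10) -> Optional[int]:
    """Assign a sequence to a GC bin using integer arithmetic.

    Bin 0 holds the lowest-GC targets and bin n_bins - 1 the highest.
    Returns None when the sequence has no called bases.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    called = gc + seq.count("A") + seq.count("T")
    if called == 0:
        return None
    return min(gc * n_bins // called, n_bins - 1)


def mean_depth_by_gc_bin(
    targets: Iterable[Tuple[str, float]], n_bins: int = 10
) -> Dict[int, float]:
    """Average the per-target mean depth within each GC bin."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for seq, depth in targets:
        b = gc_bin(seq, n_bins)
        if b is None:
            continue
        totals[b] += depth
        counts[b] += 1
    return {b: totals[b] / counts[b] for b in sorted(totals)}


# Hypothetical (sequence, mean depth) pairs for illustration only.
toy_targets = [
    ("ATATATATAT", 40.0),
    ("ATGCATGCAT", 100.0),
    ("ATGCATGCTA", 120.0),
    ("GGCCGGCCGG", 60.0),
]
print(mean_depth_by_gc_bin(toy_targets))  # {0: 40.0, 4: 110.0, 9: 60.0}
```

In this toy data, depth falls off at both GC extremes, the pattern the article attributes to most of the kits. An over-representation of GC-rich sequence would instead show up as elevated depth in the upper bins.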
Meza-Zepeda said his team is gearing up to assess a transposable element-based version of Agilent's SureSelect kit known as QXT, which is billed as having improved library amplification and performance with fewer library construction steps.
Although the current comparison was done mainly using osteosarcoma tumor samples, the study's authors noted that the overall performance patterns described for the four capture kits seemed to hold across both mutation-prone and normal sequences in their follow-up experiments.
"We could not find drastic differences in coverage in normal regions and coverage in cancer genes," the study's first author Chandra Sekhar Reddy Chilamakuri, an Oslo University Hospital bioinformatician and Norwegian Cancer Genomics Consortium member, told IS.
Even so, because cancer samples are generally composed of a mixed population of cells, the researchers noted that sufficient depth of coverage is needed to call somatic variants in tumor samples.
Members of the Norwegian Cancer Genomics Consortium are generating 250-fold coverage of each sequenced exome, on average, Meza-Zepeda noted, which means that capture kits targeting sequences beyond the scope of their analysis can quickly ratchet up sequencing costs.
By getting a picture of the sequences captured by each kit, he explained, it's possible to start making decisions about whether such additional cost is warranted. Likewise, results of such direct comparisons may offer clues about the capture kit that's best suited to other applications.
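The link between target size and cost can be made concrete with simple arithmetic: the on-target sequence needed is roughly the target size multiplied by the desired mean depth. The sketch below applies that to the target sizes and the 250-fold depth quoted in this article. It assumes, as a simplification, that every sequenced base lands on target; real runs also lose reads to off-target capture and duplicates, so actual sequencing needs would be higher. The function and variable names are illustrative.

```python
from typing import Dict, Tuple

# Target sizes in megabases, as quoted in this article.
TARGET_MB: Dict[str, float] = {
    "Agilent SureSelect v4.0": 51.1,
    "Illumina TruSeq / Nextera": 62.08,
    "NimbleGen SeqCap EZ v3.0": 64.1,
}
MEAN_DEPTH = 250  # average fold coverage cited for the consortium


def on_target_gigabases(target_mb: float, mean_depth: float) -> float:
    """On-target sequence needed, in gigabases (assumes no off-target loss)."""
    return target_mb * mean_depth / 1000.0


def summarize(
    targets: Dict[str, float], mean_depth: float
) -> Dict[str, Tuple[float, float]]:
    """Map each kit to (gigabases needed, ratio to the smallest target)."""
    baseline = min(targets.values())
    return {
        name: (
            round(on_target_gigabases(mb, mean_depth), 1),
            round(mb / baseline, 2),
        )
        for name, mb in targets.items()
    }


summary = summarize(TARGET_MB, MEAN_DEPTH)
for name, (gb, ratio) in summary.items():
    print(f"{name}: ~{gb} Gb on target, {ratio}x the smallest target")
```

Under these assumptions, the larger targets call for roughly 21 to 25 percent more on-target sequence per exome than the smallest one, a difference that adds up across hundreds of paired samples.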
"Our study should help researchers who are planning exome sequencing experiments select the most appropriate technology for their study," he and his co-authors concluded, "without having to perform expensive and time-consuming comparisons."