Pooling and WGA Method Enables Accurate Calling of SNVs from Cancer Samples with Limited DNA


Researchers from Uppsala University in Sweden have developed a targeted sequencing method that relies on whole-genome amplification and pooling of samples prior to capture.

The method, which was published in BMC Genomics this month, is suited not only to cancer samples but to any clinical sample where DNA is limited, and is well suited to analyzing large numbers of samples because pooling saves on reagent cost and time.

"We work with clinical samples and the amount of DNA we have available is often very small," Eva Berglund, lead author of the study and a researcher in Uppsala's department of medical sciences, told In Sequence. "So we wanted to check if the whole-genome amplification procedure affects the results of the allele fractions." Additionally, she said, "if we can pool samples in the same enrichment reaction, we can save time and money."

In the BMC Genomics study, the researchers pooled up to 10 samples before enriching for the target region. That allows us "to do only one capture experiment as opposed to doing 10," Berglund said. "So it's one-tenth the cost for the capture reagents and we also spend less time preparing the libraries." However, she said, pooling also adds time to the analysis at the end.

The team first sequenced the whole genomes of two acute lymphoblastic leukemia samples in order to select the target region and SNVs for evaluation. From the two whole genomes, the team selected 1,541 putative SNVs: 749 from one patient and 794 from the second, with two shared between them. Thirty of those had been previously validated as somatic SNVs. They also selected 20 germline SNPs that were heterozygous in both patients. For each variant, a three-base target region that included one base upstream and one base downstream was defined. Additionally, the researchers included the exons of 37 genes and five custom regions ranging from 33 bp to 263 bp for a total number of 2,431 target regions that spanned 147 kilobases.

The target region was custom ordered from Agilent, using the firm's HaloPlex technology. Berglund said the team chose HaloPlex because it requires small amounts of input DNA, which is important for clinical samples. Additionally, the technology has high specificity and incorporates sequencing adapters during enrichment, saving time, she said.

The final design had a total size of 798 kb, covering 1,528 variants.

Some regions were not covered by the HaloPlex design because they were adjacent to repetitive regions, lacked restriction fragments of appropriate size, or produced fragments that were too large relative to the read length, the authors wrote.

The researchers tested the design on the two ALL samples whose whole genomes had been sequenced, including both tumor and normal DNA as well as genomic and whole-genome amplified DNA. They then tested the design on non-indexed pools of samples containing two, five, or 10 samples.

The average sequence depth ranged from 792 to 1,752 in the region covered by the HaloPlex design, and from 1,008 to 2,254 at the 1,528 variants covered by the capture design. Between 91.6 and 97.4 percent of the variants were covered at least 30x.

To analyze accuracy, the researchers used the 19 heterozygous germline SNPs, which were expected to have an allele fraction of 0.5 in individual samples in both cancer and normal cells. For these SNPs, the researchers found that the observed allele fraction in the HaloPlex design deviated from the expected value by an average of only 0.064.

Next, the team set criteria for calling putative variants from the two ALL samples as somatic SNVs or not. They required that the variant be covered at least 30-fold in both tumor and normal samples and that the variant have an allele frequency of greater than 0.1 in the tumor sample and less than 0.01 in the normal sample. Around one-third of the putative variants from the whole-genome sequence data were classified as somatic SNVs in the HaloPlex design: 227 SNVs from one sample and 305 from the other. All of the 30 previously validated SNVs were confirmed.
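
As a rough illustration of those calling criteria, the thresholds can be written as a small Python check. This is a minimal sketch and not the authors' pipeline: the `SiteCounts` type, the `is_somatic` function, and the read counts in the usage note are hypothetical names and values chosen for this example. Only the three thresholds come from the description above.

```python
from dataclasses import dataclass

# Thresholds as described in the article.
MIN_DEPTH = 30         # minimum coverage required in both tumor and normal
MIN_TUMOR_AF = 0.1     # tumor allele fraction must be greater than this
MAX_NORMAL_AF = 0.01   # normal allele fraction must be less than this


@dataclass
class SiteCounts:
    """Read counts at one variant position (hypothetical helper type)."""
    depth: int      # total reads covering the position
    alt_reads: int  # reads supporting the variant allele

    @property
    def allele_fraction(self) -> float:
        # Guard against division by zero at uncovered positions.
        return self.alt_reads / self.depth if self.depth else 0.0


def is_somatic(tumor: SiteCounts, normal: SiteCounts) -> bool:
    """Return True if the site meets all three criteria."""
    if tumor.depth < MIN_DEPTH or normal.depth < MIN_DEPTH:
        return False
    return (tumor.allele_fraction > MIN_TUMOR_AF
            and normal.allele_fraction < MAX_NORMAL_AF)
```

For example, a site with 300 of 1,000 variant reads in the tumor and 2 of 1,000 in the normal would pass, while a heterozygous germline SNP with roughly half its reads supporting the variant in the normal would not.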

In order to evaluate the impact of whole-genome amplification on allele fractions, the researchers compared the results from the genomic DNA and the whole-genome amplified DNA from the two ALL samples. They found that while whole-genome amplification did not affect capture specificity, coverage was less even, with more sites having relatively low or relatively high coverage in the whole-genome amplified samples. However, coverage was not affected by the amount of input DNA, which ranged from 200 ng to 1,000 ng. The number of putative variants classified as somatic was also concordant between the two samples.

Similarly, when the researchers evaluated the effect of pooling on allele fractions, they found that the observed allele fractions were concordant with what was expected. Additionally, they found that they were able to detect somatic SNVs in pools of up to 10 samples. Undetected SNVs were typically caused by low sequence coverage.

Finally, the team tested the design's ability to call novel variants in each of the pooled samples. To make sure that germline SNPs were not called and to filter out false positives, they designed criteria for SNV calling, setting the expected allele fraction for a somatic SNV present in a single sample in a pool to 0.5 divided by the number of samples included in the pool. Only variants with an allele fraction between half and twice the expected value were selected.
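
The pooled-sample filter described above comes down to a short calculation, sketched here in Python. The function names are hypothetical, and treating the "half to twice" window as inclusive is an assumption, since the article does not say how the boundaries were handled.

```python
def expected_pool_af(n_samples: int) -> float:
    """Expected allele fraction of a heterozygous somatic SNV that is
    present in exactly one sample of an equal-proportion pool."""
    if n_samples < 1:
        raise ValueError("a pool needs at least one sample")
    return 0.5 / n_samples


def passes_pool_filter(observed_af: float, n_samples: int) -> bool:
    """Keep variants whose observed allele fraction lies between half
    and twice the expected value (inclusive bounds are an assumption)."""
    expected = expected_pool_af(n_samples)
    return expected / 2 <= observed_af <= expected * 2
```

In a pool of 10 samples the expected allele fraction is 0.05, so the accepted window runs from 0.025 to 0.1. A variant observed at an allele fraction near 0.5 in that pool would be rejected as more likely a germline SNP shared by many samples.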

They identified six high-confidence candidate somatic SNVs that they verified with Sanger sequencing. One additional SNV was called in one of the pools but not in its replicate and further inspection found that it was a false positive call caused by alignment artifacts. The overall false positive discovery rate was below 6 percent.

"Pooling of DNA samples before capture thus allows accurate SNV detection in many samples at low reagent cost," the authors wrote. However, this comes "at the expense of losing the information in which sample novel variants are detected, unless experimental validation is performed."

Additionally, the results "show that analysis of a large number of samples, including samples where limited amounts of DNA have previously been prohibitive, is possible at low cost."

Berglund said that the team "was happy with the results of the evaluation" and is now "working on a larger study" with many more samples.

Additionally, she said that the researchers plan to evaluate a protocol that uses overlapping pools, where each sample would be present in two pools. This would enable variants to be matched with the specific sample. For instance, she said, sample 1 could be present in pool 1 and pool 2, but those pools would have no other samples in common. Then, if the same variant is identified in pools 1 and 2, it likely is present in sample 1. Such a strategy would only work for rare variants, such as somatic mutations in cancer that are not expected to occur in many samples, she said.
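
One way to picture such an overlapping-pool scheme is a grid layout, sketched below in Python. The article does not describe how the team would assign samples to pools, so the row-and-column layout, the function names, and the sample labels here are all assumptions made for illustration. In this layout each sample sits in exactly one row pool and one column pool, and any two pools share at most one sample.

```python
def grid_pools(samples, n_cols):
    """Assign each sample to one row pool and one column pool
    (an assumed layout, not taken from the study)."""
    pools = {}
    for idx, sample in enumerate(samples):
        row, col = divmod(idx, n_cols)
        pools.setdefault(f"row{row}", set()).add(sample)
        pools.setdefault(f"col{col}", set()).add(sample)
    return pools


def candidate_samples(pools, positive_pools):
    """Return samples for which every pool containing them showed the variant."""
    membership = {}
    for name, members in pools.items():
        for sample in members:
            membership.setdefault(sample, set()).add(name)
    positive = set(positive_pools)
    return sorted(s for s, in_pools in membership.items() if in_pools <= positive)


# Nine hypothetical samples arranged in a 3 x 3 grid give six pools.
pools = grid_pools([f"S{i}" for i in range(1, 10)], n_cols=3)
print(candidate_samples(pools, ["row1", "col1"]))
print(candidate_samples(pools, ["row0", "col0", "row1", "col1"]))
```

The first call narrows the variant to a single sample. The second, where the variant shows up in four pools, returns several candidates, which illustrates why the approach is expected to work only for variants that are rare across samples.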