NEW YORK (GenomeWeb) — A team from the Ontario Institute for Cancer Research has published a study demonstrating that a two-dimensional sample pooling strategy can allow rare variants in pooled sequencing data to be traced back to their sample of origin.
In this way, researchers can combine samples to increase the efficiency and decrease the cost of sequencing without losing the ability to match any discovered rare variants back to an individual patient's sample.
The method, described in a paper in PLOS One this month, involves laying out DNA samples in 12-by-12 grids and then pooling the DNA by both row and column, resulting in 12 row-pools and 12 column-pools per grid, each containing DNA from 12 individuals.
Using this strategy, DNA from each individual is present in only one row-pool and one column-pool at a time. Thus, researchers can trace rare variants back to individual patient samples based on their location in this matrix.
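The grid layout can be sketched in a few lines of Python. This is only an illustration of the idea, not the OICR team's code: the function names `build_pools` and `locate` and the row-major sample numbering are assumptions made for the example.

```python
def build_pools(n=12):
    """Return row pools and column pools for an n-by-n grid.

    Samples are numbered 0..n*n-1 in row-major order (a hypothetical
    convention chosen for this sketch).
    """
    rows = [[r * n + c for c in range(n)] for r in range(n)]
    cols = [[r * n + c for r in range(n)] for c in range(n)]
    return rows, cols


def locate(row_hit, col_hit, n=12):
    """Map one row-pool index and one column-pool index to a sample index."""
    return row_hit * n + col_hit


rows, cols = build_pools()
sample = locate(3, 7)
# The sample at the intersection belongs to exactly that row pool
# and exactly that column pool.
print(sample, sample in rows[3], sample in cols[7])
```

A variant seen only in row-pool 3 and column-pool 7 would, under this numbering, point to the single sample at their intersection.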
John McPherson, the study's senior author, and Philip Zuzarte, the study's first author, told In Sequence that 2D pooling isn't itself a new technique, but applying it to the task of identifying rare variants was something they became interested in as a way to reduce the cost of sequencing a target region over many DNA samples.
Whereas more complex multidimensional pooling strategies — such as the "DNA Sudoku" developed originally by Cold Spring Harbor scientists — have allowed researchers to look at much larger sample sets and attribute the specific source of all observed variants, the OICR team's simpler grid method is specifically aimed at rare variant detection.
"When you sequence and you see a variant in one row and also in one column you can trace that to where they intersect, and that is the sample where that variant occurred," Zuzarte explained. "This only works for rare variants. For any common variant that [occurs in] multiple people in the pool you'll get multiple hits and you won't be able to deconvolute that."
In the PLOS One study, the OICR team used the 2D pooling strategy to sequence a 250-kb region, previously identified by GWAS as being linked to lung cancer, in 576 individual samples pooled into four 12-by-12 grids, resulting in 96 total pools.
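The pool count follows directly from the grid dimensions, as this short calculation shows. It counts libraries only; it says nothing about sequencing depth or cost, which the article does not quantify. The variable names are chosen for the example.

```python
samples = 576
grid_side = 12

grids = samples // (grid_side * grid_side)   # 144 samples per grid
pools_per_grid = 2 * grid_side               # 12 row pools + 12 column pools
total_pools = grids * pools_per_grid

# Ratio of individual samples to pooled libraries.
reduction = samples / total_pools

print(grids, pools_per_grid, total_pools, reduction)
```

Under these figures, 576 samples collapse into 96 pooled libraries, a sixfold reduction in the number of libraries to capture and prepare.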
Though the researchers did not share a direct cost comparison of their 2D pooling relative to other barcoding or straight sequencing methods, Zuzarte said that adopting the strategy "really started out as a budgetary consideration [for the group]."
"We had the GWAS region and all these individuals, so we came up with this strategy to cut down the cost of capturing and library making … and [this is what] made it possible to do the project at all in the number of individuals [we] wanted to do," he said.
After pooling, barcoding the pools, and sequencing on an Illumina HiSeq, the group analyzed the data and classified the resulting variant calls into three groups: "pinnable," meaning they were present in only one row and only one column and thus could be attributed to one specific individual; "multiple," meaning they occurred in more than one row and column; or "singleton," meaning they occurred in either a row or a column, but not both.
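The three categories can be expressed as a small classifier. This is a sketch under stated assumptions, not the authors' pipeline: the article does not say how mixed cases are handled, so here a variant seen in one row but several columns is treated as "multiple," and a variant seen in only one dimension (however many pools) is treated as "singleton." The function names are hypothetical.

```python
from itertools import product


def classify(row_hits, col_hits):
    """Classify a variant by the row pools and column pools it was called in."""
    n_rows, n_cols = len(row_hits), len(col_hits)
    if n_rows == 1 and n_cols == 1:
        return "pinnable"
    if n_rows >= 1 and n_cols >= 1:
        return "multiple"    # assumption: any other row-and-column pattern
    if n_rows or n_cols:
        return "singleton"   # assumption: seen in one dimension only
    return "absent"


def candidates(row_hits, col_hits):
    """List every grid position consistent with the observed pool hits."""
    return sorted(product(row_hits, col_hits))


print(classify({2}, {5}), candidates({2}, {5}))
print(classify({2, 9}, {1, 5}), candidates({2, 9}, {1, 5}))
print(classify({4}, set()), candidates({4}, set()))
```

The second call shows why common variants cannot be deconvoluted: two rows and two columns give four candidate intersections, and the pool data alone cannot say which of them are real carriers.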
The group identified 1,260 pinnable variants that could be assigned to an individual DNA sample among the 576 analyzed, and another 576 multiple variants, the authors reported.
To test their accuracy in identifying rare variants by their position in this pooling matrix, the researchers then verified a subset of the pinnable, multiple, and singleton SNV candidates using amplicon sequencing on the Ion Torrent PGM.
Overall, the team reported that about 91 percent of the pinnable calls were true positives. About 80 percent of multiple calls could also be verified as real variants. Meanwhile, only one of the 12 singletons the group verified was actually a true positive variant.
According to Zuzarte, the presence of these singleton false positives highlights an additional benefit of the pooling method. Because each sample is sequenced twice, once in a row-pool and once in a column-pool, variant calling becomes more accurate: false positives can be ruled out by their singleton status.
The group also looked at a small number of indel variants discovered by the pooling approach, and verified that nine out of 10 pinnable indels selected at random were true positives — an accuracy rate similar to that of the pinnable SNVs.
"This really was tailored for our use … and we were able to do it very efficiently and rapidly and get the answer we wanted," McPherson said.
According to Zuzarte, the team doesn't currently have any other target GWAS regions it intends to sequence using 2D pooling, but he said that if he and his colleagues find themselves with another similar project they would definitely take the same approach.
Zuzarte explained that the utility of 2D pooling depends on the particular study. With smaller groups of subjects, sequencing individual samples might be affordable and easy enough that a more complicated pooling strategy isn’t necessary. "Depending on how many subjects you have there is a tipping point where it makes sense to do this," he said.
In publishing a description of the method and its application in this lung cancer study, Zuzarte and McPherson said they hope to encourage other labs looking to reduce the cost and complexity of resequencing GWAS peaks to try 2D pooling as well.