Researchers at Cold Spring Harbor Laboratory have devised a pooling-based multiplexing method that allows them to sequence tens of thousands of samples in a single second-generation sequencing run, many more than can be done by existing barcoding methods.
Though the group is not the only one that has developed a pooling method for multiplexed second-gen sequencing — a Columbia University team has developed a related approach — it may be the first to explore its potential for clinical applications, by identifying carriers of genetic diseases in certain orthodox Jewish communities.
The original motivation for developing the multiplexing method, which was published online in Genome Research last month, was to be able to cost-effectively sequence genome-wide collections of short hairpin RNAs contained in bacterial clone libraries, and to link each sequence back to its clone, according to Greg Hannon, a professor at Cold Spring Harbor Lab and the senior author of the paper.
Until recently, he and his team sequenced each clone by capillary sequencing technology, since second-generation sequencing platforms did not allow them to link sequence reads back to specific clones. That, however, was an expensive approach.
"We have spent literally many millions over the years, certainly more than $10 million, sequence-verifying clones by conventional sequencing," Hannon said.
Over the last few years, researchers as well as vendors of second-generation sequencing platforms have come up with barcoding strategies in which each sample is tagged with a unique oligonucleotide prior to sequencing. However, generally these approaches only allow multiplexing dozens to hundreds of samples.
In order to be able to sequence tens of thousands of samples in parallel, the Cold Spring Harbor researchers decided to mix them in certain patterns to create pools, where each pool, but not each individual sample within it, is tagged with an oligo barcode. Since it is known which pools contain which samples, a sequence can be assigned to an individual sample with high confidence based on the pattern of pools in which that sequence appears.
The strategy the researchers are using to pool the samples is based on the Chinese remainder theorem, which has been known for almost 2,000 years, according to Yaniv Erlich, a graduate student in Hannon's lab and the first author of the paper. Another article that focuses on the mathematical aspects of the method is in preparation, he said.
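To illustrate the idea, here is a minimal sketch of how a pooling scheme based on the Chinese remainder theorem could work in the simplest case, where a given sequence comes from exactly one sample. It is not the researchers' published design: the window sizes (7, 8, 9) and the function names are hypothetical, chosen only to keep the example small. Each sample goes into one pool per "window," picked by the sample's index modulo the window size; because the window sizes share no common factors, the set of pools in which a sequence shows up points back to a single sample.

```python
from math import gcd, prod

# Hypothetical window sizes: pairwise coprime, 7 + 8 + 9 = 24 pools in total,
# enough to tell apart up to 7 * 8 * 9 = 504 samples in this simple case.
WINDOWS = (7, 8, 9)


def pairwise_coprime(windows):
    """Check that no two window sizes share a common factor."""
    return all(gcd(a, b) == 1
               for i, a in enumerate(windows)
               for b in windows[i + 1:])


def pool_ids(sample, windows=WINDOWS):
    """Pools for one sample: one (window, residue) pair per window."""
    return [(w, sample % w) for w in windows]


def build_pools(n_samples, windows=WINDOWS):
    """Map each pool to the set of sample indices mixed into it."""
    pools = {}
    for s in range(n_samples):
        for key in pool_ids(s, windows):
            pools.setdefault(key, set()).add(s)
    return pools


def decode(positive_pools, windows=WINDOWS):
    """Recover the sample index from one positive pool per window.

    Assumes exactly one sample carries the sequence, so each window
    reports a single residue. Combines the residues step by step.
    """
    residues = dict(positive_pools)  # window size -> residue
    x, m = 0, 1
    for w in windows:
        r = residues[w]
        inv = pow(m, -1, w)          # modular inverse (Python 3.8+)
        x += m * (((r - x) * inv) % w)
        m *= w
    return x


if __name__ == "__main__":
    assert pairwise_coprime(WINDOWS)
    observed = pool_ids(100)         # pools where sample 100's sequence appears
    print(observed, "->", decode(observed))
```

In this toy setup, 24 barcoded pools stand in for 504 individually barcoded samples. Cases where several samples share a sequence, or where sequencing errors produce spurious hits, need additional logic that this sketch does not attempt.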
His aim was to "minimize the amount of robotics we are using, the amount of sequencing, and the number of pools," he said. This sets the new method apart from other pooling strategies, he added, for example those used in BAC pooling, which often try to minimize the number of pools, thus creating large pools and requiring a lot of robotics time.
In their paper, the researchers employed the Illumina Genome Analyzer to test their method, which uses 384 barcodes, by sequencing two libraries, each consisting of about 40,000 bacterial clones and comprising approximately 20,000 different microRNAs. They achieved greater than 97 percent accuracy.
At present, they are analyzing libraries with more than 60,000 clones, according to Erlich, and in theory, it is possible to analyze more than 100,000 samples.
The method, dubbed "DNA Sudoku," is currently best suited to analyze sequences, or genotypes, that are rare — for example, rare alleles in a population, or shRNAs in a clone library. "If we have two alleles with the same frequency, we cannot use this method to distinguish between these," Erlich said. In addition, sufficient sequencing depth is necessary to assign sequences with high confidence.
Sequencing technologies with longer reads than the existing ones, such as the technology developed by Pacific Biosciences, could eventually enable researchers to analyze more common genotypes because the long reads "pick up natural variation among individuals" that can be used to distinguish between samples, according to Erlich.
With the new method, which the researchers have patented, it costs between five and 10 times less to analyze a clone library than by Sanger sequencing technology, according to Hannon. He said it now costs between $50,000 and $80,000 to analyze the same number of clones "that would have constituted a fairly substantially complete library in the past."
The Cold Spring Harbor scientists are not the only ones to explore pooling strategies for multiplexed sequencing. Researchers at Columbia University, for example, have developed a related approach, which also appeared in Genome Research last month.
As part of that paper, the researchers devised a simulation, using short-read data from one of the pilot projects of the 1000 Genomes Project, to test how well their approach extracts rare variations.
According to Itsik Pe'er, a professor in the department of computer science at Columbia and one of the authors, the original aim was to develop a method for resequencing candidate genomic intervals across hundreds or thousands of cases.
"I believe it is even more exciting for many experiments where related sequences are to be obtained from many sources in parallel," he told In Sequence by e-mail.
Because theirs is a computational lab, Pe'er and his colleagues have not yet used their method in a sequencing project, but others have expressed interest in the approach, he said.
Multiplexed Carrier Testing
Apart from sequencing clone libraries, the Cold Spring Harbor researchers are also about to test their new method in a project that involves genotyping large numbers of human samples.
In collaboration with Dor Yeshorim, a New York-based organization that aims to prevent genetic diseases in participating orthodox Jewish communities, the researchers plan to analyze several thousand previously characterized human samples in order to determine their carrier state for certain genetic diseases.
According to Erlich, Dor Yeshorim represents one of the largest genetic centers in North America, processing more than 20,000 samples per year. Ashkenazi and Sephardic Jews have an increased risk of being carriers of a number of recessive genetic disorders, such as Tay-Sachs disease or cystic fibrosis, and Dor Yeshorim offers to genotype members of participating orthodox Jewish communities that have a large percentage of such carriers when they are young adults.
The organization does not report back the results, but instead provides participants with a number that encodes the carrier state. Only if two participants want to get married do they submit their numbers to the organization to find out whether or not their children are likely to develop a recessive genetic disease, or whether they are "compatible or incompatible for the marriage," according to Erlich. Since the program was started in the 1980s, it has helped to nearly eliminate Tay-Sachs disease in participating communities, he said.
Under their collaboration, Cold Spring Harbor will analyze several thousand previously characterized samples provided by Dor Yeshorim and assess whether or not they are carriers for certain genetic diseases. "The vision is to take 10 loci, 8,000 specimens, [and] sequence them in one Illumina run," Erlich said. The panel of genes to be tested could be increased in the future, he added.
One part of the project is to validate the new method, and to compare the sequencing-based results to those derived from standard genotyping. Another part will be to identify new causative mutations in cases where a disease allele is known to exist but the precise mutation is unknown, according to Hannon.
Another possible application of the multiplexed sequencing method is in HLA testing, according to the researchers, although this will be more difficult because the state of both alleles in the genome needs to be inferred, and because complex haplotypes need to be reconstructed from short reads. "Theoretically, it should be feasible," Erlich said.