For next-generation sequencing, paired reads — two stretches of sequenced DNA with an unsequenced insert of known size in between them — seem to be indispensable.
While paired reads have been crucial in Sanger sequencing for applications like de novo assemblies, “for short reads they are even more important as they add significant specificity,” said Chad Nusbaum, co-director of the genome sequencing and analysis program at the Broad Institute, in an e-mail message to In Sequence.
For example, he explained, only about 80 percent of the human genome can be uniquely mapped with 25-mers, whereas 95 percent of it can be mapped with accurately paired 25-mers. “For very short reads, one simply cannot assemble without pairs,” he added.
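The effect Nusbaum describes can be illustrated on a toy sequence. The sketch below is not how the 80 percent and 95 percent figures were derived; it only assumes an error-free, single-stranded "genome" string and a fixed distance between the two read starts. The function name and the toy sequence are hypothetical.

```python
from collections import Counter

def unique_fractions(genome, k, span):
    """Return (single, paired) uniqueness fractions for a toy genome.

    single: fraction of pair positions whose left k-mer occurs exactly
            once anywhere in the genome.
    paired: fraction of pair positions whose (left, right) k-mer pair,
            with read starts `span` bases apart, occurs exactly once.
    Assumes no sequencing errors and ignores the reverse strand.
    """
    n_pairs = len(genome) - span - k + 1
    if n_pairs <= 0:
        raise ValueError("genome too short for this k and span")
    kmer_counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    pairs = [(genome[i:i + k], genome[i + span:i + span + k])
             for i in range(n_pairs)]
    pair_counts = Counter(pairs)
    single = sum(kmer_counts[left] == 1 for left, _ in pairs) / n_pairs
    paired = sum(pair_counts[p] == 1 for p in pairs) / n_pairs
    return single, paired

# Toy sequence in which the 3-mer "ACG" occurs twice.
single, paired = unique_fractions("ACGTTACGCCGATTCA", k=3, span=5)
print(f"single: {single:.2f}, paired: {paired:.2f}")
```

In this toy case the two copies of the repeated 3-mer cannot be placed on their own, but each becomes unambiguous once its mate is taken into account.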
Besides helping with de novo assemblies, paired reads — also called paired-end reads, read pairs, or mate-pairs — can help to detect structural variations in the genome like insertions or deletions, copy number variations, and genome rearrangements, as well as to characterize the ends of cDNAs or diTAGs.
454 Life Sciences, Illumina, and Applied Biosystems are all working on different strategies for making paired-end libraries on their next-generation instruments. In Sequence caught up with them last week to find out their plans.
454 and Roche
Roche launched its first kit for paired-end reads for the GS 20 last September. Researchers fragment DNA into 2- to 2.5-kilobase pieces, add biotinylated adapters to the ends, and circularize the DNA. They then cut the circle with the Mme I restriction enzyme, which releases a small segment containing the 44-mer adapter flanked on either side by 20-mers of DNA that originated from the ends of the fragment. After amplification, these short snippets of DNA can be sequenced on the Genome Sequencer.
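Downstream, a read from such a library has to be split back into its two end tags. A minimal sketch of that step, assuming the read contains the intact construct (20 bases, 44-mer adapter, 20 bases) on one strand, might look like the following. The adapter sequence and function name are placeholders, not the sequences or software Roche uses.

```python
# Hypothetical 44-base adapter; the real adapter sequence is not given here.
ADAPTER = "GATC" * 11

def split_paired_tag(read, adapter=ADAPTER, tag_len=20):
    """Return (left_tag, right_tag) flanking the adapter, or None.

    None is returned when the adapter is absent or when either flank is
    shorter than tag_len, i.e. the construct is not intact in the read.
    """
    start = read.find(adapter)
    if start < tag_len:  # covers "not found" (-1) and a short left flank
        return None
    end = start + len(adapter)
    if len(read) - end < tag_len:
        return None
    return read[start - tag_len:start], read[end:end + tag_len]

left = "ACGTACGTACGTACGTACGT"
right = "TTGCATTGCATTGCATTGCA"
print(split_paired_tag(left + ADAPTER + right))
```

A real pipeline would also have to tolerate sequencing errors in the adapter and check both strands; exact matching is used here only to keep the idea visible.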
The company is now working on a new kit that will enable researchers to sequence longer reads — 100 base pairs on average — that are separated by a 2.5-kilobase fragment.
454 presented that method at last month’s Advances in Genome Biology & Technology conference in Marco Island, Fla. “The main difference in this protocol reported at AGBT and current kits is the use of nebulization rather than Mme I digest to create fragments to be sequenced,” Mary Schramke, 454’s vice president of marketing, told In Sequence last week. Nebulization creates breakpoints at random.
Besides uses for “traditional” de novo sequencing, sequencing these longer fragments will have applications in high-resolution analysis of structural genome variation, “which we believe will have tremendous value in deeper understanding of human genetics and human disease,” Schramke said.
The protocol is already in the hands of some users: Jan Korbel, a researcher in Mike Snyder’s lab at Yale University, presented data at the AGBT conference showing how he used the new approach to study deletions, insertions, and inversions in the human genome using the GS FLX.
Schramke said Roche expects to launch the new kit broadly “sometime this year.”
Illumina
Illumina has been developing two methods for sequencing paired ends on its Genome Analyzer, and it also presented them at last month’s AGBT meeting. In the first method, scientists fragment DNA and separate it by size on a gel. Fragments shorter than 600 bases are grown directly into clusters, requiring less than a microgram of DNA, David Bentley of Illumina, formerly Solexa’s chief scientist, told In Sequence by e-mail.
For longer fragments, the researchers need to attach the ends to an adapter, remove most of the intervening DNA, and grow the resulting fragments into clusters “using the standard protocols,” he said.
In both cases, after growing the clusters, the scientists sequence each end of the fragment sequentially.
The second approach involves sequencing from one end and from the middle of the same template. It “can be used with a range of specialized DNA libraries, such as jumping libraries, with long-range insert information,” according to Bentley.
Illumina also uses this method to barcode individual samples, so they can be pooled and sequenced in a single run (see In Sequence 2/20/2007). To make these libraries, researchers attach a unique tag to each sample during sample preparation and grow clusters. They then sequence the clusters using a first primer, followed by a second primer that reads the tag sequence.
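Once a pooled run is finished, the tag read is what assigns each cluster back to its sample. The sketch below shows that sorting step under simple assumptions: each cluster yields an (insert read, tag read) pair, and tags are matched exactly. The tag sequences, sample names, and function name are invented for illustration and are not Illumina's.

```python
from collections import defaultdict

def demultiplex(clusters, tag_to_sample):
    """Group insert reads by sample, using an exact match on the tag read.

    clusters: iterable of (insert_read, tag_read) tuples, one per cluster.
    tag_to_sample: dict mapping tag sequence to sample name.
    Reads whose tag is not recognized are collected under "unassigned".
    """
    by_sample = defaultdict(list)
    for insert_read, tag_read in clusters:
        sample = tag_to_sample.get(tag_read, "unassigned")
        by_sample[sample].append(insert_read)
    return dict(by_sample)

# Hypothetical tags and reads.
tags = {"ACGT": "sample_1", "TGCA": "sample_2"}
clusters = [("GATTACA", "ACGT"), ("CCGGAAT", "TGCA"),
            ("TTTTCCC", "ACGT"), ("AGAGAGA", "NNNN")]
print(demultiplex(clusters, tags))
```

Exact matching discards any read whose tag contains an error; tolerating mismatches would require tags designed to differ from one another at several positions.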
“Since our methods do not require enzymatic cleavage to create the paired-read fragments, we can generate reads of 40 bases (or more) from each end of the paired read,” Bentley wrote. “Longer paired reads significantly increase the power of the method for locating reads more accurately within a genome.”
Illumina plans to release its paired-end sequencing kits in the near future. “We expect to ship the first of these paired read kits to a limited number of users for testing within the next two weeks,” Bentley wrote. “We would expect to make the kits broadly available shortly thereafter.”
Applied Biosystems
Even though ABI’s SOLiD sequencer is not out yet, the company is also working on protocols for paired-end libraries. ABI researchers fragment DNA, select pieces of uniform size, anywhere from 1 kilobase to 8 kilobases, and circularize the DNA onto a 20-base-pair linker.
They then use the restriction enzyme EcoP15I to cut out a segment that includes the linker and 26 bases of DNA on either side. The scientists amplify these segments and sequence them.
“The size of the fragments is up to the users,” Kevin McKernan, ABI’s senior director for scientific operations for high throughput discovery, told In Sequence. He also said he believes 25-mer reads on either end are sufficient in length. “Paired 25-mers in the human genome are unique enough to get 93 to 94 percent of the genome. We are going to get the lion’s share of what we want.”
“The really important feature of paired end reads, at least if you want to consider de novo assembly, is to have a very tight insert size distribution,” he said. His company is aiming for less than 10 percent variation in insert size, and has found protocols that involve HPLC separation that can bring the variation down to 5 percent.
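One way to put a number on the tightness of an insert size distribution is the coefficient of variation, the standard deviation divided by the mean. Reading McKernan's "10 percent variation" as a coefficient of variation is an assumption made here for illustration; the article does not say which statistic ABI uses. The insert sizes below are invented.

```python
from statistics import mean, pstdev

def insert_size_cv(insert_sizes):
    """Coefficient of variation (population std dev / mean) of insert sizes."""
    return pstdev(insert_sizes) / mean(insert_sizes)

# Hypothetical insert sizes in bases for a library selected at about 3 kilobases.
sizes = [2900, 3000, 3100]
cv = insert_size_cv(sizes)
print(f"CV = {cv:.1%}")
print("within a 5 percent target" if cv < 0.05 else "wider than a 5 percent target")
```

In practice the insert sizes would be estimated from the distances between mapped read pairs on a reference sequence rather than supplied directly.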
ABI plans to make the paired-end library protocol available to users as soon as its instrument launches. “We are already training people on them now,” McKernan said.
Will the different vendors’ protocols meet the needs of users? Maybe not. “The existing systems in use by those three companies all have drawbacks,” the Broad’s Nusbaum commented. “More work is required, especially to figure out how to make the very large ([40,000 bases] or Fosmid sized) links that will be required to assemble a human genome.”