All next-generation sequencing technologies on or close to market fragment DNA and sequence it at random.
But creating some order among the DNA pieces would be an advantage for sequencing known genomes, according to Kalim Mir, a group leader at the Wellcome Trust Center for Human Genetics in Oxford.
His group has been developing ordered microarrays for single-molecule sequencing, along with a new sequencing biochemistry, and is working on another array-based method that will couple sequencing with long-range genome information.
Mir presented proof-of-principle results for his technologies at last week’s Cambridge Healthtech Institute Next Generation Sequencing Technologies meeting in San Diego.
The main advantage of ordered arrays, in which DNA fragments are captured on a microarray by probes to specific genomic regions prior to sequencing, is that “it allows us to make the analysis systematic,” Mir told In Sequence during the meeting. “It also enables us to select particular regions of the genome,” he said. “So if you wanted to select exons, or control regions, or if you wanted to select particular candidate genes, you can use your array to do that.”
Also, because many copies of a target sequence are present within each spot, enriching that part of the genome selectively, “you get a large degree of oversampling, which allows confidence in a base call to be increased greatly,” Mir said. In addition, heterozygosity and haplotypes can be readily resolved that way, he added.
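To illustrate why oversampling within a spot raises confidence in a base call, here is a minimal sketch. It assumes that every single-molecule read in a spot independently calls the correct base with probability 1 minus a per-read error rate, which is a simplifying assumption and not a description of Mir's actual base caller. The function names `consensus_call` and `prob_correct_not_majority` are hypothetical. A simple majority vote is used as the consensus rule.

```python
from collections import Counter
from math import comb


def consensus_call(bases):
    """Majority-vote consensus for one position across all reads from one spot."""
    # most_common(1) returns [(base, count)] for the most frequent base
    base, _ = Counter(bases).most_common(1)[0]
    return base


def prob_correct_not_majority(n_reads, error_rate):
    """Upper bound on the chance that a majority vote fails.

    Hypothetical model: each read independently reports the correct base
    with probability 1 - error_rate. If the correct base appears in more
    than half of the reads, the majority vote must return it, so the vote
    can only fail when the correct base appears in at most n_reads // 2 reads.
    """
    p = 1.0 - error_rate
    return sum(
        comb(n_reads, k) * p**k * error_rate ** (n_reads - k)
        for k in range(0, n_reads // 2 + 1)
    )


if __name__ == "__main__":
    print(consensus_call(list("AAAAGAATAA")))  # reads from one spot, one position
    for n in (1, 5, 10, 20):
        print(n, prob_correct_not_majority(n, 0.05))
```

Under this toy model, a single read with a 5% error rate is wrong 5% of the time, whereas the bound for ten reads from the same spot falls below 1 in 10,000.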
While the concept of ordered arrays for sequencing is not new (Affymetrix, for example, uses its GeneChips for resequencing applications), Mir's group has adapted them for single-molecule sequencing. The researchers spread the capture probes far enough apart that they can distinguish single molecules within each spot, for example by diluting the probes before spotting them on the array.
Whereas in other next-gen technologies, the same DNA fragment may be sequenced at several different places spread across the slide or plate, “we are capturing many copies of the same fragment within a spot” and do not have to look for them all across the slide, Mir said.
Of course, ordering genomic fragments on the array means that the researchers will need to have prior knowledge of the genome they want to sequence. For unknown genomes that are not closely related to a known one “there is probably not much more of an advantage to this approach relative to the random approaches,” Mir said, “but that’s not our interest.”
Because the technology performs single-molecule sequencing reactions, a concept others such as Helicos BioSciences pursue on a random array, dephasing is not a problem.
“It does not matter if [the reactions] are all going at a different pace; you can just follow them,” Mir said.
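Dephasing affects ensemble methods, in which many copies of a template are read together and their combined signal is only clean while the copies stay in step. A toy model makes the point. It assumes that each copy independently fails to advance with a fixed probability per cycle (`p_lag`, a hypothetical parameter) and ignores every other error source. A single-molecule reader, as Mir describes, simply follows each molecule at its own pace, so it does not depend on this fraction.

```python
def in_phase_fraction(cycles, p_lag):
    """Fraction of copies in an ensemble still in step after `cycles` cycles.

    Toy assumption: each copy independently lags with probability p_lag in
    every cycle, and a copy that has lagged once stays out of phase.
    """
    return (1.0 - p_lag) ** cycles


if __name__ == "__main__":
    # Signal purity of an ensemble shrinks geometrically with read length
    for n in (10, 50, 100, 200):
        print(n, round(in_phase_fraction(n, 0.01), 3))
```

Even a 1% per-cycle lag leaves only about 37% of the copies in phase after 100 cycles in this model, which is why ensemble platforms tend to lose signal quality toward the ends of their reads.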
For those reactions, his group has developed a new sequencing-by-synthesis biochemistry that employs oligonucleotide ligation. Because the label is far away from the ligation site, researchers can use bulky labels with multiple dyes, such as quantum dots or phycoerythrin. That prevents problems arising from single-dye labels, Mir said.
Other single-molecule approaches, he said, work with polymerase and labeled nucleotides and can only use single-dye labels. These can enter so-called "dark states," which create gaps in the sequence. "Even when extension has occurred and [the DNA is] labeled, the label is not fluorescing, so you get these gaps in your sequence reads," Mir said. "Our approach overcomes that."
Mir’s sequencing biochemistry somewhat resembles the chemistry used by Applied Biosystems’ SOLiD platform, which is nearing beta testing, in that they both rely on sequencing-by-synthesis and use ligation of oligonucleotides.
His group and Agencourt Personal Genomics, which ABI acquired last year, developed their respective technologies at around the same time. "Ours is a more literal read of the sequence, one by one," Mir said. His method reads the next base in the sequence at the ligation site, whereas ABI's method interrogates two bases at a time, a few bases removed from the ligation site, and must combine information from several sequencing cycles to decode the bases.
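The difference between a literal base-by-base read and a two-base readout can be sketched with the widely described two-base ("color-space") encoding associated with SOLiD, in which A, C, G and T are assigned 2-bit codes 0 to 3 and each color is the XOR of the codes of two adjacent bases. This is a simplified illustration under that assumption: it treats the colors as covering consecutive base pairs and ignores the interleaved ligation cycles the real platform uses to build up those pairs. The function names are hypothetical.

```python
# Assumed 2-bit base codes for the two-base (color-space) encoding
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def encode_two_base(seq):
    """Each color reports a pair of adjacent bases: XOR of their 2-bit codes."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode_two_base(first_base, colors):
    """Recover bases sequentially: each base needs the previous base plus one color."""
    bases = [first_base]
    for color in colors:
        bases.append(BASE[CODE[bases[-1]] ^ color])
    return "".join(bases)


if __name__ == "__main__":
    colors = encode_two_base("ATGGC")
    print(colors)                          # one color per adjacent base pair
    print(decode_two_base("A", colors))    # decoding requires a known first base
    # A single miscalled color shifts every downstream base in a naive decode,
    # whereas in a literal read an error affects only the one base it lands on.
    print(decode_two_base("A", [3, 2, 0, 3]))
```

In this sketch no single color identifies a base on its own; bases emerge only after information from consecutive colors is combined, in contrast to a readout that reports each base directly.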
So far, Mir’s proof-of-principle study has shown that the ligation chemistry works and can generate two to three bases of sequence information using four different labels. “We are now integrating it with an automated system,” he said, allowing the scientists to optimize it further and increase the read length.
Mir, who did his PhD project in Ed Southern’s lab at Oxford, and whose research has been supported by the Wellcome Trust, said he plans to publish results from the study soon.
Finally, Mir’s lab has been working on an array-based sequencing method that will deliver not only sequence information but also place sequence reads in the context of the genome by using long, stretched-out genomic fragments.
The approach, he said, shares some features with the optical mapping and optical sequencing methods developed by David Schwartz at the University of Wisconsin-Madison and Bud Mishra at New York University (see In Sequence 3/13/2007), but "we have developed an entirely different platform for doing that."
So far, “we have shown that we can use a microarray to capture long fragments of genomic DNA and stretch them out and be able to see each molecule individually,” he said.
To sequence a human genome, the researchers would fragment the genomic DNA and reduce its complexity before capturing the fragments on the single-molecule microarray that Mir’s group has already developed. Mir did not elaborate on how they would make the genome less complex, but said that “all the genome would be there, but the complexity issue would not be a problem.”
After stretching the fragments out using fluidics, the researchers could either make the fragments single-stranded and sequence them from random primers, or introduce nicks in the double-stranded DNA and initiate sequencing reactions at the nick sites.
“The sequence information that you obtain will be in the long-range context,” Mir said. “Initially, it will be a way of fingerprinting the genome to look for copy-number variation or any structural rearrangements,” he said. “Ultimately, we are developing it into a method that could sequence the whole human genome.”
Mir said he has shown proof-of-principle for the biochemistry and the arrays, and has filed patents on different aspects of the technologies, but “what we need is more development.” He has not yet decided how to proceed but said the methods could be developed by a large-scale sequencing facility, a start-up company, or an existing company that licenses them.
“I think the stretched-out molecules…would require three to five years before [the technology] comes on the market,” he said. “But the nearer-term opportunity is in sequencing selected regions of the genome without having to do PCR up front.”