Researchers in Sweden have developed a new resequencing technology that combines a simple chemistry using low-cost reagents with relatively long reads.
The technology, called shotgun SBH, is based on sequencing by hybridization, requires a reference sequence, and is currently being developed at Genizon BioSciences. Combined with a genome region-enrichment method, it could find applications in resequencing sections of the human genome, according to its developers, who published a description of an early version of the technology online this week in Nature Biotechnology.
“I think that’s the perfect fit for our method because you can do that very inexpensively,” said Sten Linnarsson, an assistant professor in the department of medical biochemistry and biophysics at the Karolinska Institute, who helped develop the methodology.
Originally, Linnarsson and his colleagues developed their approach, which immobilizes short DNA fragments and sequentially hybridizes sets of short detection probes to them, at Global Genomics, a Karolinska Institute spin-out he co-founded. Genizon BioSciences bought most of the company’s assets three years ago and made it its Swedish subsidiary, Genizon Svenska.
The method starts by fragmenting DNA and converting it into single-stranded circular molecules with 200-base inserts. Millions of these circles are spread onto a microscope slide covered with capture oligos, where they are amplified by rolling circle amplification.
Next, the scientists probe the DNA with a panel of approximately 600 labeled hybridization probe sets, which are added and imaged one by one. Each probe set consists of a mixture of 16 heptamer oligonucleotides, each of which contains the set's unique pentamer sequence plus two degenerate nucleotides. The probes also contain locked nucleic acids (LNA), which hybridize more strongly than natural DNA and raise the melting temperature of the probe-target duplex.
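The composition of a single probe set can be illustrated with a short sketch. The article does not say where the two degenerate nucleotides sit within the heptamer, so the default below (one at each end) is purely an assumption, and the function name is hypothetical. What the sketch does show is why each set contains 16 oligos: two degenerate positions with four possible bases each give 4 x 4 = 16 combinations.

```python
from itertools import product

BASES = "ACGT"

def expand_probe_set(pentamer, degenerate_positions=(0, 6)):
    """Return the 16 heptamers that share one pentamer core.

    degenerate_positions marks the two heptamer indices (0-6) filled with
    every possible base. The default (both ends) is an assumption made for
    illustration; the article does not specify the layout.
    """
    if len(pentamer) != 5 or any(base not in BASES for base in pentamer):
        raise ValueError("pentamer must be five bases drawn from A, C, G, T")
    first, second = sorted(degenerate_positions)
    if first == second or first < 0 or second > 6:
        raise ValueError("need two distinct positions between 0 and 6")

    probes = []
    for x, y in product(BASES, repeat=2):
        core = iter(pentamer)  # pentamer bases fill the five fixed positions in order
        heptamer = []
        for index in range(7):
            if index == first:
                heptamer.append(x)
            elif index == second:
                heptamer.append(y)
            else:
                heptamer.append(next(core))
        probes.append("".join(heptamer))
    return probes
```

Calling `expand_probe_set("ACGTA")` yields 16 distinct heptamers, each carrying `ACGTA` in its five central positions under the assumed layout.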
Each probe set generates a characteristic hybridization spectrum, indicating which fragments the oligos have bound to. Using a customized algorithm, the researchers can align these hybridization spectra to a reference genome and reconstruct the sequence of each 200-base DNA fragment from the spectra.
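The idea of placing a fragment by its hybridization spectrum can be sketched in a few lines. This is not the researchers' algorithm, which the article does not describe; it is a deliberately simplified stand-in that treats a spectrum as the set of pentamers present in a fragment, ignores the degenerate flanking bases, the reverse strand, and signal noise, and scores each reference window by Jaccard similarity. All function names are hypothetical.

```python
def pentamer_spectrum(seq, k=5):
    """Idealized spectrum: the set of k-mers that occur in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def place_fragment(observed, reference, fragment_len, k=5):
    """Return (start, score) of the reference window whose expected
    spectrum is most similar to the observed one.

    Similarity is the Jaccard index (shared k-mers / all k-mers seen).
    Ties go to the leftmost window. This brute-force scan is only
    practical for small references.
    """
    best_start, best_score = None, -1.0
    for start in range(len(reference) - fragment_len + 1):
        expected = pentamer_spectrum(reference[start:start + fragment_len], k)
        union = observed | expected
        score = len(observed & expected) / len(union) if union else 0.0
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score

if __name__ == "__main__":
    # Toy reference: a distinctive 10-base stretch between two homopolymer runs.
    reference = "A" * 10 + "ACGTACGGTC" + "T" * 10
    observed = pentamer_spectrum("ACGTACGGTC")
    print(place_fragment(observed, reference, fragment_len=10))  # (10, 1.0)
```

The sketch also hints at why fragment length matters: a shorter fragment yields a smaller spectrum, so more reference windows can match it equally well and the placement becomes ambiguous.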
The main difference between this approach and other incarnations of SBH, a concept that is about 20 years old, is that it uses a reference genome, enabling the scientists to obtain millions of long reads — in their case, 200 bases long — and to resequence several megabases of DNA at a time, according to Linnarsson.
The reason his team chose a read length, or fragment length, of 200 bases is that this represented a good compromise. “We could make it 400 bases, but you would have to pay in terms of accuracy,” he said. “We could make it shorter, but we would risk not being able to uniquely place the reads.”
As a proof of concept, they sequenced both the 48-kilobase bacteriophage lambda genome and the 4.6-megabase E. coli genome. At 27-fold coverage, they were able to obtain sequence information for 80 percent of the E. coli genome, with an overall accuracy of 99.94 percent.
They were also able to call SNPs with a false-negative rate of 3.5 percent and a false-positive rate of 0.06 percent. At higher coverage, they achieved similar results. In addition, they detected copy number differences that derive from ongoing replication of the E. coli genome by studying differences in the depth of coverage across the genome.
Since the Nature Biotechnology study, which is based on data more than a year old, the scientists have improved the accuracy for the E. coli genome to 99.99 percent by optimizing probes and reaction conditions. They have also increased the fraction of the genome they sequenced to 97 percent by reducing the number of PCR cycles during a sample preparation step, Linnarsson said.
Currently, he said, the system is most suitable for SNP calling, but changes in the software could enable it to call deletions and short insertions, and maybe also inversions, which the researchers have not investigated yet. Calling longer insertions, though, would “require a more extensive modification of the method” that relies on de novo assembly, he said.
Repeat sequences, which make up about half the human genome, will be a challenge, he acknowledged. “If you do shotgun sequencing, no matter what method you use, [repeat sequences] will be very difficult to resolve.” However, since a “significant fraction” of repeats in the human genome are not identical, “some might be resolvable by our method,” he added.
In their study, the scientists typically generated approximately 1.6 gigabases of raw sequence data in a five-day run, but they mentioned in their article that the fluidics cycle time could “probably be decreased substantially.”
The researchers estimated the cost of sequencing at $0.32 per megabase in crude reagent costs, and $0.50 per megabase when amortized equipment cost is included. At that rate, sequencing a human genome at 30-fold coverage would cost $45,000, excluding labor. Meanwhile, sequencing an E. coli genome at the same coverage would cost $69.
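The arithmetic behind those figures can be reproduced with a short sketch. It assumes a human genome of roughly 3,000 megabases, a size the article does not state but that the $45,000 figure implies, and it uses the 4.6-megabase E. coli genome size given earlier. The function name is illustrative.

```python
def sequencing_cost_usd(genome_size_mb, coverage, cost_per_mb_usd=0.50):
    """Cost of sequencing a genome to a given fold coverage.

    Total megabases sequenced = genome size x coverage; the default rate is
    the article's per-megabase figure including amortized equipment.
    """
    return genome_size_mb * coverage * cost_per_mb_usd

HUMAN_GENOME_MB = 3000   # assumption: about 3 gigabases, not stated in the article
E_COLI_GENOME_MB = 4.6   # from the article

human = sequencing_cost_usd(HUMAN_GENOME_MB, coverage=30)
e_coli = sequencing_cost_usd(E_COLI_GENOME_MB, coverage=30)

print(f"Human genome, 30-fold: ${human:,.0f}")   # $45,000
print(f"E. coli, 30-fold: ${e_coli:,.0f}")       # $69
```

Passing `cost_per_mb_usd=0.32` instead gives the reagent-only estimate under the same assumptions.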
Moving to Market
The simplicity of the sample prep and sequencing chemistry, which does not require any enzymes, and the low cost could make the technology attractive for resequencing applications, according to John Hooper, president and CEO of Genizon BioSciences, which is commercializing the technology, internally called “Cantaloupe.”
“It’s not a de novo sequencer, it’s resequencing,” he said. “It’s not an ultra-sophisticated tool; it’s a workhorse that you need when you want lots of information fairly quickly from something you do frequently.”
Genizon also has a companion method to enrich fractions of the genome 10,000-fold, he said, though the company has “not investigated it very far” yet because it wants to focus on the resequencing technology first.
But the resequencing method might miss some potentially important variations. “If there are significant differences between the reference and the genome being sequenced, the fragments will not get placed correctly or at all,” commented John Oliver, vice president of research at NABsys, a Providence, RI-based startup company that is working on a sequencing technology that combines SBH and nanopores (see In Sequence 1/9/2007).
“In general, it has the same limitations as any resequencing method has in its inability to catch all of the variations that it now appears are present between human genomes,” he said.
Oliver pointed out that the method appears to be less expensive than other second-generation sequencing methods, which are also used for resequencing applications. “It needs to be sped up significantly to be competitive, and some issues need to be resolved in order to get better coverage, but I think it shows great promise,” he said.
Hooper’s company, which changed its name to Genizon BioSciences from Galileo Genomics in late 2004, searches for genes linked to diseases by performing genome-wide-association studies in members of the Quebec Founder population. Originally, it acquired the Cantaloupe technology for in-house use to discover disease-related sequence variations in candidate genomic regions identified in its studies.
“There was no fast and inexpensive way of defining what sequence variations were involved,” Hooper said. “We captured this technology as a way of doing that internally. We want to do it cost-effectively, and it’s not cost-effective to send 50 candidate gene regions to 454.”
But the company started to believe that the technology also has commercial value. Within a year, Hooper estimates, it could be turned into a marketable product. However, “we can’t market that product, that’s not our business,” he said. “Our business is to find pathways that cause disease.”
Because the sequencing technology is not its focus, Genizon plans to either outlicense it or spin it out as a separate company. “We are trying to do that in the near future because I think it needs a big push with substantial dollars to make it work,” Hooper said.
In order to commercialize the system, several aspects of it need to be improved, he said, some of which Genizon has already tackled.
For example, replacing rolling circle amplification with a bead-based amplification technology, such as emulsion PCR, would help to increase the density of the DNA fragments on the slide, and as a result, the throughput. Bead-based amplification would also increase the signal-to-noise ratio, enabling better accuracy and longer reads.
Initial experiments by Linnarsson have already shown a 30-fold increase in the signal-to-noise ratio and a 10- to 100-fold increase in density, Hooper said.
According to Linnarsson, the read length could possibly be extended to 400 bases this way. “If you go much further, then the fundamental problems of SBH start to kick in, then you would need to do more drastic changes,” he said.
But there are trade-offs that come with bead-based amplification. Emulsion PCR, for example, “is not trivial,” especially if large numbers of samples need to be processed, Linnarsson pointed out. Also, “it’s quite a lot more expensive,” possibly turning sample preparation into the most expensive part of the process. “On the other hand, you might get 10 times more sequence” from a run, he said.
“The benefits from switching from a slide to a bead are so enormous that we would be willing to accept the price of increased complexity,” Hooper said. He stressed that the simple sequencing chemistry would still be the same.
The accuracy of the system could be improved further by optimizing the GC content of the probe sets, he said, adding that Genizon will do this within the next few months.
The technology also needs a commercial instrument. At present, it runs on several commercially available components, including a Nikon inverted microscope and a Tecan autosampler, and uses a custom flow cell.
Genizon hopes it can use an already existing platform. “There are instruments now on the market which, with minor modifications, would work immediately for this product,” Hooper said. “It doesn’t make sense for us to go out and develop an instrument when there is something out there.”
One possibility would be the Polonator, the open-source instrument developed by George Church’s group in collaboration with Danaher Motion, he said (see In Sequence 2/5/2008).
According to Linnarsson, the instrument would have to be able to generate a 70-degree temperature range, which is required for the hybridizations, and to inject about 600 reagents using an autosampler.