With about 3 billion bases, the human genome is a vast territory to comb through in search of a handful of genetic variants. Instead of looking at the whole sequence, many researchers are turning to targeted sequencing methods that let them examine only the portion of the genome they are interested in, or just the exome. Importantly, this more selective approach saves money by cutting down on next-generation sequencing costs.
"I think it just makes for more efficient science, focusing your resources on some sequence where you have a higher a priori likelihood of finding something [that] allows you to extend those resources to a larger number of individuals, which, in the context of certain problems, can certainly make it more likely that you are going to end up with a meaningful and interpretable answer," says the University of Washington's Jay Shendure.
Cold Spring Harbor Laboratory's Emily Hodges adds: "The initial idea was for the obvious reason of wanting to only target certain regions of interest for specific projects that we were working on, and to be able to sift through the sequence data in a preemptive way, and be able to focus on regions that we were interested in looking at."
When it comes to that targeted approach, though, there isn't yet one breakaway, front-running sequence capture method. Research groups and companies are trying out various approaches, both array- and solution-based, as well as others. They are applying these methods to look at everything from Mendelian disorders to cancer to other complex diseases. Whatever the method and the project, a targeted approach gives researchers statistical power by allowing them to look at a larger number of samples than they could otherwise study, all for the same price. But as sequencing costs dwindle, whole-genome sequencing is becoming more and more of a serious competitor for these capture methods.
The long and short of sequence capture is that probes target regions of interest in the sample genomes. Those captured regions are then sent on their way to be sequenced. There are, of course, various ways to go about this.
For array-based capture, the DNA is prepped like it is for sequencing — it's fragmented and adaptors are then ligated on the ends and maybe amplified a bit, says CSHL's Hodges. After the probes are fixed to the array, the DNA is hybridized to the probes. The kinetics of the hybridization reaction are controlled by the amount of DNA added to the array, she adds. After hybridization, the sample is eluted and possibly amplified again before being sent off for sequencing — Hodges uses the Illumina platform. She and her colleagues worked with Agilent to develop the SureSelect DNA Capture Arrays, based on the company's 244K arrays with 60-mer probes, which came out in July.
Roche NimbleGen's Sequence Capture arrays came out in 2008 and the company now offers two arrays: a 385K and a 2.1 million feature array, which can capture up to 5 megabases and 30 megabases, respectively. There's also a pre-designed whole exome product. Xinmin Zhang, the senior product manager for enrichment at NimbleGen, says the process has been optimized to feed into a 454 machine, but can also flow into other systems.
Febit's HybSelect sequence capture platform works similarly but is based on the Geniom microfluidic chip. Those chips can hold 120,000 oligonucleotide probes in their eight channels. Fragmented DNA then hybridizes to those probes. Currently, the chip can target up to 10 megabases, and Febit plans to double that amount by early this year, according to Peer Stähler, the vice president of marketing. He adds that the company has validated protocols for both the SOLiD and Illumina machines.
Arrays, though, are still relatively low-throughput. "For the most part, the microarray-based capture is going to be used when a researcher wants to run maybe a few samples, maybe it's three, five samples — under 10 — where they want to use a particular design to do a targeted enrichment," says Agilent's Fred Ernani, the SureSelect marketing manager. "They don't necessarily need to run a lot of samples. They want to iterate their design."
"The scalability of arrays is difficult. It's feasible, I mean you can definitely do it if you throw enough money at it," adds Hanlee Ji at Stanford University. "But the technical advantages of scalability would be solution-based methods [and they] are likely to have a lot of advantages, particularly if you don't have large genome center resources backing you up."
Researchers who want to do targeted enrichment over more samples are likely better served with a solution-based approach. "Those researchers that want to do targeted enrichment over a larger number of samples — and it could be anywhere from five samples to several thousands of samples — it makes probably more sense to utilize the solution-based approach. It's much more easily scaled and it's faster," Ernani says.
Agilent's solution-based method was developed in collaboration with the Broad Institute's Andreas Gnirke. A few years ago, says Chad Nusbaum, who is also at the Broad, Gnirke was working on developing a capture method using genomic PCR products. "It turned out quickly that it wasn't working very well," Nusbaum recalls. Gnirke, though, had shown that 170-base-long probes worked best; at that time, Agilent could manufacture ones up to 200 bases long. So they teamed up.
Then, Nusbaum says, Gnirke had the further inspiration to use RNA as the capture agent. "It's also a single-stranded reagent, so no matter how much you pour into the reaction in terms of molar excess, it's never going to compete with itself. Everything that you pull down has to be the DNA on the other strand," Nusbaum says. "Once we were able to get tens of thousands of these oligos at a reasonable price, made in bulk on chips … we can make a bulk reagent where we can capture arbitrarily large numbers of targets."
Agilent then licensed that technology in 2008 and it has become the basis for its solution-phase SureSelect Target Enrichment System. NimbleGen also offers a solution-based approach, its SeqCap EZ Exome product that has 2.1 million DNA probes.
In addition, there are other methods. Stanford's Ji developed a targeted circularization approach that uses restriction enzymes to create genomic fragments and then uses oligonucleotides with flanking homology regions, or what Ji calls targeting arms, to circularize the fragments. Then the circularized pieces can be amplified. Kelly Frazer at the University of California, San Diego, is also trying out a microdroplet PCR-based approach that she says has high specificity and allows for very deep sequencing of the targeted regions.
Perhaps the most famous application of targeted enrichment is Shendure's work with exome sequencing. He has shown that an array-based whole-exome capture method can be used to discover variants behind rare Mendelian disorders. Over the summer, Shendure and his colleagues, including Debbie Nickerson, published a proof-of-principle paper in Nature in which they reported sequencing the exomes of 12 people — eight individuals from the HapMap Project and four people with Freeman-Sheldon syndrome, a disorder whose genetic cause was known. With these exomes in hand, Shendure and his colleagues filtered through the data to find the gene variant behind Freeman-Sheldon. Then the group tried a true unknown by studying Miller syndrome, as they reported in Nature Genetics. With the same approach, they found that mutations in the dihydroorotate dehydrogenase gene were behind Miller syndrome. "There's lots of unsolved things that appear to be Mendelian diseases or are suspected of being monogenic, but haven't been addressable by conventional methods for a variety of reasons," Shendure says. "These have been sitting in people's fridges for a very long time and I think there's a lot of interest in bringing these approaches to bear."
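The filtering logic behind this kind of Mendelian-disorder study can be sketched in a few lines. This is a deliberately simplified illustration, not the group's actual pipeline: for each affected individual, imagine the set of genes carrying rare, protein-altering variants, then intersect those sets and discard genes that are also commonly mutated in reference exomes such as the HapMap individuals. The gene names other than DHODH are invented for the example.

```python
# Simplified sketch of exome filtering for a Mendelian disorder.
# Each set holds genes with rare, protein-altering variants in one
# affected individual (TTN, MUC16, OBSCN are placeholder examples).
affected_genes = [
    {"DHODH", "TTN", "MUC16"},
    {"DHODH", "TTN", "OBSCN"},
    {"DHODH", "MUC16", "OBSCN"},
]

# Genes that also harbor such variants in reference exomes
# (e.g., unaffected HapMap individuals) are unlikely culprits.
polymorphic_genes = {"TTN", "MUC16", "OBSCN"}

# Keep genes hit in every affected individual, then subtract
# genes seen in the reference set.
shared = set.intersection(*affected_genes)
candidates = shared - polymorphic_genes
print(candidates)  # {'DHODH'}
```

In practice the published analyses were considerably more involved (variant-level filters, predicted functional impact, and so on), but the intersect-and-subtract idea is the core of it.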
One of those people is UCSD's Frazer. She is using Agilent's SureSelect whole exome platform to study Mendelian and other single-gene disorders affecting children. "[There are] several success stories so far on individuals using that type of approach," she says, citing Shendure's work. "You can use half a dozen to a dozen individuals and identify the locus."
Cancer is another large area of interest for targeted sequencing studies. Frazer is using RainDance's microdroplet-based enrichment to deeply sequence cancer samples to search for mutations present in the tumor cells. And both Febit's Stähler and Agilent's Ernani say that the bulk of their customers are interested in oncology. Also popular, Stähler says, are neurodegenerative diseases. "Most users are using this technology for study of human disease," NimbleGen's Zhang adds.
Cold Spring Harbor's Hodges is taking a different tack: she is working on coupling array capture designs to bisulfite sequencing to profile genomic methylation. "It's even more difficult to do genome-wide bisulfite sequencing than it is to sequence normal, unconverted genome because of the ambiguity in the sequence," she says. She and her colleagues are working on developing reference methylomes. Once those are made, they will use a targeted approach to go back and fill in the blanks.
The break point
Targeted sequencing came about as a way to use available funds efficiently while still getting scientifically relevant sequencing results. As the cost of sequencing declines, a few questions arise: Why bother going through the capture steps? Why not just sequence the whole genome? And indeed, whole-genome sequencing to find disease variants has begun. Richard Gibbs' group sequenced an individual genome to 30-fold coverage to link variants in the SH3TC2 gene to Charcot-Marie-Tooth syndrome, for example.
But the problem still is cost. A $1,000 genome may be in the works, but it's still a few years away, researchers say. "Imagine you can do one genome for $1,000," says NimbleGen's Zhang. "If you can do targeted sequencing for the exome with only $100, that's still a benefit. With the same budget, now you can sequence 10 times more samples. In biological research, the more samples you have, the more power you have to identify the disease-causing variants."
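Zhang's budget arithmetic is easy to make concrete. The sketch below uses his hypothetical per-sample prices ($1,000 per whole genome, $100 per exome) and an assumed fixed study budget that is ours, not his:

```python
# Zhang's hypothetical prices: $1,000 per whole genome, $100 per
# captured exome. The $50,000 budget is an assumed figure for
# illustration only.
budget = 50_000
cost_whole_genome = 1_000
cost_exome = 100

samples_wgs = budget // cost_whole_genome    # 50 whole genomes
samples_exome = budget // cost_exome         # 500 exomes

print(samples_wgs, samples_exome)            # 50 500
print(samples_exome // samples_wgs)          # 10x more samples
```

Whatever the budget, the ratio of the two per-sample costs fixes the multiplier: a tenfold cheaper assay buys a tenfold larger cohort, which is where the statistical power comes from.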
The Broad's Nusbaum agrees that targeted sequencing will have some staying power. "No matter how cheap it is to sequence a genome, as long as the cost of your sequence capture methods is measurably less, then it is going to be cheaper to do targeting," he says. "I don't know that we'll always be doing targeting — certainly we may not always be doing it with this kind of avidity — but over the next few years, it is going to be important because of what it does to the statistics."
Of course, the efficiency of a capture approach — and the break point where it becomes a better move to switch over to whole genome sequencing — will depend on the individual study. Right now, UCSD's Frazer points out, Shendure needed 12 samples for his exome sequencing project and, in the future, it could be just as cost-effective to sequence 12 whole genomes. But, she adds, a different type of study, such as a study of a few hundred genes of interest in a population of heterogeneous cancer cells, will need a deeper look into the sequence, especially to find mutations that occur in 1 percent or fewer of the cells. "That may be further off in the future for phasing out targeting sequencing," Frazer says. "You're going to have to sequence it deep enough that you are looking at it as if it were 200 different genomes for one sample."
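Frazer's "200 different genomes" figure can be approximated with a back-of-the-envelope depth calculation. The sketch below assumes the simplest possible model, in which detecting a subclonal mutation requires some expected number of supporting reads at the site; the threshold of two reads is an illustrative assumption, not a figure from the article:

```python
# Back-of-the-envelope coverage estimate for rare subclonal
# mutations in a heterogeneous tumor sample. A crude model:
# expected supporting reads = depth * variant fraction.
def required_depth(variant_fraction, min_supporting_reads):
    """Depth at which a variant present at `variant_fraction` is
    expected to appear in at least `min_supporting_reads` reads."""
    return min_supporting_reads / variant_fraction

# A mutation in 1 percent of cells, with ~2 expected supporting
# reads, implies roughly 200-fold coverage at the site -- in the
# spirit of Frazer's "200 different genomes for one sample."
print(required_depth(0.01, 2))  # 200.0
```

A real caller would also model sequencing error and sampling noise (and a heterozygous mutation in 1 percent of cells is only a 0.5 percent allele fraction), all of which push the required depth higher, strengthening the argument for targeting.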
Furthermore, CSHL's Hodges points out that many labs don't have the capacity to sequence genomes to completion. "While sequencing whole genomes is becoming more widespread and the genome centers are able to do this, I think for normal labs that are interested in a particular disease phenotype, they know the region of interest that they want to sequence and they want to do this in many samples ... this will be very useful," she says.
Of course, the drawback of using capture methods is that you are still only looking at the regions of the genome that you choose to look at. "We know you are giving something up," Nusbaum says. "Things we understand happen in open reading frames. We know what to do with the things that we find. ... Yes, we're looking under the light. We believe that most things are under the light, but we are also aware of what we're giving up."