The National Heart, Lung, and Blood Institute and the National Human Genome Research Institute recently awarded three teams of researchers a total of $12 million over two years to improve DNA capture methods and to integrate them with second-generation sequencing platforms and analysis tools into pipelines for future resequencing studies of phenotyped clinical samples.
The three awards, funding for which started at the end of September, went to a group led by George Church at Harvard University Medical School; a team led by Stacey Gabriel at the Broad Institute; and a group at the University of Washington in Seattle headed by Deborah Nickerson.
NIH announced the program earlier this year to spur the development of inexpensive methods for capturing and sequencing all exons of the human genome, which make up between 1 and 2 percent of the total DNA (see In Sequence 1/29/2008).
All three teams are working on making existing methods for multiplexed targeted amplification of exons and other regions of the genome robust, and will test them for routine use in large-scale resequencing studies. Their results “will have far-reaching impact not only on diseases of interest to NHLBI but also on the entirety of human genetics as discovery of disease-causing genetic variants become increasingly dependent on large-scale DNA sequencing,” an NHLBI official told In Sequence by e-mail.
“We are [all] pursuing variations on the theme,” Church told In Sequence last week. “The goal is not to develop something radically new at this point; it is to get it into production.”
His team won $2 million in fiscal year 2008, according to an NIH database. The researchers, who besides Church’s own group include Kun Zhang at the University of California, San Diego, and Jon Seidman at Harvard, plan to further develop two types of DNA-capture methods for use with sequencing on the Illumina Genome Analyzer and the Polonator: padlock-capture and hybridization-capture.
Last fall, Church’s team published in Nature Methods a description of the padlock probe method, also known as circular capture or the molecular inversion probe method (see In Sequence 10/16/2007). In that paper, they showed that they were able to amplify approximately 10,000 exons in a single reaction and sequence them on the Illumina Genome Analyzer. Since then, they have increased the size of the gap targeted by the probes and the number of exons analyzed, Church said.
According to their grant abstract, the researchers will develop a padlock probe library for exon capture; optimize protocols to reduce cost and amplification bias and to increase coverage; and scale the method up to the entire RefSeq exome.
Though it is difficult to measure precisely, Church said, the cost for sequencing about 20 percent of exons is currently approximately $800. The goal is to reduce this to $100 for all protein-coding regions and a few highly conserved regulatory regions.
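As a rough back-of-envelope reading of those figures, the sketch below extrapolates the quoted cost to all exons. It assumes cost scales linearly with the share of exons captured, which is an assumption made here for illustration and not something Church stated; the function name is hypothetical.

```python
def full_exome_cost(cost_usd, percent_of_exons):
    """Extrapolate a partial-exome cost to 100 percent of exons.

    Assumes cost scales linearly with the percentage of exons covered
    (an illustrative assumption, not a claim from the article).
    """
    if not 0 < percent_of_exons <= 100:
        raise ValueError("percent_of_exons must be in (0, 100]")
    return cost_usd * 100 / percent_of_exons


# Figures quoted in the article: about $800 for about 20 percent of exons.
current_full = full_exome_cost(800, 20)

# Stated goal: $100 for all protein-coding regions.
target_usd = 100
fold_reduction = current_full / target_usd

print(f"Extrapolated current cost for all exons: ${current_full:,.0f}")
print(f"Fold reduction needed to reach the goal: {fold_reduction:.0f}x")
```

Because the stated goal also covers some conserved regulatory regions, the reduction actually required would be somewhat larger than this estimate.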
Compared with hybridization capture, the padlock probes require about 20-fold less genomic DNA and are 99 percent on target, according to Church.
The solution-based hybridization capture method that his group is also working on — although it is currently only 30 percent on target — “is easier [for performing] rapid prototyping,” he said. Though it is slightly more expensive and has greater DNA input requirements, it is more flexible and is therefore more suitable for small candidate gene resequencing and other custom studies, he said.
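To illustrate why the on-target fraction matters for cost, the sketch below estimates how much raw sequence each method would need to reach the same average depth over the targeted regions. The target size and depth are hypothetical placeholders chosen for illustration, and the calculation ignores coverage uniformity, duplicates, and alignment losses.

```python
def raw_bases_needed(target_bases, mean_depth, on_target_fraction):
    """Raw sequence required so that on-target reads give `mean_depth` coverage."""
    if not 0 < on_target_fraction <= 1:
        raise ValueError("on_target_fraction must be in (0, 1]")
    return target_bases * mean_depth / on_target_fraction


# Hypothetical inputs: roughly 1 percent of a 3 Gb genome, at 20x mean depth.
TARGET_BASES = 30_000_000
MEAN_DEPTH = 20

# On-target fractions quoted by Church for the two capture methods.
padlock = raw_bases_needed(TARGET_BASES, MEAN_DEPTH, 0.99)
hybrid = raw_bases_needed(TARGET_BASES, MEAN_DEPTH, 0.30)

print(f"Padlock capture:       {padlock / 1e9:.2f} Gb of raw sequence")
print(f"Hybridization capture: {hybrid / 1e9:.2f} Gb of raw sequence")
print(f"Ratio: {hybrid / padlock:.1f}x")
```

Under these assumptions the ratio depends only on the two on-target fractions, so the choice of target size and depth does not change it.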
According to the grant abstract, the researchers plan to capture sheared exon fragments on nitrocellulose filters; optimize protocols and reduce cost; develop barcoding methods for sequencing exomes of multiple subjects in parallel; and scale up to the RefSeq exome.
The approach, according to the abstract, “will provide opportunities to detect larger indels than padlock capture, but the padlock probes may be useful reagents for sheared exome capture.”
In addition to improving capture methods, the Harvard team plans to develop algorithms for SNP calling and indel detection. Another goal — although not stated in the abstract — is to integrate existing haplotyping methods with the exon capture, according to Church.
To assess their methods, the scientists plan to test them on the first 10 samples of Harvard’s Personal Genome Project. Once the pipeline is working, they intend to use it on up to 1,300 samples from the Framingham Heart Study, according to Church. “We feel that these are far preferable to DNA samples lacking extensive trait data,” he said.
The team from the University of Washington in Seattle won $1.9 million in fiscal year 2008 for its project, entitled “SeattleSeq.” Besides Nickerson, it includes Jay Shendure, Evan Eichler, and Phil Green, all professors in the department of genome sciences at UW.
The goal of SeattleSeq is to test several multiplex capture methods and integrate the best-performing one into a “scalable resequencing pipeline.”
The team plans to “try and stay as flexible as possible, at least early on,” according to Shendure. Two methods it plans to evaluate are gap-fill molecular inversion probes — which Shendure helped develop as a postdoc in Church’s lab — and array-based hybridization.
The array method, he said, is “essentially analogous” to the method developed by researchers at Roche NimbleGen in collaboration with two academic groups last year (see In Sequence 10/16/2007 and 11/6/2007).
According to the grant abstract, the researchers also plan to develop computational tools for translating raw sequence data generated by new sequencing platforms into quality-tagged, consensus predictions of sequence variants.
Phil Green, who helped develop the Phred base calling program and Phrap sequence assembler that are used for Sanger sequence data, is “working furiously on next-gen applications, and has developed quite a few of those already,” according to Nickerson.
Tools to analyze copy number variations will also be included, according to Shendure.
In terms of cost, the aim is to reach “ideally, $1,000 per exome and lower if we can get it,” according to Nickerson.
She said that the goal for the first year is to test various exome-capture technologies and to set up the analysis tools. “The second year is to take the best approach that we have seen and apply that to hundreds of individuals,” she said, referring to phenotyped samples that will be chosen in consultation with the NIH.
The Seattle researchers will “primarily” use the Illumina Genome Analyzer for their pipeline, although “as people go out and see new technologies, the sequencing approach might change,” Nickerson said.
The third award went to the Broad Institute, which received $1.8 million in fiscal year 2008.
According to the grant abstract, the Broad researchers intend to optimize and validate a cost-effective approach for exome resequencing and implement the method at production scale.
Gabriel, co-director of the genome sequence and analysis program at the Broad, told In Sequence last week that the plan is to use a solution-based hybrid capture approach that the institute has been developing in collaboration with Agilent Technologies.
Carsten Russ, a research scientist at the Broad Institute, presented the method, called Hybrid Selection, at the Advances in Genome Biology and Technology meeting earlier this year (see In Sequence 4/8/2008).
The method uses 170-mer biotinylated oligonucleotides that are transcribed into RNA baits to target exons in solution. After hybridization, the scientists capture the oligos on streptavidin-coated magnetic beads, PCR-amplify the target DNA, and sequence it.
Agilent said in May that it had licensed the method from the Broad Institute and plans to develop it into commercial kits (see In Sequence 5/13/2008).
Gabriel said she and her colleagues are working on the right conditions to be able to capture as much of the whole exome as possible. After that, they plan to test their method on “probably a dozen or so” HapMap samples. “Once we are comfortable with that, we will turn to the NIH and see what they want us to sequence,” she said.
The plan is to have a production pipeline ready by the second year and use it to resequence up to a thousand phenotyped samples, she said.
Like the other two centers, the Broad researchers will initially use Illumina’s Genome Analyzer as their sequencing platform, though Gabriel said other sequencing platforms could be used in principle.
One of the challenges will be to find the best computational tools for aligning the reads and calling SNPs with high sensitivity and specificity, she said. Broad researchers are both evaluating existing tools and developing their own methods for this purpose. “You have to have a really low false-positive rate or the false positives start to swamp your signal and you end up doing a lot of validation that you don’t want to do,” she said.
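Gabriel's point about false positives can be made concrete with a small worked example. The sketch below computes how many expected calls are real at several per-site false-positive rates. Every number in it (sites examined, true variants, sensitivity, error rates) is a hypothetical value chosen for illustration, not a figure from the Broad.

```python
def expected_calls(sites, true_variants, sensitivity, fp_per_site):
    """Return expected true positives, false positives, and precision.

    `fp_per_site` is the chance that a non-variant site is wrongly called.
    """
    true_pos = true_variants * sensitivity
    false_pos = (sites - true_variants) * fp_per_site
    total = true_pos + false_pos
    precision = true_pos / total if total else 0.0
    return true_pos, false_pos, precision


# Hypothetical inputs for a single exome.
SITES = 30_000_000       # bases examined
TRUE_VARIANTS = 20_000   # real variant sites among them
SENSITIVITY = 0.95       # fraction of real variants detected

for fp_rate in (1e-3, 1e-4, 1e-5, 1e-6):
    tp, fp, precision = expected_calls(SITES, TRUE_VARIANTS, SENSITIVITY, fp_rate)
    print(f"FP rate {fp_rate:.0e}: {tp:,.0f} true calls, "
          f"{fp:,.0f} false calls, precision {precision:.1%}")
```

Because non-variant sites vastly outnumber variant sites, even a small per-site error rate can produce more false calls than true ones, and each false call is a candidate that would otherwise need validation.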