The National Heart, Lung, and Blood Institute and the National Human Genome Research Institute have set aside $12 million in grants to help reduce at least tenfold the cost of whole-genome exon sequencing methods and enable future genome-wide exon resequencing studies that will focus on specific diseases.
The new technology grant program, which plans to make up to four awards this fall totaling $12 million over two years, and the disease studies NHLBI has planned for the future both complement the 1,000 Genomes Project that the NHGRI, the Wellcome Trust Sanger Institute, and the Beijing Genomics Institute announced last week.
Unlike that project, which aims to catalog genetic variations in human populations unbiased for disease, the resequencing technology program is geared towards studies that will correlate sequence variations with disease phenotypes.
“We built a consensus, through informal and some formal meetings we convened, that the time was right to take the next-generation technologies that are being developed and implemented and build the kind of pipeline that would be required for large-scale production sequencing” of thousands of samples from disease-specific collections maintained by NHLBI, Alan Michelson, associate director for basic research at NHLBI, told In Sequence earlier this month.
The goal is to develop inexpensive methods for capturing and sequencing all exons, which make up between 1 percent and 2 percent of the human genome. Other functional regions, such as microRNAs and regulatory elements, might also be included, according to the grant announcement.
These methods would fill a technical void for studies that seek to resequence neither entire genomes nor a single candidate gene, according to Weiniu Gan, a program director in the genetics, genomics, and advanced technologies program of the division of lung diseases at NHLBI.
“We think this is a gap [and] we want to see if we can fill that,” he said.
Exon Marks the Spot
At the moment, sequencing an individual’s exome, or about 60 megabases of DNA, at high quality — including all costs such as personnel, reagents, instrument amortization, and facilities — costs “multiple tens of thousands of dollars,” according to Adam Felsenfeld, program director of the large-scale sequencing program at NHGRI. “Right now, that’s too expensive to do routinely,” he said. The aim of the program is to reduce this cost to about $1,000 per sample.
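To put these figures in proportion, the short sketch below works through the arithmetic. It is illustrative only: the genome size of roughly 3,100 megabases and the two starting costs are assumptions chosen for the example, not figures from the grant announcement or from Felsenfeld, and the function names are hypothetical.

```python
# Illustrative arithmetic only. The genome size and the example starting
# costs are assumptions, not figures from the grant announcement.
HAPLOID_GENOME_MB = 3_100  # assumed approximate size of the human haploid genome
EXOME_MB = 60              # exome size cited in the article
TARGET_COST_USD = 1_000    # per-sample cost goal cited in the article


def exome_fraction(exome_mb: float = EXOME_MB,
                   genome_mb: float = HAPLOID_GENOME_MB) -> float:
    """Return the share of the genome covered by the exome."""
    return exome_mb / genome_mb


def fold_reduction(current_cost_usd: float,
                   target_cost_usd: float = TARGET_COST_USD) -> float:
    """Return how many times cheaper the target is than the current cost."""
    return current_cost_usd / target_cost_usd


if __name__ == "__main__":
    print(f"Exome share of genome: {exome_fraction():.1%}")
    # Hypothetical points within "multiple tens of thousands of dollars".
    for cost in (20_000, 50_000):
        print(f"${cost:,} -> ${TARGET_COST_USD:,}: {fold_reduction(cost):.0f}-fold cheaper")
```

Under these assumptions, 60 megabases falls within the 1 percent to 2 percent range cited for exons, and any starting cost in the tens of thousands of dollars implies at least the tenfold reduction the program is seeking.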
To reach this goal, researchers awarded the two-year grants will combine new technologies and methods for sample preparation, target capture, sequencing, and data management and analysis into an “integrated re-sequencing pipeline,” according to the grant solicitation.
According to Felsenfeld, developing new capture methods and integrating them with different sequencing platforms will be among the main technical challenges researchers need to overcome.
“Although there are some very encouraging capture methods, I think the field is not settled yet,” he said. “There are some early successes, but I think there is a long way to go and a lot of opportunity out there.”
Last year, several research groups published studies that coupled new DNA capture methods with new sequencing technologies, such as NimbleGen microarrays with 454’s sequencer, NimbleGen arrays with Illumina’s Genome Analyzer, and an Agilent oligo-based PCR-like approach with Illumina’s sequencer (see In Sequence 11/6/2007).
But these will likely not remain the only methods for exon sequencing. “Things won’t settle for some time because new and better techniques are going to constantly be dropped into the mix for some time, and people are going to look at those in combination with their other technologies and see what’s best,” Felsenfeld said. He added that several successful combinations of capture and sequencing might emerge in the end.
According to the funding announcement, the research will probably take place in two phases. During an initial “optimization phase,” scientists will develop and fine-tune their methods and put them into a production pipeline using DNA samples that have already been characterized in genome-wide genotyping or sequencing studies, such as the HapMap samples, which are unbiased for disease. Fewer than 20 DNA samples per investigator “may be sufficient” for this phase of the project.
In the second, scale-up phase, scientists will run hundreds of DNA samples to see if their re-sequencing pipeline is suitable for sequencing all exons in thousands of DNA samples in later studies.
Technologies developed under the NHLBI/NHGRI’s program might also benefit the 1,000 Genomes Project. Under one of its pilot projects, participating sequencing centers will sequence exons of about 1,000 genes in about 1,000 individuals.
Last week, Elaine Mardis, co-director of the Genome Sequencing Center at Washington University, told In Sequence that her center has not yet decided which capture method to use in that pilot study, but it will consider both an in-house method and approaches based on Agilent and NimbleGen technology (In Sequence 1/22/2008).
Companies are also working on methods for targeted exon resequencing. NimbleGen, for example, is optimizing its array-based capture method for 454’s platform and plans to launch products and services early this year (see In Sequence 10/16/2007). And RainDance Technologies and Febit Biotech are working on methods for selective sequencing (see In Sequence 7/17/2007 and In Sequence 7/24/2007).
Further, last summer, researchers from Stanford University won a grant under the NIH’s Cancer Genome Atlas technology-development pilot project to develop methods for the high-throughput isolation of genomic regions for DNA sequence analysis (see In Sequence 7/10/2007).
Regardless of whether the most cost-effective exon re-sequencing pipeline emerges from its own program or from others, NHLBI plans to apply the technology in future disease studies that will involve sequencing all exons in “thousands of selected individuals” from its “well-phenotyped populations” that have “adequate consents” for data sharing, according to the funding announcement. These studies differ from the 1,000 Genomes Project, which focuses on populations that are unbiased for disease and does not include phenotypic information about the subjects.
According to Michelson, NHLBI would eventually like to perform resequencing studies of common complex diseases such as cardiovascular and lung diseases, but has not made any commitments yet. At the moment, the focus is still on technology development, and “once that is in place, it’s really just a question of making the appropriate funds available for all the various competing initiatives to deploy this on a large scale to a whole range of diseases,” Michelson said.
Such disease-centered re-sequencing studies could help researchers better understand disease mechanisms and complement results from genome-wide association studies.
While large-scale genotyping studies can only detect relatively common genetic variations that occur in at least 5 percent of a population and usually have small effects on disease, sequencing can also detect rare variants that have large effects, according to Michelson.
“We are hoping — and this is unproven — that the larger-effect alleles that should be detected by direct sequencing will provide more rapid entry points into actual mechanistic understanding of disease,” he said.
Sequencing results could also help researchers interpret the results of genome-wide association studies by providing a catalog of genetic variants in genome regions that these studies have flagged as important.
“Whereas genome-wide association studies quickly get you … to a region, and possibly a bunch of candidate genes, [sequencing] quickly gets you to the next step of being able to test hypotheses about individual variants in larger study samples,” Felsenfeld said.
NHLBI’s approach is limited by its focus on exons, which make up only between 1 percent and 2 percent of the human genome. But exons are still the “most interpretable part of the genome,” whose functions are known, according to Michelson. “By direct sequencing of coding regions you may be closer to causal variants, or at least generating good hypotheses about causal variants that then can be tested in some specific biological context,” he said.
Data from disease resequencing studies will be accessible through the database of Genotype and Phenotype (dbGaP), which is hosted by the National Center for Biotechnology Information. dbGaP already contains genotyping and phenotyping data from approximately 30 studies and offers restricted access to certain data to authorized users.
But the NHLBI’s planned studies are not the only resequencing projects that seek to correlate genotypes and disease or other phenotypes. Under the NHGRI’s and the National Cancer Institute’s Cancer Genome Atlas pilot project, for instance, researchers will sequence DNA from lung, brain, and ovarian cancer samples. In addition, the NHGRI, under its large-scale sequencing program, maintains a number of medical sequencing projects. The institute also launched an intramural program called ClinSeq last year that focuses on cardiovascular disease and initially aims to sequence 200 to 400 genes in 1,000 people (see In Sequence 6/5/2007).
Separately, the Personal Genome Project led by George Church at Harvard Medical School seeks to enroll 100,000 participants from the general population over time. The project aims to sequence at least the participants’ exons and relate that sequence information to medical and other phenotypic data. It will not focus on specific diseases, but may instead “make possible preliminary screening of proposed associations prior to more rigorous or focused data collection,” according to the PGP website.