The National Human Genome Research Institute should maintain its support for large-scale sequencing centers but should also provide opportunities for smaller groups to conduct sequencing projects that involve next-generation platforms, according to attendees of a recent workshop held to help NHGRI define its goals for future sequencing programs.
Among other recommendations, workshop participants urged NHGRI to address the need for better computational resources to analyze and integrate sequence data, and said that the agency should help smaller sequencing projects benefit from the experience and tools available at the larger centers.
These suggestions arose from a workshop entitled "The Future of DNA Sequencing" that the National Institutes of Health's NHGRI held in late March as part of its ongoing two-year planning process to define its future scientific programs, according to a report posted on the institute's website last week.
The workshop, held March 23-24 in Bethesda, Md., brought together approximately 100 experts and stakeholders in high-throughput sequencing, among them directors of large-scale sequencing centers in the US, the UK, and Canada; investigators from a variety of universities; representatives of NHGRI, other NIH institutes, the National Science Foundation, and the US Departments of Energy and Agriculture; as well as company representatives from Helicos BioSciences, Prognosys Biosciences, Merck's Rosetta Inpharmatics, Beckman's Agencourt Bioscience, and Alloy Ventures.
According to the report, NHGRI has not yet made any program decisions on the basis of the workshop discussions, but plans to use them in conjunction with its other planning activities to make decisions about the future of the large-scale sequencing program, in the context of its extramural programs, and in consultation with the National Council on Human Genome Research.
The workshop resulted in a number of general recommendations, one of which states that NHGRI should support both large-scale and mid-scale sequencing activities. "The consensus for sequencing is that there are unbelievable opportunities at all scales, and that all scales should be encouraged," Adam Felsenfeld, program director of large-scale sequencing at NHGRI, told In Sequence last week.
According to the report, large-scale sequencing centers contribute "much more than just data," as researchers at those facilities learn how to design projects and solve biological questions by sequencing, implement new sequencing technologies quickly, and set standards for sequencing and sequence data.
NHGRI needs to ensure, though, that "the tools and knowledge generated in the large sequencing centers be made robust and more readily usable by smaller research laboratories," the report states. For example, the large centers have developed many software tools for their internal use, "but there is a big difference between that and making them available — documented and robust — in a way that smaller groups can use," Felsenfeld said.
Furthermore, NHGRI should provide opportunities for "smaller, more specialized groups to engage in 'next-generation' sequencing projects that are not of an appropriate scale for the large centers," such as certain medical sequencing projects, according to the report.
What exactly constitutes mid-scale sequencing activities changes constantly, according to Felsenfeld, as the throughput of sequencing platforms increases. "Think of them in terms of a question that can be answered in a reasonably short time by a group that may not want to get into the business of sequencing forever," he said. Such projects could be, for example, follow-up studies for genome-wide association studies that involve sequencing of a few genome regions in multiple samples.
Felsenfeld pointed out that in response to the workshop, NHGRI recently provided a funding opportunity for mid-scale sequencing: As part of the American Recovery and Reinvestment Act's Research and Research Infrastructure "Grand Opportunities" grants program, the institute listed ARRA Medical Sequencing Discovery Projects as one priority area.
Under this program, the institute sought applications for moderate-scale — on the order of $1 million to $2 million per year — research projects "that will bring next-generation sequencing technology to bear on high-impact human genetic disease research." Applications for this initiative were due last week.
Another recommendation workshop participants made was for NHGRI to increase its funding for computational infrastructure, tools, and expertise.
"Computational biology methods, resources, and infrastructure have not kept pace with the increased rate of sequence output by the entire community," the report states, and "NHGRI will need to play a role in filling this need, along with other funding agencies."
According to Felsenfeld, this recommendation was not unexpected since this problem has been well known for a while. But meeting computational needs will be expensive and go beyond just sequence data. "I think that probably, we will have to pick some parts of it where we think we can really make a difference," he said.
It also became clear during the workshop that because NHGRI is not a disease-specific institute, as more of its funded projects center around disease, it will be important for it to foster collaborations with other NIH institutes that have "enormous investments and expertise in gathering, phenotyping, and characterizing samples," according to Felsenfeld. NHGRI is already undertaking such collaborations — notably the Cancer Genome Atlas — but there is a need for more, he said.
Finally, workshop participants advised that the institute "should address the technical problem of how to finish genomes using the new technologies," which have so far not produced "finished" genomes but draft assemblies.
Further recommendations came from three breakout sessions during the workshop, which focused on strategic planning for selecting projects, sample coordination, ELSI and consent; genome sequencing; and downstream issues in informatics and analysis.
One of the challenges of existing large-scale sequencing projects has been a "lack of availability of high quality samples," according to the report, which recommends that NHGRI encourage the creation of sample repositories, in collaboration with other NIH institutes.
In terms of the downstream analysis, the report points out that as the cost of producing sequence data decreases, the relative costs of analysis and informatics increase. Also, more generally, "the entire field of biology is still adapting to using and publishing papers on large data sets."
The report also raises the issue of data security and privacy, and suggests approaches to controlled data access.
A Plethora of Possible Projects
During the genome sequencing breakout session, participants put forward a variety of sequencing projects in different research areas that NHGRI should consider in the future, including human genetics, functional genomics, cancer, the human microbiome, and genome evolution and model organisms. The suggestions cover "the range of opportunities for compelling sequencing projects that could be done in the next five years, given the trajectory of the new sequencing capabilities," according to the report.
The report stressed, however, that it is unclear which of these, or other, projects will go forward and be funded in the future.
Among the human genetics projects proposed are an expansion of the 1000 Genomes Project to include rarer variants, more populations, and phenotyped samples; whole-genome sequencing studies to complement "or possibly supersede" GWAS studies for major common diseases; whole-genome sequencing of medically characterized populations in longitudinal studies; finding modifiers of highly penetrant disease variants, such as in cystic fibrosis; and identifying the causal variants of all approximately 7,000 Mendelian diseases.
In the category of functional genomics, recommendations included deep transcriptome sequencing projects, along the lines of the ongoing Genotype-Tissue Expression pilot project, which will sequence the transcriptomes of 50 human tissues from each of 160 donors; development of single-cell methods; and epigenetic analyses.
Cancer projects suggested include more full-genome sequences of tumor-normal pairs; transcriptome and epigenome analyses of tumors; and analyses of heritable cancers.
New human microbiome projects could include "many more normal subjects than is now being considered for the Human Microbiome Project," sequencing of host genomes, and analyses of microbiomes of model organisms.
Regarding "genome evolution and model organisms" projects, the report states that genome sequences from several non-human primates "will facilitate insight into recent human evolution" and population genetics of model organisms "is an important, relatively unexplored area."
The limiting factor for many of these projects, which constitute "a pretty good list," according to Felsenfeld, will not only be the availability of funding but also the ability to organize them, as many of them would involve large-scale collaborations of many players, and it is often not easy to acquire suitable samples.
It often takes time to acquire such samples, and "when you finally get them, they are never as 'sequence-ready' as you think they are at the beginning," he said.
Also, seemingly trivial aspects like developing a common format to display data can be "a big deal" and should not be considered anew for every single project.
"There are just tons of issues like that that come up in every large collaboration," he said. And because of their complex nature, "there is a natural limit to how many of those you can run effectively at one time with a given number of people."