The National Cancer Institute has begun accepting infrastructure proposals for the Cancer Genomics Cloud pilots, an NCI-funded initiative to build a more sustainable computing environment for accessing and analyzing genomic and related data from large-scale cancer research projects.
The agency issued a broad agency announcement (BAA) this week, officially launching the proposal-collection phase of the pilots, which will last just over six weeks. Last week, it held a pre-proposal conference call and webcast during which project organizers discussed the vision for the project and the guidelines for completing applications. Members of the academic and commercial communities gave feedback on a pre-released draft of the BAA and asked more detailed questions about expectations and how to craft applications, among other topics.
The final document lays out in painstaking detail the project's research and technical objectives as they relate to computational services, intellectual property rights, and access and security requirements. It also sets out what proposals should contain, including descriptions of relevant past experience, details of current project plans and how these support cancer research, and anticipated costs. In addition, it covers submission guidelines, eligibility and review criteria, and a detailed development schedule that includes two design-and-build phases as well as community testing and evaluation steps.
Those planning to apply for one of up to three cost-type research and development contracts are encouraged to submit letters of intent via email by Jan. 27 and to turn in hard copies of their proposals by Feb. 27. The NCI estimates that it will spend approximately $20 million in total, an average of $6.67 million per approved project; however, both of those figures and the number of projects eventually funded are subject to change. The list of eligible applicants includes federally funded research and development centers and government entities such as national laboratories and educational institutions, as well as small and large commercial businesses.
The architects of winning proposals will receive their awards on or about Sept. 12. Developers will then have six months to deliver early system designs for evaluation. Assuming a favorable assessment, they'll move on to the first of two nine-month phases during which they will have to complete and implement their systems. During the second nine-month period, both the NCI and the cancer research community will evaluate the finished products.
Spurred by a growing need for more sustainable, cost-effective, and scalable means of accessing and using the information pouring in from NCI-funded projects such as The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET), the NCI's Board of Scientific Advisors and the National Cancer Advisory Board approved the proposal for the Cancer Genomics Cloud Pilots during a joint meeting last year.
They agreed to fund the development of a communal resource that provides co-located computational capacity and storage, along with an application programming interface to connect software, data, and compute resources. Such a system, they reasoned, would simplify access to data from TCGA, which is expected to generate around 2.5 petabytes of information by the time it wraps up, and from other projects by overcoming barriers such as limited local compute and lean budgets. It would also offer an alternative to the time-consuming and almost uniformly frustrating process of downloading large chunks of data to local resources.
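To give a rough sense of why downloading the data locally is so slow, the back-of-the-envelope sketch below estimates how long it would take to move the projected 2.5 petabytes of TCGA data over a network link. The link speeds are hypothetical and chosen only for illustration, and the calculation assumes a decimal petabyte and a link that runs at full speed the whole time, which real transfers rarely achieve.

```python
PETABYTE = 10**15  # bytes, using the decimal (SI) definition


def transfer_days(size_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move size_bytes over a link at full, sustained speed."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


tcga_bytes = 2.5 * PETABYTE  # projected TCGA data volume cited in the article

# Hypothetical link speeds; none of these figures come from the article.
for label, bps in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
    print(f"{label}: about {transfer_days(tcga_bytes, bps):,.0f} days")
```

Under these assumptions, even a dedicated 1 Gbit/s connection would need roughly 231 days to copy the full dataset, which illustrates the case for placing compute next to the data.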
The cloud pilots are one part of the NCI's broader informatics strategy, which also includes separate but related initiatives to fund other components. Nearly two years ago, the agency sponsored the development of the Cancer Genomics Hub (CGHub), awarding a $10.3 million contract to the University of California, Santa Cruz to build and oversee a single repository for lower-level sequence data from TCGA, TARGET, and other similar projects. Recent estimates show that CGHub now holds over a petabyte of information, of which about 400 terabytes come from TARGET's investigators and about 500 terabytes from TCGA. Both projects are still submitting data to the hub. Other projects contributing information include the Cancer Cell Line Encyclopedia and the Cancer Genome Characterization Initiative.
CGHub, which wraps up later this year, will eventually be absorbed into a larger NCI-funded resource called the Genomics Data Commons (GDC). The GDC is a separate effort to build what will essentially be a portal to diverse cancer genomics datasets that have been collated, checked, and cleaned, and it will be tied to the cloud platform developed through the pilots. The NCI began accepting applications to build the GDC last November, putting out a comprehensive request for proposals (RFP) that covers technical and development objectives for the commons, proposal expectations, and more. It will continue to accept applications in response to the RFP until Jan. 24.