The National Cancer Institute at NIH is gearing up to build a bioinformatics infrastructure that will meet the needs of its diverse research community.
The NCI Center for Bioinformatics (NCICB) was announced in June, but staffing has taken place only over the last few weeks. Funding negotiations are still underway, with the level expected to be on the order of several million dollars, said Kenneth Buetow, director of the NCICB.
The NCICB component of each NCI research area will be directed by a full-time government employee. Beyond that, there will be different staffing models depending on the legacy system that is in place at individual research centers.
The focus of the new program is interoperability, including building data models, application programming interfaces, and web portals designed to provide a view of genomic data tailored to a particular research community’s needs.
“Ordinarily groups working on, for example, mouse models and clinical trials don’t interact that much. Each has a nearly insatiable need for bioinformatics support, which they’ve tended to address separately. But there is substantial overlap in their needs and synergistic possibilities in addressing them in an interoperable way,” Buetow said.
To ameliorate the problem, the NCICB is beginning with modules serving four NCI research programs: the cancer genome anatomy program, molecular signature of cancer, mouse models for human cancer, and clinical trials. However, the infrastructure being deployed is intended to be useful throughout the institute and beyond.
Members of the various NCI research communities felt it would be helpful to have an NCI-wide effort to set common standards, Buetow said.
NCI is involved in the NIH-wide BISTI (Biomedical Information Science and Technology Initiative) consortium, which brings together investigator-initiated efforts in bioinformatics. Buetow anticipates that some of the tools developed through BISTI research may be distributed through the NCI infrastructure.
The NCI team is also taking part in the effort spearheaded by the industry group BIO to put together an XML standard for biological data exchange.
“We’ll be focusing on developing useful standards as opposed to requiring standardization,” Buetow said. “One of the ways you get convergence on standards is to give people stuff they can use.”
The NCICB is using Apache web servers, the Zope web application server, and Perl. XML standards, APIs, and data models will be freely available, and all software will be open source. Together, these components will serve as common middleware upon which the web views specific to each research community can be built.
The center plans to rapidly deploy web portals and tools to facilitate the sharing of data and software among groups. “Experience has shown if data doesn’t get shared quickly, it may not get out at all,” Buetow said.
The portals will provide access to a wide variety of data, much of which is maintained at NCBI. Although NCI is negotiating with Celera for access to its data for NCI scientists, the institute does not anticipate being able to make that data available through the NCI portals because of Celera’s redistribution restrictions.
However, Buetow said, Celera, Incyte, and many other commercial firms are participating in discussions with the BIO interoperability group, although no commitments have been made yet.
“My interactions with Celera and Incyte have been very positive,” Buetow said.
Buetow said that commercial and academic partners are welcome to participate in the NCICB effort. He sees the government’s role as providing an open infrastructure that facilitates, rather than obstructs, commercialization.
“I can think of no better compliment than if we deploy something that winds up as the basis for a wildly successful product,” he said.