BETHESDA, Md. — The National Cancer Institute's caBIG (Cancer Biomedical Informatics Grid) project took a moment to celebrate the success of its first year at its second annual meeting held here last week, but not for long — caBIG's coordinators are already making plans to broaden the scope of the project, while sticking to an aggressive timeline to deliver a nationwide interoperable cancer informatics framework by 2010.
Ken Buetow, director of the NCI Center for Bioinformatics and coordinator of the caBIG program, said that the project will expand its efforts over the next year to include several new domain areas, such as proteomics, imaging, and systems biology, and that the initiative will also take steps to attract participation from commercial vendors.
Addressing an audience of nearly 400 people at the meeting, Buetow said that last April, caBIG was only "a vision." One year later, the project — which aims to link computational resources and research data across more than 50 NCI-supported cancer centers — is "where we figured we should be at the completion of the first year."
So far, caBIG participants — more than 600 people hailing from around 80 different organizations — have focused on building informatics tools and data standards for three core domain areas: clinical trials management, tissue banking, and integrative cancer research. Buetow said that several tools from these development efforts are already available, or in prototype form (see story, this issue, for more information on the caBIG tool roadmap).
In its second year, Buetow said, the caBIG community needs to concentrate on what he called "the biggest upcoming challenge" for the project: assembling those components into a seamless framework using the standards and grid architecture developed under caBIG's "cross-cutting" working groups.
But even as these existing tools are being knitted together into the larger infrastructure, more pieces will be coming online. In addition to the new domain areas, Buetow said that caBIG will work with a number of new partners to extend its reach into, and beyond, the cancer research community. One such partner is the Food and Drug Administration, with which caBIG will develop and pilot a "common infrastructure" that will support the regulatory requirements associated with the development of cancer drugs.
During a session on the second day of the meeting here, Randy Levin, director of health and regulatory data standards at the FDA, said that the two agencies are already working on methods for filing electronic submissions of investigational new drug applications (INDs) via the caBIG infrastructure. NCI is also working with the UK's National Cancer Research Institute to synchronize informatics development across the Atlantic.
Finally, Buetow said, caBIG will invite the commercial sector to become a "full partner" in moving the caBIG architecture forward. The project has so far been led by the academic community, with primary contributions from the NCI and the member cancer centers. However, Buetow and other caBIG officials said that there has been a great deal of interest from both the vendor community and the user community in identifying mechanisms by which commercial tools can be integrated into the framework.
Creating a Commercial Market
Details of how NCI plans to work with the commercial sector are still under discussion, however. Chalk Dawson, senior associate at Booz Allen Hamilton, NCI's primary contractor for the project, told BioInform that caBIG coordinators are still debating ways to make vendor participation a "win-win" for industry as well as academic and government participants. The caBIG management team plans to hash out some potential mechanisms for commercial participation over the summer, and caBIG will host a "vendor-oriented" forum in September where it will formally disclose its plans for attracting industry involvement.
The issues under discussion are not trivial. One possible barrier to commercial participation is caBIG's strict policy of "open source, open access, open development, and federation." All caBIG contractors and subcontractors must abide by this over-arching philosophy in order to get funded, and a number of meeting attendees suggested that this would likely discourage most commercial software and hardware vendors from participating.
In addition, as one participant noted, some academic caBIG participants may not welcome vendors with open arms, viewing the typical informatics business model as "stealing our stuff and then charging us to get it back."
But the small number of commercial informatics providers who were at the meeting — admittedly a biased sample — welcomed the proposal. Diane Oliver of health IT firm Cerner noted during a Q&A session following Buetow's talk that her company is "very interested" in the data standards coming out of caBIG. She added that Cerner has developed its own freely available terminology of around 5,000 genes and polymorphisms used in the clinical setting [BioInform 03-07-05], and welcomed feedback from the meeting attendees.
David Aronow, manager of clinical informatics at biorepository data-management firm Ardais, told BioInform that his company has already been involved in caBIG on a "volunteer" basis for about a year. Aronow said that when the project was first proposed, Ardais viewed it as having "the potential to expand the market for what we already wanted to be doing" by "creating a market for commercial-grade applications" that are compatible with the caBIG framework. Rather than view the publicly funded project as a potential competitor, Aronow said, the company decided "it would be a risk if we didn't embrace the caBIG movement wholeheartedly."
One benefit of participating in the project, according to Aronow, is that Ardais has a hand in directly shaping the data standards and vocabularies that caBIG eventually endorses. Ardais plans to donate some of the terminology it has developed internally for its tissue bank operations to caBIG, but Aronow said that a "mechanism for doing that is not clear yet."
Aronow said that Ardais has also submitted several joint proposals with caBIG cancer centers for projects involving vocabulary development and software development. The open source requirement isn't a barrier, Aronow said, "because if we can develop these applications that work, it will really create an environment where our proprietary applications and caBIG applications work better together, since we have contributed to the caBIG applications ourselves."
This sentiment echoed Buetow's own arguments for inviting industry participation. He noted that many caBIG "products" aren't software tools as such, but are actually open interfaces that could be used to integrate proprietary software into the larger framework. In addition, he said, NCI envisions that caBIG will ultimately provide a foundation for "new business models built upon the support and maintenance and extension of caBIG-developed tools."
Peter Covitz, director of bioinformatics core infrastructure at NCICB, agreed, describing caBIG's activities as "creating the market and seeding it." During a session on the second day of the meeting, he said that caBIG has elements of "socialism, capitalism, altruism, and egoism," and that all of those ingredients — in the proper balance — would be required to ensure the project's success.
Another company that has expressed interest in aligning with caBIG's activities is Akaza Research, a Cambridge, Mass.-based startup developing an open source clinical data-management platform called OpenClinica. However, during a breakout session on caBIG's open source software licensing policy, Akaza co-founder and CSO Nitin Sawhney expressed some concern that the product, despite its open source nature, may not be compatible with caBIG's guidelines, which rule out so-called "viral" licenses like the GPL (see story, this issue, for more details on caBIG's proposed open source software license model). OpenClinica is licensed under the LGPL, which complies with some — but not all — of caBIG's requirements. "How can we harmonize with caBIG software?" Sawhney asked.
Following the meeting, Sawhney told BioInform that he plans to become more involved in caBIG's Data Sharing and Intellectual Capital working group in order to ensure that the group's IP policies — which are still evolving — align with industry interests.
Some firms providing informatics services are already active participants in caBIG, and companies like SAIC, SRA, Research Triangle Institute, and others have been hired as subcontractors on some projects. In the longer term, Aronow said, the services model is promising for companies such as Ardais even if they don't get funded as subcontractors. As the caBIG toolkit grows, he said, smaller cancer centers that don't have the IT resources of the current member centers may want to tap into outside expertise to integrate caBIG tools with their internally developed infrastructures.
Another area that might draw significant vendor interest is the new imaging working group. Dan Sullivan, head of NCI's cancer imaging program, said that the primary vendors in the imaging marketplace (Siemens, Philips, and GE Healthcare) all have proprietary software platforms. The development of cross-platform workstations is one task that the imaging working group hopes to accomplish as part of caBIG, Sullivan said, and he added that industry is "very interested in participating" in this effort. The imaging group plans to issue a request for proposals in May for contractual participants, with project selection slated for June and the first in-person meeting expected in the fall.
That's not to say that there's no resistance to vendor involvement in caBIG. One of the initiative's better-developed projects, the caArray database, was criticized during a session as "not open source" because it is built on the Oracle database. A caArray project manager told BioInform after the session that several adopters of the database are currently modifying caArray to work with open source alternatives to Oracle, like PostgreSQL or MySQL. Once these groups successfully port the database to these platforms, the code will be made available as part of the caArray project.
One of these projects, a collaboration between Georgetown University and Thomas Jefferson University's Kimmel Cancer Center, presented a poster at the meeting. The researchers reported that they successfully extracted caArray's data structure using a tool called Torque, but future steps were still unclear as the team had difficulty extracting seed data from caArray using the tool. The researchers noted that the work, while still in its early phases, should be applicable to both PostgreSQL and MySQL.
This type of project — as an independent offshoot of the original caArray project — typifies the dynamic that the caBIG coordinators are trying to foster, which Buetow described alternately during the course of the meeting as "think globally, act locally," and "caBIG is us." The distributed nature of the project can at times appear to be a bit out of control, Buetow told BioInform after the meeting, but it offers developers the opportunity — and the responsibility — to actively address shortcomings in caBIG themselves, rather than wait for NCI, or some other centralized resource, to do it for them.