Bioinformaticists in Pennsylvania are reaping the rewards of the state’s tobacco settlement money. Just months after the state launched its Life Sciences Greenhouse economic development initiative with $100 million in tobacco settlement funding, another $12.3 million of settlement money was awarded last week to bioinformatics-based cancer research projects.
In all, the state has distributed $65.1 million in tobacco settlement money this fiscal year to biological research projects (see list on p. 10). The most recent awards were granted with the hope of bolstering the state’s bioinformatics infrastructure to support collaborative research.
The primary beneficiary of last week’s award was the Pennsylvania Cancer Alliance (PCA), a consortium of six cancer research institutes that was granted $5.5 million to build a distributed bioinformatics system to store and exchange biomarker analysis data. In addition, the Allegheny-Singer Research Institute and Carnegie Mellon University were granted $3.2 million to combine imaging technologies with data analysis techniques, and Carnegie Mellon and Dickinson College were awarded $3.5 million to develop medical diagnostics for cancer identification.
The PCA project is the largest in scope, with the greatest potential impact on researchers across the state. Led by researchers at the University of Pittsburgh Cancer Institute, the consortium includes Fox Chase Cancer Center, Kimmel Cancer Center at Thomas Jefferson University, Abramson Cancer Center of the University of Pennsylvania, Penn State Cancer Institute, and The Wistar Institute.
Michael Becich, co-principal investigator on the project and director of the University of Pittsburgh Cancer Institute’s Benedum Oncology Informatics Center, said that Ken Buetow, the director of the NCI Center for Bioinformatics (NCICB), is advising the PCA on infrastructure design. The consortium plans to base the system on the NCICB’s caCORE cancer informatics infrastructure, which was developed to integrate controlled vocabularies, ontologies, common data elements, and UML models of biomedical objects into a common system. Said Becich, “$5.5 million isn’t a lot of money when you have six sites and you want to build a common infrastructure, so as many tools that we can use from [the NCICB], we will put in place.”
In addition, Becich’s group at UPCI has built a tissue banking information system that it intends to share with PCA members, and other cancer centers are expected to “proliferate” existing software and technology components throughout the consortium.
While the mechanics of the infrastructure are still being finalized, Becich said the plan is to keep patient samples at their home institutions, while de-identified data and marker information will be shared across the system. Collaborators will have access to all the materials at member institutions, but a common repository for samples is not planned. “The biorepository aspect is virtual, if you will. Specimens will be identified by the principal investigators at each of the sites, they will do the biomarker analysis on those samples, and then that data will be shared in the bioinformatics repository,” Becich explained. The next step is creating a distributed system for mining the data that appears to the user as a single database.
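The arrangement Becich describes, in which samples stay at each site while de-identified marker data is queried as if it lived in one database, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PCA's design, which the article says is still being finalized: the class names (`SiteNode`, `FederatedRepository`), the site labels, and the salted-hash scheme are all hypothetical, and real de-identification involves considerably more than hashing an identifier.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerRecord:
    specimen_key: str  # de-identified key; carries no local identifier
    site: str          # institution where the physical sample remains
    marker: str
    value: float


def deidentify(site: str, local_specimen_id: str, salt: str) -> str:
    """Hypothetical scheme: salted one-way hash of the local specimen ID."""
    raw = f"{salt}:{site}:{local_specimen_id}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:16]


class SiteNode:
    """One member institution; specimens and local IDs never leave it."""

    def __init__(self, name: str, salt: str):
        self.name = name
        self._salt = salt  # kept private to the site
        self._records: list[MarkerRecord] = []

    def add_result(self, local_specimen_id: str, marker: str, value: float) -> None:
        # Only the de-identified key is stored in the shareable record.
        key = deidentify(self.name, local_specimen_id, self._salt)
        self._records.append(MarkerRecord(key, self.name, marker, value))

    def query(self, marker: str) -> list[MarkerRecord]:
        return [r for r in self._records if r.marker == marker]


class FederatedRepository:
    """Fans a query out to every site and merges the answers,
    so the caller sees what looks like a single database."""

    def __init__(self, nodes):
        self._nodes = list(nodes)

    def query(self, marker: str) -> list[MarkerRecord]:
        results: list[MarkerRecord] = []
        for node in self._nodes:
            results.extend(node.query(marker))
        return results
```

A caller would build one `SiteNode` per institution, wrap them in a `FederatedRepository`, and call `query("HER2")` without needing to know which site holds which records.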
The key to that aspect of the project is what Becich described as “grunt work” — working out terminologies, vocabularies, and metadata tagging structures — “but that’s what the new challenges are about: How do people share data transparently?” Becich estimated it could take up to a year to finalize the metadata modeling aspects of the bioinformatics strategy, with parallel efforts planned on aggregating specimens and the biomarker analysis.
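To give a sense of what that terminology “grunt work” involves, the sketch below maps each site's local marker names onto a shared vocabulary before the data is pooled. It is a minimal illustration only: the term list, the per-site synonym tables, and the `normalize` function are hypothetical and are not drawn from caCORE or from the PCA's plans.

```python
# Hypothetical shared vocabulary agreed on by all sites.
COMMON_TERMS = {"HER2", "KI67"}

# Hypothetical per-site synonym tables (keys are lowercase local labels).
SITE_SYNONYMS = {
    "site_a": {"her-2/neu": "HER2", "erbb2": "HER2"},
    "site_b": {"ki-67": "KI67", "mki67": "KI67"},
}


def normalize(site: str, local_term: str) -> str:
    """Translate a site's local marker label into the common term.

    Raises KeyError for unmapped labels so that gaps in the vocabulary
    are caught during curation instead of silently entering shared data.
    """
    cleaned = local_term.strip()
    if cleaned.upper() in COMMON_TERMS:
        return cleaned.upper()
    mapped = SITE_SYNONYMS.get(site, {}).get(cleaned.lower())
    if mapped is None:
        raise KeyError(f"no common term for {local_term!r} at {site}")
    return mapped
```

Failing loudly on an unmapped label is a deliberate choice in this sketch: a term that cannot be translated is a question for the curators, not something to guess at.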
Benefits in Store for Industry, Too
The project could be a draw for biotech and pharma companies interested in accessing the data. Pennsylvania’s three Life Sciences Greenhouses provided letters of support for the project and have agreed to help the PCA find opportunities for commercial entities to capitalize on aspects of the project. While potential commercialization opportunities are still vague, Becich provided a possible scenario: “Maybe a commercial partner would provide the financial support for running cDNA chips on all of the patient samples we aggregated for the entire study, and then we would have a really highly richly annotated set of data on clinical trials. In an ideal world, we’d be sharing the data, and it would help us expand the number of markers we actually study to include as many as thousands of markers instead of dozens.”
Becich added that the results of such an effort would offer a rich data-mining environment for learning new things about patients, their cancers, and the effects of therapeutic agents on those cancers. “So from the standpoint of drug discovery and drug validation, it would be a tremendous win for companies doing work in the state of Pennsylvania,” he said.
Donald Smith, university strategist for the Pittsburgh Life Sciences Greenhouse, confirmed the Greenhouse’s support of the project, but was unable to provide further details on commercialization plans. He did note that bioinformatics would play a “strong and central role” in each of the three Greenhouse initiatives.