A research team at the Cancer Institute of New Jersey is spearheading a collaborative effort to develop a system for analyzing tissue microarrays that could eventually be used by the broader cancer research community as part of the National Cancer Institute’s Cancer Biomedical Informatics Grid, or caBIG, infrastructure.
David Foran, director of the Center for Biomedical Imaging and Informatics at CINJ and the lead investigator on the project, told BioInform that a prototype of the system should be available in about a year and a half and that a “complete clinical decision support system framework” will be available through caBIG by 2012.
The National Institutes of Health recently awarded Foran’s team a four-year, $2.5 million grant to develop the platform, which will allow researchers to compare and analyze expression patterns in cancer tissue microarrays via caBIG’s underlying service-oriented infrastructure, called caGrid.
Last week, CINJ said that IBM has donated a P6 570 series server to the project under its Shared University Research Award program. A team of researchers at IBM’s T.J. Watson Research Center will also collaborate with Foran’s team to develop image-analysis and pattern-recognition algorithms for analyzing the tissue microarrays.
Other partners in the project include Arizona State University, Columbia University, Ohio State University, Rutgers University, the University of Medicine and Dentistry of New Jersey, and the University of Pennsylvania School of Medicine.
According to the abstract for the recent NIH award, the proposed analysis platform will include a web-based image-guided decision-support system, a distributed telemicroscopy system, a virtual microscopy system, and an image-archival system, and will be based on grid middleware components developed under caBIG’s in vivo imaging workspace. The system will allow researchers to compare a single tissue microarray image against a large reference database and retrieve those expression patterns that are the most similar to the query image.
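The query-against-reference retrieval described above can be illustrated with a minimal sketch. It assumes each tissue image has already been reduced to a numeric signature vector and that similarity is measured with cosine similarity. Both are illustrative assumptions, as are the names `reference_db`, `most_similar`, and the toy values; the article does not specify the project's actual features or distance measure.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero signature as matching nothing
    return dot / (norm_a * norm_b)

def most_similar(query, reference, k=3):
    """Return the k reference entries most similar to the query, best first."""
    scored = [(cosine_similarity(query, sig), label)
              for label, sig in reference.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(label, score) for score, label in scored[:k]]

# Hypothetical reference library: label -> expression signature (toy values).
reference_db = {
    "ref_A": [0.90, 0.10, 0.00],
    "ref_B": [0.10, 0.80, 0.10],
    "ref_C": [0.85, 0.15, 0.05],
}

query_signature = [0.88, 0.12, 0.02]
top_matches = most_similar(query_signature, reference_db, k=2)
for label, score in top_matches:
    print(f"{label}: {score:.4f}")
```

A linear scan like this is only practical for a small library; a large reference database would presumably need an index or distributed search, which is where the grid would come in.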
The system currently is not intended for diagnosis, Foran said, but is envisioned as “being used more for therapy planning for patients, being able to identify sub-populations of patients for clinical trials, and of course drug design and discovery going forward.”
The goal, he explained, is to create a framework that will enable “oncologists and pathologists from anywhere in the country to essentially come in through a URL” and analyze their tissue microarrays. He said that this would be possible through “two different scenarios”: one in which a researcher would upload an imaged tissue array to a website, where it would be distributed across the grid and analyzed, and another in which the researcher would retain the imaged tissue array locally.
“All of the automated delineation of the disks and color decomposition and packaging into work units would actually occur at their client end,” Foran said, adding that those work units would be distributed for analysis via the grid.
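The client-side packaging step Foran describes might look something like the following sketch. It assumes the tissue disks have already been delineated and given identifiers, and that a work unit is simply a fixed-size batch of disks. The `WorkUnit` type, `package_work_units` function, and batch size are hypothetical and are not drawn from the project's software.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class WorkUnit:
    """A batch of delineated tissue disks to be analyzed on one grid node."""
    unit_id: int
    disk_ids: List[str]

def package_work_units(disk_ids: List[str], disks_per_unit: int) -> List[WorkUnit]:
    """Split the delineated disks into consecutive fixed-size work units."""
    if disks_per_unit < 1:
        raise ValueError("disks_per_unit must be at least 1")
    return [
        WorkUnit(unit_id=start // disks_per_unit,
                 disk_ids=disk_ids[start:start + disks_per_unit])
        for start in range(0, len(disk_ids), disks_per_unit)
    ]

# Hypothetical array of 10 disks, packaged 4 to a unit on the client.
disks = [f"disk_{n:03d}" for n in range(10)]
units = package_work_units(disks, disks_per_unit=4)
for unit in units:
    print(unit.unit_id, unit.disk_ids)
```

The last unit is allowed to be smaller than the others, so no disk is dropped when the array size is not a multiple of the batch size.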
Foran said that a grid approach is expected to “significantly” reduce the amount of computational time required to analyze a typical tissue microarray, but noted that he and his colleagues have not yet completed any benchmarking to determine the exact time savings. “Currently, on a standard computer, the analysis that we’re talking about would take many hours,” he said.
Leiguang Gong, a researcher in IBM’s high-performance rich media-analytics group who is collaborating on the project, told BioInform via e-mail that generally, “a dedicated supercomputer provides superior computing power for real-time in-house data processing and visualization, but a grid system is universally more cost-effective for batch-based distributed data processing.”
Foran stressed that the computational requirements for analyzing tissue microarrays are orders of magnitude greater than those for DNA arrays because the image processing is so much more complex.
“If you look at gene-array analysis, each one of those little spots is a homogeneous spot, which you can represent with a single value,” he said. “But these are tissue microarrays, and this is a heterogeneous tissue. So in the case of, let’s say, breast cancer, the computer first has to be trained to be able to distinguish between epithelial regions, stromal regions, [and] different cell types because what we’ve been finding more and more is that there is actually clinical relevance as to where the protein preferentially resides.
“So it’s not only important to take the measurements in terms of intensity and distribution, but we also have to use sophisticated machine-vision technology in order to teach the computer to be able to distinguish or localize the protein,” he said.
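The contrast Foran draws between a single value per spot and region-aware measurement can be shown with a toy sketch. It assumes every pixel in a tissue disk has already been assigned a region label and a staining intensity between 0 and 1. That labeling is the hard machine-vision step and is not implemented here, and the data and function name are hypothetical.

```python
from collections import defaultdict

def mean_intensity_by_region(pixels):
    """Average staining intensity separately for each labeled tissue region."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for region, intensity in pixels:
        totals[region] += intensity
        counts[region] += 1
    return {region: totals[region] / counts[region] for region in totals}

# Hypothetical labeled pixels from one tissue disk: (region, intensity).
pixels = [
    ("epithelial", 0.8),
    ("epithelial", 0.6),
    ("stromal", 0.1),
    ("stromal", 0.3),
]

per_region = mean_intensity_by_region(pixels)

# A single whole-spot average, as would suffice for a homogeneous gene-array spot.
overall = sum(intensity for _, intensity in pixels) / len(pixels)

print(per_region)
print(overall)
```

In this toy disk the whole-spot average hides the fact that the staining is concentrated in the epithelial region, which is the kind of localization information Foran says can be clinically relevant.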
IBM’s Gong said that his group is working with the CINJ team to develop a “unified suite of algorithms” for analyzing large-scale, multimodal cancer patient data, including image analysis and registration techniques, data-mining algorithms, and classification methods.
“These methods together will enable a multi-modality decision-support system for assessing and managing cancer that employs an automated, evidence-based approach for systematically evaluating clinical, genomic, and imaging data,” he said.
Joel Saltz, chair of the department of biomedical informatics at Ohio State University, said that the project should also benefit from a new IBM-based biomedical cluster that the Ohio Supercomputing Center is building, which will include “several hundred processors” and run the caGrid software stack.
Grid-Based Proof of Concept
Foran said that the current project grew out of a proof-of-concept study that used IBM’s World Community Grid — a network of nearly 300,000 computers — to validate imaging and pattern-recognition algorithms for analyzing tissue arrays.
The project, called “Help Defeat Cancer,” kicked off last July and was allocated 137 years of computation per day, Foran said [BioInform 07-21-06].
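As a back-of-the-envelope reading of that allocation, the sketch below assumes that a “year of computation” means one computer running continuously for a year. The article does not define the unit, so the result is only a rough equivalence. The grid size is the “nearly 300,000 computers” figure cited above.

```python
# Assumption: one "year of computation" = one machine running for a full year.
DAYS_PER_YEAR = 365.25

years_of_computation_per_day = 137   # allocation cited by Foran
grid_size = 300_000                  # "nearly 300,000 computers"

# Machines that would have to run around the clock to deliver that much work daily.
equivalent_machines = years_of_computation_per_day * DAYS_PER_YEAR

# Rough share of the whole grid that this represents.
share_of_grid = equivalent_machines / grid_size

print(f"about {equivalent_machines:,.0f} machines running full time")
print(f"about {share_of_grid:.0%} of the grid")
```

Under that assumption, the allocation is roughly equivalent to 50,000 computers working full time, or about a sixth of the grid.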
“We really wanted to make sure we didn’t fall on our face, but at the time it was just a hypothesis that we, with our statistical pattern-recognition and machine-vision algorithms, could capture a signature of the expression patterns within tissue arrays that correlated with different stages of disease and different types of disease — specifically breast cancer, head and neck cancer, and colon cancer,” he said.
In order to test the hypothesis, Foran’s group worked with IBM to “grid-enable” its software, and researchers then imaged 100,000 tissue disks from different types of cancer.
“Because all of those tissues came from retrospective studies, we had the luxury of already knowing what the diagnosis of record was, the tumor type, the histologic grade, et cetera,” Foran said, which enabled his team to gauge how well the algorithms could correlate protein-expression signatures with various stages of disease.
“The first subset that we looked at was breast cancer, and we demonstrated that we were able to not only tell the difference between normal and abnormal, or tumor and non-tumor regions, but we were also able to subclassify breast cancers into four different categories,” he said.
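The retrospective check Foran describes, comparing algorithm output against a known diagnosis of record, can be sketched as a simple agreement calculation. The labels and data below are invented for illustration and do not represent the study's results or its actual evaluation method.

```python
from collections import Counter

def agreement_rate(predicted, diagnosis_of_record):
    """Fraction of samples where the prediction matches the recorded diagnosis."""
    if len(predicted) != len(diagnosis_of_record):
        raise ValueError("both lists must have the same length")
    if not predicted:
        raise ValueError("at least one sample is required")
    matches = sum(p == d for p, d in zip(predicted, diagnosis_of_record))
    return matches / len(predicted)

def confusion_counts(predicted, diagnosis_of_record):
    """Count (recorded, predicted) label pairs to show where errors fall."""
    return Counter(zip(diagnosis_of_record, predicted))

# Hypothetical toy data, not results from the study.
recorded = ["tumor", "tumor", "normal", "normal", "tumor"]
predicted = ["tumor", "normal", "normal", "normal", "tumor"]

rate = agreement_rate(predicted, recorded)
counts = confusion_counts(predicted, recorded)

print(f"agreement: {rate:.0%}")
for (rec, pred), n in sorted(counts.items()):
    print(f"recorded={rec:<7} predicted={pred:<7} count={n}")
```

The same pair counting extends directly to more than two labels, for example the four breast cancer categories Foran mentions.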
Under the expanded project, Foran, Saltz, and colleagues plan to broaden the reference library to include signatures for a wider range of cancers. In addition, because the system will be interoperable with the caBIG infrastructure, “other groups will be very well-positioned and welcome to add their own data,” Saltz said. “Any group that wants to expose a tissue microarray library to the NCI community will be able to do that.”
Saltz said that the project will also tap into other tools being developed in caBIG. For example, he said that the project will support caBIG’s semantic annotation tools, which will enable researchers to integrate tissue microarray data with information from clinical trials, radiology studies, and other molecular studies.
“Basically this is a mechanism of integrating this particular area of expertise into the broader caBIG framework so that people can include caBIG-compliant tissue microarray data and analysis and results and analytic functions into broader caBIG applications,” he said.