Name: David Foran
Position: Professor, pathology, laboratory medicine and radiology, University of Medicine and Dentistry of New Jersey-Robert Wood Johnson Medical School; director, Center for Biomedical Imaging & Informatics, the Cancer Institute of New Jersey, 2006 to present
Background: Research associate, pathology and laboratory medicine, University of Medicine and Dentistry of New Jersey-Robert Wood Johnson Medical School, 1992 to 1993; PhD, biomedical engineering and computer science, awarded jointly by the University of Medicine and Dentistry of New Jersey and Rutgers, The State University of New Jersey, 1992
Name: Joel Saltz
Position: Professor and chair, Department of Biomedical Informatics, Ohio State University, College of Medicine and Public Health, 2001 to present; professor, Department of Computer and Information Science, Ohio State University, 2001 to present; investigator, Dorothy M. Davis Heart and Lung Research Institute, Ohio State University Medical Center, 2004 to present
Background: Professor, Department of Pathology, Johns Hopkins Medical School, 1999 to 2001; PhD, computer science, Duke University
The Cancer Institute of New Jersey and Rutgers University, The State University of New Jersey recently announced a collaboration to develop cancer diagnostic tools. The effort is an extension of IBM’s “Help Defeat Cancer” project, in which CINJ used IBM’s World Community Grid to characterize different types and stages of cancer based upon “the underlying staining patterns exhibited by digitally imaged cancer tissues,” IBM said in a statement.
CINJ recently received a $2.5 million, four-year grant from the National Institutes of Health to continue its work, as well as a Shared University Research grant from IBM. The IBM grant provides computer equipment and assistance in developing pattern recognition algorithms that can take data generated in various fields of research, including proteomics, and make it accessible to researchers in other fields who use different computer systems.
CINJ’s David Foran is leading the effort, while Ohio State University’s Joel Saltz is leading the development of the software architecture for management, query, and analysis of tissue microarray and virtual slide data.
Below is an edited version of a recent conversation ProteoMonitor had with the two about their work.
Describe the project you’re working on.
DF: As you know, the difference between a gene array and a tissue microarray is the fact that in a tissue array, you’re not looking at a homogeneous spot. You’re looking at something that’s heterogeneous in terms of the tissues and cells that are present.
Scoring the level of immunostain expression within these specimens is generally done by a pathologist, and usually manually. They eyeball it.
Within our own institution, of all of the various types of sub-specializations in pathology, the one area that all pathologists were willing to give up as their turf was tissue microarrays because they’re so complicated to read, and it’s so difficult to get reproducible measurements.
Recognizing this, we wrote a grant for [IBM’s] World Community Grid and we asked them if we could borrow some of their computational power. At the time, we weren’t really aware of how much computational power they would be able to afford us.
It turns out it was about the equivalent of 137 years of computation per day. All of a sudden we were in a position where we had the opposite problem of what we usually have. Usually people tell us, ‘Please try to keep this computationally non-expensive because there’s only so much computation to go around.’
And IBM said, ‘Let your dreams run wild. You can use any types of pattern recognition techniques you like. We’ve got the computational power to back it up.’
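To put the "137 years of computation per day" figure in perspective, the short calculation below converts it into an equivalent number of machines running around the clock. It is only a back-of-the-envelope sketch: it assumes that one "year of computation" means a single machine computing continuously for one year, which the interview does not spell out, and the function name is purely illustrative.

```python
DAYS_PER_YEAR = 365.25  # average calendar year, including leap days


def equivalent_machines(cpu_years_per_day):
    """Number of always-on machines needed to deliver this many
    machine-years of computation every calendar day.

    Assumption: one "year of computation" is one machine running
    nonstop for one year.
    """
    return cpu_years_per_day * DAYS_PER_YEAR


# 137 machine-years per day is on the order of 50,000 machines.
print(round(equivalent_machines(137)))
```

Under that assumption, the grid was delivering roughly the output of 50,000 dedicated computers.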
We had an idea that we would be able to generate spectral and spatial signatures representing the expression patterns within a number of tissue disks [that] were taken from patients with various types of breast cancer, colon cancer, and head and neck cancers.
And in total, there were 100,000 specimens that we sent out to the grid. The IBM people grid-enabled all of our software, and we ended up with a reference library of expression patterns, which were captured using our algorithms, and we basically had this repository of signatures.
Because everything we put under study was retrospectively scored (that is to say, we already knew what the diagnosis was, we already knew what the histologic type was, the tumor grade, et cetera), we were in a position to examine whether or not there were correlations between those signatures which we had generated and those various classifications.
And we have been able to demonstrate that, in fact, we can sub-classify various types of breast cancer based upon these signatures, and the values that we generate in this reference library are completely reproducible and they’re not spatially constrained. That is to say, you could look at just the tumorous region within the specimen and the reference library would still be applicable. You could look at the expression pattern across the entire tissue disk and those same classifications would hold.
We took that proof of concept and [with Joel Saltz] … wrote a grant to the National Institutes of Health where we proposed to expand the reference library and build a grid-enabled clinical decision support system around it.
You said you have these signatures. Are you talking about protein spots?
DF: Yes, we’re talking about immunostains. We could look at any type of biomarker, actually. The system is being developed so that it’s generalizable.
JS: So it’s basically looking at one or a small number of markers, at a time, often proteins, which are associated with tissue staining. So you’re not just looking at the expression in some purified sample. You’re looking at it in tissue.
As you’re moving forward, what sort of technology are you looking at? Are you still using tissue arrays, or are you incorporating other things such as mass specs or 2D gels?
DF: It’s only tissue arrays at this point.
JS: This is actually a place where the [cancer Biomedical Informatics Grid] story comes in. caBIG is a National Cancer Institute program dedicated to developing computer methods and data management systems and databases and so-called grid technology that make it possible to integrate data of different sorts. So a clinical study may involve both outcome data for a patient that you follow, but increasingly during the course of a clinical study, one would take into account and analyze pathology data, tissue microarray data, in many cases, genomic and proteomic data, in order to evaluate the efficacy of a given treatment [and] compare one treatment to another.
You get all of this data, both to get more precise subsets of patients — you get patients who are more homogeneous, who really have diseases that are at a molecular level as comparable as possible — and also one does these analyses during the course of therapy to understand at a molecular level what the therapy is actually doing, rather than just asking, ‘Did it work, or did it not?’
Because of that, there’s a need to do clinical trials that also involve mass spec data, high-throughput gene expression data, high-throughput sequence data, epigenetic data, tissue and radiology CT and MR data.
The goal of the caBIG project is to develop tools in a framework to integrate data so that when we actually work with collaborators to use tissue microarrays in the context of clinical studies or analyze their significance, we can easily access mass-spec data or 2D gel-electrophoresis data, or gene-expression data.
So a key part of this project is to develop the data-management and -analysis system that David talked about in a way that interoperates with data management systems for things like mass specs and 2D electrophoresis, high-throughput molecular [technologies].
Our particular niche in this is … we’re the lead developers and the initiators of much of the technology in the backbone, the so-called caGrid infrastructure. That’s the glue that links together the different types of databases and makes it possible to access high-end computers remotely.
There’s a strong link between the IBM project and the caGrid effort, and actually IBM has involvement in the caBIG effort.
So the goal of the proposal that we’ve just been awarded is not only to continue to develop algorithms to classify and analyze and correlate the tissue microarray data, but to make those facilities available to other cancer researchers who deal with imaging, proteomics and genomic studies, and clinical studies.
DF: IBM’s ‘Help Defeat Cancer’ project enabled us to establish the reference library, and based upon that preliminary information, Joel and I wrote this grant to NIH … and in addition to that, I wrote a separate grant for an IBM Shared University Research grant, and that has been awarded.
That grant is a grant in which I’m collaborating with investigators at the [IBM Thomas J. Watson Research Center in Yorktown Heights, NY]. And there, the emphasis is multi-modality analysis across genomic information all the way through radiology studies.
Going back to the retrospective study you mentioned before, what were you able to find?
DF: We ended up with a 4,000-dimensional vector for every single image for all 100,000 tissue disks, and we wanted to find out which of those feature measurements contributed best to correct classification of various types of breast cancer. That was the first subset that we looked at.
What we realized was we had to perform some data reduction, and so we used an isomap approach in order to knock it down to about 500 dimensions.
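For readers unfamiliar with Isomap, the following is a minimal, pure-Python sketch of the general technique on toy data. It is not the researchers' code: the function names, the neighbor count, and the tiny data set are all assumptions made for illustration, and a production system working with 4,000-dimensional vectors would use an optimized library. The steps are: build a nearest-neighbor graph, approximate distances along the data manifold by shortest paths through that graph, then apply classical multidimensional scaling to those distances.

```python
import math


def geodesic_distances(points, k):
    """Shortest-path distances through a symmetric k-nearest-neighbor graph."""
    n = len(points)
    inf = float("inf")
    graph = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            graph[i][j] = graph[j][i] = math.dist(points[i], points[j])
    # Floyd-Warshall: relax every pair through every intermediate node.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = graph[i][m] + graph[m][j]
                if via < graph[i][j]:
                    graph[i][j] = via
    return graph


def top_eigenpairs(matrix, count, iterations=200):
    """Leading eigenpairs of a symmetric matrix by power iteration with deflation."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(count):
        vec = [float(i + 1) for i in range(n)]  # fixed, deterministic start
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        length = math.sqrt(sum(x * x for x in vec))
        vec = [x / length for x in vec]
        value = sum(vec[i] * work[i][j] * vec[j] for i in range(n) for j in range(n))
        pairs.append((value, vec))
        for i in range(n):  # remove the component just found
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


def isomap(points, n_neighbors, n_components):
    """Embed points in n_components dimensions (assumes a connected graph)."""
    geo = geodesic_distances(points, n_neighbors)
    n = len(geo)
    sq = [[d * d for d in row] for row in geo]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Classical MDS: double-center the squared geodesic distances.
    gram = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, n_components)
    return [[math.sqrt(max(val, 0.0)) * vec[i] for val, vec in pairs]
            for i in range(n)]


# Toy usage: 20 points on a half circle, "unrolled" onto a single axis.
arc = [(math.cos(math.pi * t / 19), math.sin(math.pi * t / 19)) for t in range(20)]
line = [row[0] for row in isomap(arc, n_neighbors=2, n_components=1)]
```

In this toy case the one-dimensional coordinates preserve the ordering of the points along the arc, and the two ends of the arc sit about as far apart as the arc is long rather than the straight-line distance between them. That is the property that distinguishes Isomap from a linear projection.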
We were able to distinguish first between normal tissue and abnormal tissue with 89 percent accuracy and we were able to distinguish among four different sets of breast cancer with 85 percent correct classification.
The big question for us when we started the project for IBM was, ‘What was going to come back? Would there be any correlation whatsoever?’
We suspected there would be, based upon some small, preliminary studies that we did, but until we had this large reference library in hand, it was impossible for us to do anything that was statistically significant.
Who’s going to contribute to this grid? Are you limiting it to specific groups, or will it be open to any researcher?
DF: The way that we intend to roll it out is that currently we have a consortium [that] consists of the Cancer Institute of New Jersey, the University of Pennsylvania School of Medicine, Ohio State Medical Center, and Arizona State University.
That would be kind of our test bed, and of course, we will be deploying new versions of software on a regular basis to those sites over the course of the first year and a half or so. And then we’ll begin to open it up to the research and clinical communities.
JS: This is a key point of the caBIG, caGrid connection because this infrastructure has been built and is actually now operational from a technical point of view. Even within our initial period, people who are authorized to access the data, or who want to access the algorithms, will be able to develop their own programs that use our infrastructure, because the whole point of caBIG is to develop a collection of databases, algorithms, and computations that work together.
Who’s going to be using this information?
DF: Currently, what we’re targeting is that it would be used for therapy planning, being able to select sub-populations of patients for clinical trials and ultimately for drug design and discovery.
It sounds like it’s primarily biotech and pharma.
DF: You will also see the oncologists within these comprehensive cancer center locations utilizing it. In fact, there are a number of people here at the Cancer Institute of New Jersey who are already utilizing the prototype system that I have in place.
Most of these oncologists at the comprehensive cancer centers are hybrids. They see patients, but they are also scientists and so that would be the first group of individuals who I would see benefiting from this.
And, of course, those individuals would in turn give information to the pharmaceutical companies, which would enable them to do the drug design.