NEW BRUNSWICK, NJ — Fortifying its research efforts in bioinformatics, the Cancer Institute of New Jersey has launched a new biomedical informatics program to be led by Gunaretnam Rajagopal, the founding executive director of the Bioinformatics Institute at Singapore’s Biopolis.
At an informational event sponsored by the Robert Wood Johnson Medical School and the Cancer Institute of New Jersey this week, Rajagopal and his colleague David Foran, director of the Cancer Institute’s Center for Biomedical Imaging Informatics, outlined several new projects, emphasizing novel approaches to software development, grid computing, and personalized medicine.
Rajagopal, who serves as the institute’s executive director in bioinformatics within its new cancer informatics core, has already launched his first pilot project: a data-integration initiative called PopWeb. The idea is to build, over the next four years, a statewide data warehouse and bio-specimen repository that will marry genomic and clinical data in order to predict specific treatment options for cancer patients.
PopWeb will be linked to an existing cyberinfrastructure called the NJEdge network, a broadband network of academic and research institutions in New Jersey. It will also be integrated into the National Cancer Institute’s biomedical informatics grid, caBIG.
During a presentation, Rajagopal said that PopWeb will help facilitate translational research, including the development of biomarkers in collaboration with industry partners. The project is expected to contribute to integrative research efforts in cancer, stem cell biology, and regenerative medicine. It will also involve collaboration with population sciences colleagues at various institutions on cancer prevention, control, treatment, and survivorship, as well as on economic and racial disparities.
The project entails deploying scalable computational and storage resources such as cluster computers and database servers. There will also be web-based services, multimedia tools, and software engineering expertise to cater to the large-scale data collection, storage, annotation, integration, and mining of research and clinical data.
The system, an integrated population sciences and bio/clinical informatics platform, will be maintained by the Cancer Informatics Core team and designed to comply with regulations such as the Health Insurance Portability and Accountability Act, or HIPAA.
A new three-year grant from the National Cancer Institute awarded to Rajagopal is intended to fund a system architecture that assures data access, patient privacy, and platform interoperability with the National Health Information Network standards established by NCI’s Cancer Biomedical Informatics Grid initiative, caBIG.
This platform will link with other clinical informatics projects in place at the Robert Wood Johnson Medical School. Rajagopal is expanding on these efforts with partnership discussions with Microsoft about electronic medical records, and with Merck and Johnson & Johnson on functional genomics, systems biology, and cancer biomarker discovery. These projects will also involve collaboration with colleagues at Rutgers and Princeton Universities and the Simons Center for Systems Biology at the Princeton-based Institute for Advanced Study.
PopWeb involves developing and deploying web-based data-collection tools and services that link physicians and patients within the New Jersey Family Physicians Research Network (NJFPRN) as well as researchers at the Cancer Institute of New Jersey (CINJ). The goal is to put in place informatics capabilities to support prospective cohort studies. That means the system must allow the collection, annotation, and analysis/mining of population-wide data using a federated database framework.
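In a federated database framework, each participating site keeps its records locally and answers a shared query with an aggregate, so raw patient data does not have to be pooled in one place. The sketch below is a hypothetical illustration of that general idea, not PopWeb’s actual design: the `Site` class, the `federated_count` function, and the record fields are invented for the example, and the query is simply a Python predicate.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Site:
    """A hypothetical participating site (e.g. a practice) that holds its records locally."""
    name: str
    records: list[dict] = field(default_factory=list)

    def count_matching(self, predicate: Callable[[dict], bool]) -> int:
        # Only this aggregate count leaves the site; the records themselves do not.
        return sum(1 for r in self.records if predicate(r))


def federated_count(sites: list[Site], predicate: Callable[[dict], bool]) -> tuple[dict[str, int], int]:
    """Send the same query to every site and combine the per-site counts."""
    per_site = {s.name: s.count_matching(predicate) for s in sites}
    return per_site, sum(per_site.values())


# Example with made-up records at two hypothetical sites.
example_sites = [
    Site("site_a", [{"age": 52, "smoker": True}, {"age": 40, "smoker": False}]),
    Site("site_b", [{"age": 61, "smoker": True}, {"age": 47, "smoker": True}]),
]
per_site, total = federated_count(example_sites, lambda r: r["smoker"] and r["age"] >= 50)
print(per_site, total)  # {'site_a': 1, 'site_b': 1} 2
```

A production system would express queries in a declarative, auditable form and enforce access controls at each site. Passing arbitrary callables around is only a convenience for this sketch.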
According to the project’s proposal, PopWeb will interface with the Cancer Institute of New Jersey’s specimen collections in order to link clinical data with data such as gene and protein expression or genotyping results on collected samples, all of which will help inform patient treatment. PopWeb will also involve software development to scan the clinical history of CINJ patients and enable epidemiological and correlative studies.
In a conversation with BioInform, Rajagopal pointed out that this project will require infrastructure investment by the university and state. “We spent at least $500 million on developing and deploying the cyber-infrastructure and services for Biopolis,” he said. “This was done from scratch.” In New Jersey, by contrast, many such capabilities are already in place.
“We need to invest in the region of $50-100M to really bring things to a level that will really make a difference,” he said.
“Heavy computing is very expensive business,” he said. “I used to have to pay electricity bills of $1 million a month for my data center in Singapore.” He said that pharmaceutical companies came to the Singapore institute to access this computing power, as well as expertise — a model he’d like to replicate in New Jersey.
“Singapore has a late-comer advantage,” he said, explaining that all buildings, equipment, and infrastructure at Biopolis went online around 2003 and were built in less than two years, allowing researchers to exploit the latest technology. Rajagopal set up the bioinformatics institute in Biopolis in the summer of 2001.
“On the other hand, New Jersey is the Silicon Valley for the pharmaceutical industry in the United States, with a concentration of excellence that is incredible,” he said. So now his plan is to computationally link industry, academia, and government. “Over a period of time, we will catch up,” he said.
“New Jersey is a unique microcosm of the United States population — in fact, the world population,” said Rajagopal, noting that the state, patients, healthcare providers, and the pharmaceutical industry should benefit from access to the data.
Although researchers in the state already have access to NJEdge and the Internet2 research network, “it is a matter of making sure that, for example, physicians connect to Internet2,” he said. That will entail convincing physicians that they can expect a financial return on investment as well as gains in the way they provide healthcare and share costs and expertise.
Rajagopal plans to work with state authorities and IT teams at NJEdge to link all relevant groups to Internet2 in order to take advantage of the computing power being set up by the NSF-supported TeraGrid. He also seeks to cooperate with IT partners within the 16 hospitals forming the New Jersey Family Physicians Research Network, NJFPRN. That network infrastructure must be upgraded to allow data and information sharing. That is the type of networked environment that he envisions will facilitate collaboration in research, education, and treatment with scientists and clinicians in the CINJ community.
Rajagopal said that William Hait, former director of the Cancer Institute, and Arnold Levine, director of the Simons Center for Systems Biology at the private Institute for Advanced Study in Princeton, were instrumental in luring him to New Jersey. On a weekly basis, Rajagopal and Levine brainstorm on theoretical and mathematical modeling challenges fundamental to biological questions, most recently, for example, on the multitude of factors affecting cancer and aging.
“Computation is only as good as the questions you ask,” said Rajagopal. The genome contains many subtleties that “confound people like me,” he said, so efforts are needed to match models to the complex biological realities. For example, redundancy in biology makes writing algorithms difficult, he said. “The trouble is a lot of the algorithms in biology are heuristic — you learn from datasets, identify certain patterns, and then you match with real data.”
But biological datasets show huge sample variation. While the central limit theorem in statistics states that, as sample size increases, the distribution of the sample mean approaches a normal distribution, “in biology there is no central limit theorem,” he said. “That is the annoying thing, every sample is unique.”
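The central limit theorem concerns the sample mean: for independent draws from a single distribution with finite variance, the mean of n draws becomes approximately normal as n grows, even when individual values are strongly skewed. The short simulation below illustrates this with an arbitrarily chosen Exponential(1) distribution (the function names are our own). Theory predicts that the skewness of the mean shrinks roughly as 2/√n and its spread as 1/√n. The theorem’s key assumption, that every draw comes from the same distribution, is the one Rajagopal’s remark about unique samples calls into question.

```python
import random
import statistics


def sample_means(n: int, trials: int, rng: random.Random) -> list[float]:
    """Means of `trials` independent samples, each of size n, from a skewed Exponential(1)."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]


def skewness(xs: list[float]) -> float:
    """Population skewness: 0 for a symmetric distribution such as the normal."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs, mu)
    return statistics.fmean(((x - mu) / sd) ** 3 for x in xs)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (1, 5, 50):
        means = sample_means(n, 5000, rng)
        # Columns: sample size, mean of means, spread of means, skewness of means.
        print(n, round(statistics.fmean(means), 3),
              round(statistics.pstdev(means), 3), round(skewness(means), 3))
```

As n increases, the printed skewness should fall toward 0 and the spread should shrink, which is the normalizing behavior the theorem describes.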
In physics, optimizing parameters can solve many problems. “Unfortunately, biology, through evolution of 3 billion years, doesn’t necessarily choose the optimum solution or the best solution. It chooses a working solution and moves on,” he said. Also, biological data presents substantial signal-to-noise problems. All these factors mean new computational techniques are needed that fit biological complexity.
Speaking during the event, the center’s Foran presented examples of these challenges, and emphasized issues associated with the computational analysis of variable specimens.
In collaboration with Rutgers University, Ohio State Medical Center, and the University of Pennsylvania, Foran and his group have developed PathMiner, an automated image-guided system for pathology decision-making. It integrates image analysis of pathology specimens, robotics, a telemicroscopy system, and database management.
Written in Java, the system lets a user remotely examine a specimen in a robotic microscope. Users can enhance edges, suppress noise, and control light levels from a remote location, and then compare the images to samples in a database.
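The article does not say which filters PathMiner uses. As a rough illustration of the three operations named, the sketch below applies textbook choices to a small grayscale image stored as nested lists: a brightness gain with clipping for light level, a 3x3 median filter for noise suppression, and the Sobel operator, whose gradient magnitude highlights edges. It is written in Python rather than PathMiner’s Java, and the function names are hypothetical.

```python
import statistics

Image = list[list[int]]  # grayscale pixel values in 0..255


def adjust_light(img: Image, gain: float) -> Image:
    """Scale brightness by `gain` and clip to the 0..255 range."""
    return [[min(255, max(0, round(p * gain))) for p in row] for row in img]


def median_denoise(img: Image) -> Image:
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(statistics.median(window))
    return out


def sobel_edges(img: Image) -> Image:
    """Sobel gradient magnitude, clipped to 255; border pixels are set to 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        above, mid, below = img[y - 1], img[y], img[y + 1]
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column, centre row weighted 2.
            gx = (above[x + 1] + 2 * mid[x + 1] + below[x + 1]) - (above[x - 1] + 2 * mid[x - 1] + below[x - 1])
            # Vertical gradient: bottom row minus top row, centre column weighted 2.
            gy = (below[x - 1] + 2 * below[x] + below[x + 1]) - (above[x - 1] + 2 * above[x] + above[x + 1])
            out[y][x] = min(255, round((gx * gx + gy * gy) ** 0.5))
    return out
```

A median filter is a common choice for isolated speckle because it removes a single outlier pixel without blurring edges as much as averaging would.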
This technology is about to be put to its first clinical test by a transplant pathologist at the center who Foran said has often needed to race to the hospital in the middle of the night to check an organ prior to transplant. “Now we will put the sample in place and she can do the read from home,” Foran said.
The PathMiner system relies on content-based image retrieval, which involves automatically locating and retrieving graphical information using only visual content, as opposed to alphanumeric labels. For a given sample, a computer generates a spatial and spectral signature for the underlying pathology, formulates a query vector, sends it across the Internet for comparison with a database of previously diagnosed cases, and then, based on probability, provides the most likely diagnosis, he explained.
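The retrieval loop described above (signature, query vector, comparison against diagnosed cases, most likely diagnosis) can be sketched as nearest-neighbour search. This is a simplified stand-in, not PathMiner’s algorithm: the “spectral” part of the signature is just an intensity histogram, the “spatial” part is the mean intensity of each tile in a coarse grid, the database is a local list rather than a remote service, and the “probability” is the share of the k nearest cases that carry the winning label. All names and labels are hypothetical.

```python
import math
from collections import Counter


def signature(img: list[list[int]], bins: int = 4, grid: int = 2) -> list[float]:
    """Query vector: normalized intensity histogram + mean intensity of each grid tile.

    Assumes the image is at least `grid` pixels high and wide.
    """
    h, w = len(img), len(img[0])
    pixels = [p for row in img for p in row]
    hist = [0.0] * bins
    for p in pixels:
        hist[min(bins - 1, p * bins // 256)] += 1
    hist = [c / len(pixels) for c in hist]
    tiles = []
    for ty in range(grid):
        for tx in range(grid):
            block = [img[y][x]
                     for y in range(ty * h // grid, (ty + 1) * h // grid)
                     for x in range(tx * w // grid, (tx + 1) * w // grid)]
            tiles.append(sum(block) / len(block) / 255)
    return hist + tiles


def diagnose(query: list[float], database: list[tuple[list[float], str]], k: int = 3) -> tuple[str, float]:
    """k-nearest-neighbour retrieval: return (most common label, its vote fraction)."""
    ranked = sorted(database, key=lambda case: math.dist(query, case[0]))
    votes = Counter(label for _, label in ranked[:k])
    label, count = votes.most_common(1)[0]
    return label, count / k
```

Real systems typically use far richer features and calibrated probabilities; the vote fraction here only indicates how consistent the nearest matches are.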
One tough nut to crack, Foran told BioInform, has been the need to develop and refine a spectral decomposition algorithm for pattern recognition. Images from the microscope are digitized and a comprehensive spatial map of the slide is created. Then the specimen is visually dissected and a spectral and spatial signature for each cell is created and matched with images from other specimens in databases.
One challenge, he said, is that immunohistochemistry images for different specimens, even under the same conditions, “will have slightly different characteristics, so you can’t use a pre-canned program.” To address this issue, he explained, PathMiner’s algorithm sub-samples half a million pixels from each sample and performs a polar transformation and regression analysis to separate out the peaks for image analysis.
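Foran does not spell out the transform, so what follows is a hedged interpretation. Each sub-sampled pixel’s colour is mapped to polar coordinates (a hue angle and a chroma radius) in an opponent-colour plane. Pixels dominated by the same stain tend to cluster around a shared angle, so peaks in the angle histogram mark the stains present in that particular slide, without a fixed, pre-set colour rule. Simple histogram peak-picking stands in for the regression step he mentions, which is not reproduced here. The 500,000-pixel default follows the article; the bin count, thresholds, and function names are hypothetical.

```python
import math
import random


def hue_angle(r: int, g: int, b: int, min_chroma: float = 10.0):
    """Polar transform of one RGB pixel in an opponent-colour plane.

    Returns the hue angle in degrees [0, 360), or None for near-grey pixels whose
    chroma (the polar radius) falls below the hypothetical threshold `min_chroma`.
    """
    alpha = r - (g + b) / 2            # red vs. cyan axis
    beta = math.sqrt(3) / 2 * (g - b)  # green vs. blue axis
    if math.hypot(alpha, beta) < min_chroma:
        return None
    return math.degrees(math.atan2(beta, alpha)) % 360


def stain_peaks(pixels, sample_size=500_000, bins=36, min_fraction=0.05, seed=0):
    """Sub-sample (r, g, b) pixels, histogram their hue angles, return peak centres in degrees."""
    rng = random.Random(seed)
    sample = pixels if len(pixels) <= sample_size else rng.sample(pixels, sample_size)
    width = 360 / bins
    hist = [0] * bins
    for r, g, b in sample:
        angle = hue_angle(r, g, b)
        if angle is not None:
            hist[int(angle // width) % bins] += 1
    total = sum(hist)
    peaks = []
    for i in range(bins):
        # Circular local maximum: hist[i - 1] wraps to the last bin when i == 0.
        is_max = hist[i] > hist[i - 1] and hist[i] >= hist[(i + 1) % bins]
        if is_max and hist[i] >= min_fraction * total:
            peaks.append((i + 0.5) * width)
    return peaks
```

Because the peaks are found per slide, two slides with slightly shifted stain colours yield slightly shifted peak angles rather than misclassified pixels. This is the kind of adaptivity the “pre-canned program” remark points to.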
“First of all, we had to establish a gold-standard database of cases, each of which had to have independent diagnosis with immuno-phenotyping,” he said. The dataset includes samples of both cancer cells and healthy cells. The PathMiner system also includes a distributed telemicroscopy system and an image-guided decision support system.
Foran said his group developed the algorithms for PathMiner “shoulder-to-shoulder” with pathologists, oncologists, and radiologists. “Almost everything we develop ultimately becomes very customized,” he said.
A new project that began last fall is leveraging his group’s experience with IBM’s World Community Grid to develop algorithms that adapt PathMiner to an expanded library of expression signatures and integrate multi-spectral imaging, for example to analyze tissue microarrays and mine protein expression profiles, while compensating for mechanical distortions in tissue samples [see BioInform 02-08-08].
Foran’s projects are intended as decision-support systems through which data analysis can be automated, compared via a cybernetwork, or compared with retrospective clinical data — all of which stands to help in detection and disease management, he said. “I think all of these technologies are going to start getting integrated with clinical work,” he said.