Tucked away in GE’s Niskayuna, NY, Global Research Center, a team of twelve researchers has been quietly building a suite of bioinformatics tools to support the company’s medical imaging and medical diagnostics R&D. Now, according to Brion Sarachan, bioinformatics lab manager, the team is ready to begin sharing its work with the rest of the scientific community.
“I don’t think people know that GE is doing anything with bioinformatics, but in fact we have a group of very excited people, and a growing group,” Sarachan told BioInform. The team, which has spent the past two and a half years developing software tools for predictive medicine, pathway analysis, biomedical text mining, and cellular simulation, is now ready to step out from behind the scenes. A paper on the group’s text mining software will soon appear in Bioinformatics, Sarachan said, and the team plans to make most of its tools available to the broader research community.
“Our general approach to the bioinformatics tools we develop is that, at some point, we intend on sharing them,” Sarachan said. “GE doesn’t have business plans for the tools. They’re really for internal use.” The company hasn’t yet determined when — or in what form — it will make its software available, but Sarachan estimated that some of it should be available within the next year. “It’s not our highest business priority, but we see no reason not to make the tools and the public data available,” he said.
GE’s bioinformatics team is part of the company’s molecular imaging research group of around 50 scientists. The company, long a leader in the medical imaging and diagnostics business, has targeted molecular imaging as a strategic growth area over the next few years, and is focusing its R&D efforts in the areas of cancer, cardiovascular disease, and Alzheimer’s disease.
The concept behind diagnostic molecular imaging — a compound is injected into the body, binds to a molecular target, and is detected by a PET or MRI scan in order to serve as a pre-symptomatic indicator of a disease — requires a lot of the same data used in drug discovery, which is what spurred the formation of the bioinformatics group. Just as in drug discovery, “You have to be able to research biological pathways, molecular targets, and discover what targets are uniquely associated with a disease,” Sarachan said.
In addition to early disease detection, GE is looking at molecular imaging as an effective method for tracking the efficacy of new drugs. “That might mean that the imaging target is not the same as the drug target — it might be downstream on the same pathway,” Sarachan said. The outcome of this research — besides improved molecular imaging tools — is a comprehensive suite of software tools specifically focused on pathways, networks, and other sub-cellular mechanisms.
On the Right Path
GE has built an informatics platform that gathers and correlates biological data from multiple sources into a single repository of annotated protein-protein interactions. The bioinformatics team wrote a set of parsers to import and integrate data from publicly available pathway databases like BIND, TransPath, KEGG, and others. Using that as a baseline data set, the team developed a set of natural language processing algorithms to extract additional interactions from PubMed abstracts and add them to the repository.
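The integration step described above can be pictured with a minimal sketch. This is not GE's code: the function name, the record layout, and the placeholder protein names are assumptions made for illustration, and the source labels on the sample records do not reflect actual database contents. The sketch only shows the general idea of folding interaction records from several sources into one repository in which each protein pair is annotated with every source that reported it.

```python
from collections import defaultdict


def merge_interactions(records):
    """Merge (protein_a, protein_b, source) records into one annotated repository.

    Hypothetical helper: each unordered protein pair maps to the set of
    sources that reported the interaction.
    """
    repo = defaultdict(set)
    for protein_a, protein_b, source in records:
        # Normalize case and order so A-B and B-A land on the same key.
        key = tuple(sorted((protein_a.upper(), protein_b.upper())))
        repo[key].add(source)
    return dict(repo)


# Made-up records with placeholder protein names.
records = [
    ("PROT_A", "PROT_B", "BIND"),
    ("prot_b", "prot_a", "KEGG"),
    ("PROT_C", "PROT_D", "PubMed text mining"),
]
repository = merge_interactions(records)
for pair, sources in sorted(repository.items()):
    print(pair, sorted(sources))
```

Keying on a normalized, order-independent pair is one simple way to keep the same interaction from appearing twice when two sources name the partners differently; a real system would also have to reconcile protein identifiers across databases.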
GE is using the pathway reconstruction tool to study disease-specific pathways of interest, but Sarachan said the software is broadly applicable to any type of pathway analysis. GE began using it a year ago for a cardiovascular disease study, and “it was able to very quickly reconstruct the relevant pathways and even bring in some protein interactions that weren’t obvious from a manual library or literature search,” Sarachan said.
The GE bioinformatics team is just about to apply the tool to a new study on cancer, and has also begun experimenting with the Affymetrix microarray platform in order to annotate pathways with gene expression data. Sarachan said the team is also working toward adding protein expression data, SNP data, and several other types of biological information to pathways of interest.
The GE bioinformatics group is also heavily engaged in cellular modeling, and has developed what Sarachan describes as a “simulation engine” for building models of viruses and bacteria, as well as human cells.
The simulation tool is unique, Sarachan said, because it simulates two kinds of cellular phenomena at the same time: continuous events and discrete events. “A continuous phenomenon might be the amount of some protein in a cell, but then whether or not that causes a gene to be transcribed would be more of a discrete kind of event — it’s binary, on or off,” Sarachan explained. Most cellular simulation projects, such as E-Cell, use differential equations to model continuous phenomena and stop there. “When we looked at specific examples of things we would want to simulate, we also saw that it’s important to think of discrete events,” Sarachan said, “so we built this engine that can do both together.”
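A toy example may help make the continuous/discrete distinction concrete. The sketch below is not GE's engine; the function name, rate constants, and threshold are all assumptions chosen for illustration. It integrates two protein levels with simple Euler steps (the continuous part) and flips a gene from off to on the first time an activator protein crosses a threshold (the discrete part), which in turn changes the continuous dynamics.

```python
def simulate(steps=2000, dt=0.01, threshold=5.0):
    """Toy hybrid simulation: Euler-integrated protein levels plus a gene switch.

    All rates and the threshold are illustrative assumptions.
    """
    activator = 0.0     # continuous: amount of an activator protein
    product = 0.0       # continuous: amount of the regulated gene's product
    gene_on = False     # discrete: is the regulated gene being transcribed?
    switch_time = None  # time at which the discrete event fired, if ever

    for i in range(steps):
        t = i * dt
        # Discrete event: the gene turns on once the activator reaches threshold.
        if not gene_on and activator >= threshold:
            gene_on = True
            switch_time = t
        # Continuous dynamics: constant synthesis minus first-order decay.
        d_activator = 1.0 - 0.1 * activator
        d_product = (2.0 if gene_on else 0.0) - 0.2 * product
        activator += d_activator * dt
        product += d_product * dt

    return activator, product, gene_on, switch_time


activator, product, gene_on, switch_time = simulate()
print(gene_on, switch_time, round(product, 2))
```

With these assumed rates the activator settles toward a level of 10, so a threshold of 5 is eventually crossed and the gene product begins to accumulate; raising the threshold above 10 leaves the gene off for the whole run.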
The simulator uses the SBML (systems biology markup language) format, and Sarachan said that members of the GE team are working to add extensions to SBML that will improve the format’s ability to represent discrete phenomena.
The GE team validated the simulation engine on well-documented microbial systems, such as the lambda phage, and has now made the leap to human cells with a simulation of Alzheimer’s disease pathology.
As with its pathway informatics tool and natural language processing algorithms, the group is open to releasing the simulator. “The simulation engine is something else we wouldn’t need to keep proprietary. We’d be happy to share it with the research community,” Sarachan said.
In a manner consistent with the GE bioinformatics group’s low-key approach so far, Sarachan hesitates to use the “systems biology” buzzword to describe its research — although the sharp focus on pathway analysis and cellular modeling would fall within even the strictest definitions of the term.
“Perhaps the reason that we don’t focus on thinking of it as systems biology is that we have a very disease-oriented focus,” Sarachan said. “Our goal isn’t to simulate everything in a cell the way the E-Cell project is going… we have a very specific focus where we have a deep study of certain disease pathways.” However, he added, “If you define systems biology as not looking at one molecular target, but at the whole pathway — the whole range of what the possible targets might be — and thinking of that whole pathway as a system, then certainly we’re trying to take a systems approach.”
The bioinformatics activities in support of GE’s molecular imaging research are also very closely tied to medical informatics tasks, such as the study of large sets of patient data to stratify populations into risk categories or even to predict the rate of disease progression. “As we go forward, I think bioinformatics and medical informatics are going to come closer and closer together, and already there’s a lot of crossover,” Sarachan said. A particular area of convergence is in microarrays, he noted, “because they are becoming increasingly important for medical informatics.”
From GE’s perspective, the marriage of these disciplines will not only support development of its molecular imaging technology, but will feed into the longer-term goal of personalized medicine, where its diagnostics tools are expected to play a major role. The company certainly sees a role for the informatics group in the future of its R&D activities. “We’re actively recruiting,” Sarachan was happy to report.