While the informatics community may differ over the definition of systems biology, participants at a recent conference were able to agree on the first step in making it work: data integration.
On December 4th and 5th, 16 speakers at the Marcus Evans Bioinformatics in Systems Biology conference gathered in a snowy New York to offer up their thoughts on key bioinformatics issues in the emerging realm of systems biology. They characterized the field as including every possible combination of microarray analysis, proteomics, structural biology, pharmacogenomics, metabolomics, and the seemingly endless list of other “omics” technologies. A recurring refrain running through this patchwork, however, was that this wealth of information won’t lead to a single breakthrough discovery until it can be analyzed in concert.
At the conference, a number of speakers from pharma and academia were happy to share their tactics for massaging data from a multitude of high-throughput platforms into some kind of workable form. A sampling of these approaches follows.
The Old New Thing
John Hill, executive director of drug discovery and exploratory development informatics at Bristol-Myers Squibb, opened his talk by noting that “systems biology isn’t anything new. It’s the same questions [of classic biology] with new tools and mindsets.” At BMS, he said, the challenge for systems biology is reconciling target-related data with data about patients and compounds. Coining yet another “ome” — the “pharmacome” of all known chemical compounds — Hill said the company’s mission was to provide all of its researchers quick access to all available data from across the company in all three of those areas.
BMS is in the middle of implementing its company-wide SMART-IDEA integration system [BioInform 11-04-02], which is expected to alleviate many of these issues. However, Hill said, integrating information from animal studies with more computer-friendly data from other research divisions remains a challenge.
“A big part of pharma is still about sticking a white powder in an animal and seeing if it gets better or not,” he said. Text-based information of this kind is not likely to mesh easily with numerical data from lab instruments. However, BMS will be working on connecting this information “over the coming years,” Hill said. The current system is only a standardized architecture that serves as “the groundwork” upon which the company can add new components going forward.
SMART-IDEA currently handles pre-clinical and biological data, but plans are in the works to add a pharmacogenomics repository and a clinical repository next.
While it’s still too early to gauge the results of the integration process, “there is anecdotal evidence that people are making better decisions because they are spending more time analyzing data than they are gathering it,” said Hill.
Better Biology through Systems Engineering
Larry Arnstein, an assistant professor in the department of computer science and engineering at the University of Washington, offered an engineering-based perspective on systems biology.
Systems engineering, he said, “is understanding the dynamic properties of systems in order to design desirable products.” Computation is a key method of analyzing the dynamic properties of systems but, according to Arnstein, biology has failed to keep up with engineering, “which has figured out how to use computing in every step of the scientific process.”
Translating a systems engineering approach to biology requires more than just cool modeling software (which Arnstein’s lab does happen to offer in the form of a package called J-Sim). More important, he noted, is capturing data at every step of the research process so that it can be integrated across platforms as it is gathered.
Arnstein and his colleagues created a system called LabScape that acts as an electronic “ubiquitous lab assistant” and captures this experimental data so that it can be fed back into the modeling process. Biologists at the University of Washington, along with the university’s Cell Systems Initiative, are currently using the system.
A free version of LabScape is available at labscape.cs.washington.edu. Arnstein is also the co-founder of a company called Teranode that plans to offer a commercial version of the technology.
Clueless, but Coherent
Roland Stoughton, senior vice president of informatics at Merck/Rosetta Inpharmatics, admitted that bioinformatics is “still at a relative level of cluelessness” regarding systems biology, but said there is hope in starting with what is known. He explained how the company’s strong foundation in microarray analysis is helping it merge genotype information with expression and phenotype data. “We can’t put together a whole system yet, but we can find biomarkers,” he added.
Later this month, the company will publish in the New England Journal of Medicine a study it conducted with the Netherlands Cancer Institute on breast cancer biomarkers. Stoughton said the researchers were able to identify 70 reporters out of 25,000 RNAs studied that predict metastasis in breast cancer.
While the “black box” approach of classifying expression data using neural nets, Bayesian systems, and the like has its benefits, Stoughton cautioned against relying solely on statistics to determine the significance of experimental data. “The more you know about the system, the better,” he said. For the breast cancer study, the researchers used a simple one-dimensional classifier that distinguished between a good prognosis and a poor prognosis.
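To make the idea of a one-dimensional classifier concrete, here is a minimal Python sketch. It is not the study's actual method, which the talk did not detail. It assumes each sample's reporter-gene expression values are collapsed into a single score (here, the Pearson correlation with a reference "good prognosis" profile) and that a single cutoff separates the two classes. The profile values, the 0.4 threshold, and the function names are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def classify(sample, good_profile, threshold=0.4):
    """Collapse a sample to one score, then apply a single cutoff.

    The threshold is an illustrative assumption, not a published value.
    """
    score = pearson(sample, good_profile)
    return "good prognosis" if score >= threshold else "poor prognosis"


# Hypothetical expression values for five reporter genes.
good_profile = [0.8, -0.5, 0.3, -0.9, 0.6]
patient_a = [0.7, -0.4, 0.2, -1.0, 0.5]    # tracks the reference profile
patient_b = [-0.6, 0.5, -0.1, 0.8, -0.7]   # roughly the opposite pattern

print(classify(patient_a, good_profile))
print(classify(patient_b, good_profile))
```

Because every sample is reduced to one number, the decision rule stays easy to inspect, in contrast to the “black box” methods Stoughton described.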
Another pet peeve of Stoughton’s is the failure of bioinformaticists to distinguish between coherent and non-coherent data integration. Non-coherent integration, he explained, occurs when the results of two separate experiments — using different samples, platforms, and study conditions — must be correlated as a final step. Stoughton said a much better approach is to plan the experiments in advance so that they are run in parallel with a common set of tools and samples.
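In data-handling terms, coherent integration means the measurements share sample identifiers, so they can be joined record by record rather than compared only at the level of final results. The Python sketch below is illustrative only: the sample IDs, values, and the function name integrate_coherent are hypothetical and do not describe Rosetta's software.

```python
# Hypothetical measurements from two platforms run on a common sample set.
expression = {"S1": 2.1, "S2": 0.4, "S3": 1.7}
genotype = {"S1": "AA", "S2": "AG", "S3": "AA", "S4": "GG"}


def integrate_coherent(expr, geno):
    """Join the two datasets on the sample IDs they have in common."""
    shared = sorted(expr.keys() & geno.keys())
    return [(sample, geno[sample], expr[sample]) for sample in shared]


rows = integrate_coherent(expression, genotype)
for sample, alleles, level in rows:
    print(sample, alleles, level)
```

Sample S4 drops out because it was measured on only one platform. When experiments are planned separately, few or no samples may be shared, which is what forces the after-the-fact correlation Stoughton warned against.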
Looking forward, Stoughton projected that systems biology will have a large impact on medicine, with data on genotypes, mRNA, proteins, and metabolites combining to provide reporter sets “that will be patented and approved for certain conditions.”
Eventually, he said, medicine will be “treating pathways, not diseases,” a mindshift that will impact the entire pharmaceutical industry.