Last week, IBM, the Institute for Systems Biology, and Lynx Therapeutics announced a research collaboration on gene expression in macrophage cells, jointly proclaiming the project an opportunity to “identify basic IT requirements for computational biology in systems biology research.”
Each of the three players will benefit from improved systems biology informatics, but Lynx just may stand to gain the most from the deal: The project will develop tools and database systems for the company’s flagship MPSS gene expression platform that could help the struggling company market its technology to a broader customer base.
While MPSS, or Massively Parallel Signature Sequencing, is generally well regarded as a high-throughput method for generating high-quality gene expression data, customers have been slow to adopt the approach, and Lynx has suffered the financial consequences. The company’s revenues have dwindled quarter after quarter, and at the end of May it had to transfer its common stock to the Nasdaq SmallCap market because its market capitalization had fallen below Nasdaq’s minimum requirement.
New informatics tools from the collaboration — to be developed by the computational biology team at IBM Research — could be a shot in the arm for Lynx as it plans to broaden its marketing approach for MPSS, which is currently offered on a service basis. An integrated informatics platform for MPSS should make the technology more attractive to potential customers: Current gene expression analysis tools, which are designed for microarray data, are of limited use when analyzing the data that Lynx’s technology generates.
General-purpose gene expression analysis methods will work with MPSS, “but most of them break when you start looking at the enormous level of data that we provide,” Lynx CEO Kevin Corcoran told BioInform. “What we hope to get out of this is the development of algorithms and the development of the proper database to take advantage of the data.”
Lynx has developed its own analysis tools for the platform, “but the added horsepower that we’ll get with IBM Research behind us is just going to make it that much better,” Corcoran said. According to Gustavo Stolovitzky, manager of the functional genomics and systems biology group at IBM Research, “This data will challenge us to be creative and develop new algorithms…The fact that people haven’t been using it too much makes it less of a known in terms of knowing how to analyze it.”
The benefits of Lynx’s approach are well worth the effort, according to Lee Hood, president and director of the ISB. MPSS “allows us to analyze down to the single-copy level the expression of genes, of messenger RNAs, that are absolutely invisible to DNA arrays,” he said. “The reason this is important is that a great deal of interesting biology occurs down at the level of lowly expressed messenger RNAs, particularly biology that relates to gene regulatory networks and signal transduction pathways.”
Not Like Microarrays
The informatics challenges of MPSS begin with the volume of data the technique generates. While a typical microarray experiment provides information on 10,000 genes, a single run of MPSS produces sequence data for around 1.5 million cDNA clones. “We expect more data than we will get with arrays, and that means that we will have to organize this data and create new knowledge from the data,” Stolovitzky said.
Representing the MPSS data in a database is also tricky. Unlike gene expression arrays, where each data point represents a gene in a one-to-one relationship, MPSS generates sequence fragments — “signatures” — that must be matched with genes, parts of genes, or even regulatory regions or other genomic features. “There is an additional layer of complexity,” Stolovitzky said, noting that the additional complexity — when handled correctly — “will allow us to find things we didn’t know existed in the original data.”
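The one-to-many relationship described above can be illustrated with a small sketch. It is not Lynx's or IBM's schema: the signature strings, feature names, and the `summarize` helper are all hypothetical, chosen only to show why a signature table needs a mapping layer that an array's probe-to-gene table does not.

```python
from collections import defaultdict

# Hypothetical clone counts from one run: signature sequence -> count.
# The sequences are made up for illustration.
signature_counts = {
    "GATCAAAAAAAAAAAAA": 120,
    "GATCCCCCCCCCCCCCC": 3,
    "GATCGGGGGGGGGGGGG": 45,
    "GATCTTTTTTTTTTTTT": 7,
}

# Hypothetical annotation: a signature may match one feature,
# several features, or nothing that is currently annotated.
signature_to_features = {
    "GATCAAAAAAAAAAAAA": ["geneA"],
    "GATCCCCCCCCCCCCCC": ["geneA", "geneB"],
    "GATCGGGGGGGGGGGGG": ["geneC:3'UTR"],
}


def summarize(counts, annotation):
    """Split counts into unambiguous, ambiguous, and unmatched signatures."""
    per_feature = defaultdict(int)
    ambiguous = {}
    unmatched = {}
    for signature, count in counts.items():
        features = annotation.get(signature, [])
        if len(features) == 1:
            # Unique match: credit the count to that feature.
            per_feature[features[0]] += count
        elif features:
            # Several candidate features: keep for later resolution.
            ambiguous[signature] = (count, features)
        else:
            # No known feature: possibly a novel transcript.
            unmatched[signature] = count
    return dict(per_feature), ambiguous, unmatched


per_feature, ambiguous, unmatched = summarize(
    signature_counts, signature_to_features
)
```

Keeping the unmatched signatures rather than discarding them is a deliberate choice in this sketch, since those are the entries that could point to features nobody knew to look for.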
Stolovitzky said that a first step to building new analytical tools would be evaluating the reproducibility of MPSS. “Because it’s a very young technology, we have to start from the beginning,” he said. “We need to quantify how reproducible it is…That will allow us to detect signal beyond the noise level of the technology.”
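One simple way to put a number on reproducibility, offered here only as an assumption about what such a check might look like and not as IBM's method, is to scale two replicate runs to a common total and correlate their log-transformed counts. The per-million scaling, the pseudocount, and the function names `to_per_million` and `log_pearson` are illustrative choices, and the replicate data are invented.

```python
import math


def to_per_million(counts):
    """Scale raw signature counts so each run sums to one million."""
    total = sum(counts.values())
    return {sig: 1_000_000 * n / total for sig, n in counts.items()}


def log_pearson(run_a, run_b, pseudocount=1.0):
    """Pearson correlation of log2 abundances over the union of signatures.

    Signatures missing from a run are treated as zero; the pseudocount
    keeps the logarithm defined for them.
    """
    signatures = sorted(set(run_a) | set(run_b))
    x = [math.log2(run_a.get(s, 0.0) + pseudocount) for s in signatures]
    y = [math.log2(run_b.get(s, 0.0) + pseudocount) for s in signatures]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented replicate runs of the same sample.
replicate_1 = {"sigA": 100, "sigB": 10, "sigC": 1}
replicate_2 = {"sigA": 90, "sigB": 12, "sigD": 2}

scaled_1 = to_per_million(replicate_1)
scaled_2 = to_per_million(replicate_2)
r = log_pearson(scaled_1, scaled_2)
```

A correlation near 1 between replicates would suggest that differences of a given size between conditions are more likely signal than noise; a real analysis would also need to model how variability depends on abundance.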
Additionally, the goal of the immune system research project, which is to assess how macrophage cells change over time in response to infectious diseases, drugs, or other agents, will drive demand for new computational tools, according to IBM. “The algorithms will have to extract genes that vary in time in response to the agents, and genes that respond to some agents and not to others,” Stolovitzky said.
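The two kinds of genes Stolovitzky describes can be made concrete with a toy filter. This is a minimal sketch under stated assumptions, not the algorithm the collaboration will develop: the fold-change rule, the `min_fold` threshold, the pseudocount, and all gene and agent names are hypothetical.

```python
def responds(series, min_fold=2.0, pseudocount=1.0):
    """Flag a time series whose max/min ratio reaches the fold threshold."""
    lowest = min(series) + pseudocount
    highest = max(series) + pseudocount
    return highest / lowest >= min_fold


def responding_agents(expression, min_fold=2.0):
    """For each gene, list the agents under which it varies over time."""
    return {
        gene: sorted(
            agent
            for agent, series in by_agent.items()
            if responds(series, min_fold)
        )
        for gene, by_agent in expression.items()
    }


# Invented abundances at three time points per agent.
expression = {
    "geneA": {"agent1": [10, 40, 80], "agent2": [10, 11, 9]},
    "geneB": {"agent1": [5, 5, 6], "agent2": [5, 6, 5]},
    "geneC": {"agent1": [100, 20, 10], "agent2": [100, 30, 15]},
}

agents_by_gene = responding_agents(expression)
all_agents = {agent for by_agent in expression.values() for agent in by_agent}

# Genes that respond to some agents and not to others.
selective = sorted(
    gene
    for gene, agents in agents_by_gene.items()
    if agents and set(agents) != all_agents
)
```

In this toy data `geneA` is the selective responder, `geneC` changes under both agents, and `geneB` stays flat. A real method would need a statistical test in place of the fixed fold-change cutoff.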
Systems Biology Building Blocks
The macrophage transcriptome project is only the “first phase” in a long-term process to develop informatics tools for systems biology, Stolovitzky noted. But according to Hood, it’s an important first step. The effort is “driving the development of good databases that can capture not only RNA information, but other kinds of information, such as proteomics information and metabolomics information,” he said.
Stolovitzky agreed that protein and metabolite data would be necessary “to continue on the road of systems biology.” Eventually, he said, “We will put together all this data to create pathway maps, and then maybe simulate these pathway maps in silico, and then make predictions that will allow us to do new experiments to refute and accommodate the things that we had wrong in the first pass.”
IBM has pledged support for several aspects of this long-term process, most recently with an equipment grant for the CyberCell E. coli simulation project at the University of Alberta (see story on p. 3). But the work with ISB and Lynx makes it clear that the company wants to do more than supply the hardware to support systems biology. “We want to be leaders in knowing how to do each of these steps,” said Stolovitzky. The immune system research project “is going to be a very valuable playground to challenge us with innovation in this field.”