Agilent Technologies’ central research laboratory has thrown its hat into the computational systems biology ring with the release of a text-mining software tool.
The software, called Agilent Literature Search, is available as a free plug-in for the Cytoscape network visualization package. Available through Agilent Labs’ website (http://www.labs.agilent.com/), the software was developed as part of a relatively new — and little publicized — systems biology project underway there.
Annette Adler, project manager for the systems biology effort at Agilent Labs, said that the project officially kicked off last spring, “but it builds on work both in informatics and in molecular measurements and proteomics that [has] been going on for a number of years prior to that.”
The goal of the systems biology project, Adler said, “is bringing these individual pieces of research together and working with external collaborators to test and extend them.”
The research project began at around the same time that Agilent’s commercial arm took steps to consolidate its various life science product lines under the systems biology rubric. Last March, Agilent merged its gene-expression, proteomics, and reagents businesses into a single unit that it dubbed Integrated Biology Solutions.
The relationship between Agilent Labs and the business side of the company is malleable. Mary Lou Simmermacher, a spokeswoman for Agilent Labs, said that there is “no direct connection” between the research project and the company’s commercial software tools, and said that the company is “not going to be actively marketing the literature search tool.”
Agilent “does have the right to commercialize the plug-in if we want to, but we have no immediate plans to do that,” Simmermacher said.
Nevertheless, Adler said, the research group does have some responsibility for filling in any technology gaps that may exist for customers using Agilent’s instruments for systems biology research.
“The goal of the project is to develop new technologies and new procedures and protocols as needed to do systems biology. So where there exists a product, we’re using it and testing it. Where there isn’t a product and we have a research prototype, we’re using and testing that.” However, she stressed, “it’s not product testing.”
Next (Wet) Bench
Adler said that the systems biology project is a natural offshoot of Agilent Labs’ historical emphasis on electronics R&D. “Agilent has a long-term tradition coming from its roots at Hewlett-Packard of what they call ‘next-bench,’” she said. “So if I was working on something, and I needed more help in understanding how to develop it, I could lean over to the next bench — quite literally — and ask the guy [whether] it met his requirements or not.”
As the company moved into the life sciences, Adler said, “What we’ve done at Agilent Labs is seek next-bench partners outside [the company] who are actually pursuing some biological question, and we’re working with them using what we have in hand, or inventing where we have to in order to help pursue those biological questions.”
In the systems biology project, Agilent Labs is working with researchers from Stanford University on autoimmune disorders, and with the Translational Genomics Research Institute on melanoma, autism, and pancreatic cancer. Both projects involve genomics, proteomics, and metabolomics studies, and Adler leads an effort at Agilent Labs dubbed “systems informatics” that is charged with building qualitative biological models by integrating data from these different platforms.
The literature search tool was an obvious starting place for the informatics development team, Adler said. “People needed … to take their molecular data — however they got at it, through genomics or proteomics — and they needed to put that into a biological context, and a big chunk of that biological context is looking at scientific literature,” she said. “So we built a series of prototypes internally, which we tested with our external collaborators over the years.”
In addition to the literature-search tool, the Agilent Labs systems biology team is developing an informatics architecture called ALFA that is based on an object model for representing information from disparate data sources in a structured format. Adler declined to provide details on how, or whether, the company plans to make ALFA available. The company decided to release the text-mining software publicly through Cytoscape, she said, “because we also felt there was an increasing need for standards so that data would be easier to share, of any type.”
Cytoscape — an open-source project led by investigators at the University of California San Diego, the Institute for Systems Biology, Memorial Sloan-Kettering Cancer Center, and the Pasteur Institute — is “an emerging standard” in computational systems biology, according to Adler. “When you look at efforts going on in this arena, they’re the one that gets talked about a lot,” she said. “We just felt that was the right group to work with.”
The core applications for Cytoscape (available at http://www.cytoscape.org/) are distributed under the LGPL. Plug-ins like Agilent Literature Search, however, are considered “separate works” that use Cytoscape as a Java code library and are “therefore governed by independent software licenses distributed with and specific to each plug-in,” according to the Cytoscape website.
Agilent has released its plug-in under its own company license. Oracle, the only other commercial plug-in developer so far, released a package that allows users of Oracle 10g to visualize and analyze network data stored in the Oracle Spatial Network Data Model using Cytoscape. A full list of Cytoscape plug-ins is available at http://www.cytoscape.org/plugins2.php.
Agilent’s text-mining software allows users to upload hundreds of genes or proteins at a time, along with keywords that describe the kinds of relationships to search for. The software extracts information from sources like PubMed and the US Patent Office, and then represents the relationships among those genes and proteins using Cytoscape’s visualization capabilities.
According to Adler, the system offers higher throughput than other life science text-mining systems, and can query as many as 1,000 biological entities against more than 30,000 PubMed articles “in what amounts to a batch launch.”
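To make the general idea concrete, the workflow described above (a batch of entity names plus relationship keywords in, a network of associations out) can be approximated with a simple sentence-level co-occurrence count. This is only a minimal sketch under stated assumptions and is not Agilent's method, which the company has not described in detail here. The function names, the placeholder entity names, and the toy sentences are all hypothetical, and a real tool would retrieve text from a source such as PubMed rather than from a hard-coded list.

```python
import re
from collections import Counter
from itertools import combinations


def build_association_edges(entities, keywords, sentences):
    """Count entity pairs that share a sentence containing a keyword.

    Hypothetical helper for illustration only; not Agilent's algorithm.
    """
    # Whole-word, case-insensitive match for each entity name.
    entity_patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for name in entities
    }
    # Keywords match as word prefixes, so "bind" also matches "binds".
    keyword_patterns = [
        re.compile(r"\b" + re.escape(word), re.IGNORECASE) for word in keywords
    ]
    edges = Counter()
    for sentence in sentences:
        # Skip sentences that mention none of the relationship keywords.
        if not any(p.search(sentence) for p in keyword_patterns):
            continue
        found = sorted(
            name for name, p in entity_patterns.items() if p.search(sentence)
        )
        # Every pair of entities found in the sentence becomes an edge.
        for a, b in combinations(found, 2):
            edges[(a, b)] += 1
    return edges


def to_edge_lines(edges):
    """Render edges as tab-separated 'source, relation, target' lines."""
    return [f"{a}\tcooccurs\t{b}" for (a, b) in sorted(edges)]


# Toy input: placeholder names and invented sentences, not real literature.
toy_sentences = [
    "geneA binds geneB in stressed cells.",
    "geneC was measured alongside geneA.",
    "geneB inhibits geneA activity.",
]
toy_edges = build_association_edges(
    ["geneA", "geneB", "geneC"], ["bind", "inhibit"], toy_sentences
)
for line in to_edge_lines(toy_edges):
    print(line)
```

The three-column edge list is the kind of simple network description that a visualization package can load. Counting co-occurrences per sentence is a deliberately crude stand-in for whatever relationship extraction a production text-mining system performs.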
The life science text-mining sector is heating up. The launch of Agilent’s software comes a week after Cognia announced that it had been granted a share of a £5.3 million ($10.2 million) grant from ITI Life Sciences, a Scottish economic-development agency, to develop text-mining technology for life science research [BioInform 03-21-05].
But Agilent’s decision to make its software freely available seems to run counter to ITI Life Sciences’ conclusion that a promising commercial market exists for text-mining technology for life science research. The organization estimates that the market for these tools could reach £200 million ($374 million) by 2014.
Perhaps this simply isn’t a large enough market for Agilent, which posted $7.2 billion in revenues for 2004. In addition, the benefits of helping researchers use data from its instruments may far outweigh any revenues from a still-emerging technology, and the release may help raise Agilent’s profile in the eyes of the academic community, which tends to embrace free software.
Whatever Agilent’s motivation for making its software freely available may be, providers of commercial packages in the sector don’t see the company’s presence in this market as an immediate threat.
“We’ll have to see, but we’re not frightened yet,” said Ilya Mazo, president of Ariadne Genomics, which sells text-mining software as well as pathway analysis data and software. “It could be a good thing, because it might help promote the concept of association maps,” he said. Once researchers are comfortable with the software’s capability, “they can decide whether they want a free tool, or something more complex,” he said.