Just a few months after it received a $363,000 grant from the National Institutes of Health to create one of six national Exploratory Centers for Cheminformatics Research, the Indiana University School of Informatics has secured funding from Microsoft for another cheminformatics project.
IU said last week that it had received $49,000 under the Microsoft Smart Clients for eScience program to develop a prototype of an integrated cheminformatics system built on Microsoft's .Net web services framework.
In late September, the National Human Genome Research Institute awarded IU and five other groups more than $2 million to create a network of Exploratory Centers for Cheminformatics Research as part of the NIH Roadmap's Molecular Libraries initiative (see below).
The goal of the centers is to develop software and other informatics resources to store, analyze, and manage data from the NIH chemical genomics screening centers. The IU center, called the Chemical Informatics and Cyberinfrastructure Collaboratory, has a particular focus on "distributed services architecture" and grid technology, according to its NIH grant abstract.
David Wild, assistant professor of informatics at IU and principal investigator on the Microsoft grant, said that his project is "taking things one step further" than the NIH-funded project by "basically using networks of web services to fulfill use cases of much more computationally complex tasks that people might want to do, but making it very easy for scientists to do."
The project, which is combining web services with intelligent agent technology, is expected to automate and simplify data-mining for pharmaceutical researchers. "The idea is to be a bit like Google and put a lot of smarts beneath the surface, but have a very straightforward interface," Wild said.
As an example, Wild said that a pharma researcher might want to know about any new compounds that might be potential binders to a protein, "but there is so much of this information around now, that nobody could hope to find it by trawling through all the literature and all the databases every day."
In that case, Wild said, the envisioned framework would "search the public databases and pull out new structures that have been added in the last few days, and then would feed those structures to a molecular docking program to dock them to the protein of interest, do some scoring, figure out whether any of them got past the threshold score, and then if so, e-mail the scientist."
Ideally, he said, "the scientist would just e-mail the server and say, 'Tell me all the structures that might fit this protein for the next three months,' and then forget about it, and the system would e-mail him back."
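The workflow Wild describes (pull recently added structures, dock them, score them, apply a threshold, notify the scientist) can be illustrated with a minimal Python sketch. This is not the IU team's code: every name below (`Structure`, `recent_structures`, `screen`, `format_alert`) is hypothetical, the docking step is a stand-in function supplied by the caller, and the sketch assumes that a higher score means a better fit. It builds the alert text only and leaves actual e-mail delivery out.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Structure:
    """A compound record; all fields are illustrative assumptions."""
    identifier: str
    smiles: str
    added_on: date


def recent_structures(records: Iterable[Structure], today: date,
                      days: int = 3) -> List[Structure]:
    """Keep only structures added within the last `days` days."""
    cutoff = today - timedelta(days=days)
    return [r for r in records if r.added_on >= cutoff]


def screen(records: Iterable[Structure],
           dock: Callable[[Structure], float],
           threshold: float,
           today: date,
           days: int = 3) -> List[Tuple[Structure, float]]:
    """Dock recent structures and return those scoring at or above threshold.

    `dock` is a placeholder for a real docking program; here it is any
    function that maps a structure to a score (higher assumed better).
    """
    hits = []
    for structure in recent_structures(records, today, days):
        score = dock(structure)
        if score >= threshold:
            hits.append((structure, score))
    # Best-scoring candidates first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def format_alert(protein: str, hits: List[Tuple[Structure, float]]) -> str:
    """Build the plain-text body of the notification message."""
    lines = [f"{len(hits)} new candidate binder(s) for {protein}:"]
    lines += [f"  {s.identifier}  score={score:.2f}" for s, score in hits]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up records and a toy scoring function, for demonstration only.
    database = [
        Structure("cmpd-1", "CCO", date(2006, 1, 9)),
        Structure("cmpd-2", "c1ccccc1C(=O)O", date(2006, 1, 8)),
        Structure("cmpd-3", "CCCCCCCCCCCC", date(2005, 12, 1)),
    ]
    toy_dock = lambda s: len(s.smiles) / 10
    found = screen(database, toy_dock, threshold=1.0, today=date(2006, 1, 10))
    if found:
        print(format_alert("protein-X", found))
```

In a real deployment each of these steps would presumably be a separate web service, with the database query and the docking run replaced by calls to those services.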
One "issue" that Wild noted in the development of the framework is the proprietary nature of many cheminformatics resources, which requires an extra authentication layer to ensure that users are licensed. However, he said that he's identified a "shift" in the cheminformatics market "so that there is more open source software being developed, and more small companies who are donating their software to academia."
Wild cited OpenEye and Digital Chemistry as two firms that have made some of their software freely available as part of the project.
NCBI has also partnered with OpenEye to use its OEChem Toolkit as part of the underlying technology for the PubChem database. Matthew Stahl, senior vice president and head of strategic development at the company, said that the PubChem partnership "has been very positive for OpenEye" and that the company is "involved with a number of the NIH Exploratory Centers for Cheminformatics Research." He added that the company has a "longstanding policy" of providing its software at no charge to academic research groups.
Wild estimated that "about half the stuff that we produce will be based on software that is either open source or freely available, and the other half we'll figure out when we need to."
Bradley Ozenberger, NHGRI program director for the Exploratory Centers for Cheminformatics Research, agreed that "a lot of what's done in private industry, in the drug discovery industry -- those cheminformatics tools and computational chemistry tools aren't available to the academic research community," which was one of the primary reasons that NIH decided to fund the initiative.
"There is a risk that this program may actually duplicate some work that's already been done, because we will make our tools freely available to the community," he said, but on the other hand, he noted, "A company may be able to take some of these tools and add features and pull things together and provide an important product in the end."
Wild said that he has also seen interest from pharmaceutical firms, and that while the project is not "formally sponsored" by any particular pharma, "we're working quite closely with Eli Lilly."
Wild said that the IU team is "in the process" of building the web services framework, and expects to start work in February or March on the interface and the intelligent agents that will relay requests across the framework. Farther down the road, he said that he'd like to add natural-language processing capabilities to the system "that will be able to dissect from somebody's e-mail what they want it to do."
By next summer, "we hope to have a prototype available on the web that people can download and try out for themselves," he said. This version "will be basically client software that can interface with our web services and other web services too, with some use cases built in."
The "finished product" is targeted for release in around two years, Wild said.
-- Bernadette Toner
NHGRI's Exploratory Centers for Cheminformatics Research
NIH issued a request for applications for its Exploratory Centers for Cheminformatics Research just over a year ago [BioInform 12-27-04], and awarded six grants under the program in September.
Bradley Ozenberger, NHGRI program director for the ECCR program, said the initiative is an important component of the NIH Roadmap's Molecular Libraries initiative.
"The NIH roadmap molecular libraries initiative will be putting all of [its] assay data, all the compound structures, all its information, into PubChem, and creating this unprecedented database of small-molecule screening data, and there's going to be tremendous value to be extracted from this sort of data," he said.
"So what we recognized here at NIH was that the research community isn't prepared to tackle such data, and that we need to support cheminformatics tool development to bring those sorts of tools and capabilities to the bigger community to really extract value from PubChem."
While last year's RFA indicated that up to 10 grants might be awarded, Ozenberger said that NHGRI awarded only six because it didn't get as many applications as it had expected.
"I think it's just a smaller community than we had anticipated," he said.
The ECCRs make up the first half of a two-phase cheminformatics program at NHGRI. Next year, NHGRI will issue another solicitation for the second phase of the program, which will fund between four and six Cheminformatics Research Centers.
"Those will be larger grants, and those centers will be expected to really set the standard for cheminformatics research in the academic community," Ozenberger said.
| Principal investigator | Institution | Project title |
|---|---|---|
| Curtis Breneman | Rensselaer Polytechnic Institute | Establishment of the RPI Exploratory Center for Cheminformatics |
| Paul Clemons | Massachusetts Institute of Technology | General Data-analysis Tools to Relate Chemical Diversity to Biological Outcomes |
| Geoffrey Fox | Indiana University Bloomington | Chemical Informatics Cyberinfrastructure |
| Jacqueline Hughes-Oliver | North Carolina State University Raleigh | Comparative and Web-Enabled Virtual Screening |
| Kerby Shedden | University of Michigan at Ann Arbor | MACE -- Michigan Alliance for Cheminformatic Exploration |
| Alexander Tropsha | University of North Carolina Chapel Hill | Carolina Exploratory Center for Cheminformatics Research |