WASHINGTON, DC – The National Institutes of Health’s Office of Technology Transfer and information technology firm Discovery Logic are co-developing a next-generation version of text- and data-mining software to help intellectual property professionals navigate databases of interest, representatives from both organizations said last week.
Discovery Logic is developing the tool, called Catapult, which will add new data visualization capabilities to an existing software package called Synapse that the company developed for NIH.
The company is developing Catapult under contract from the NIH, and expects to release it by the end of September. In the meantime, NIH and Discovery Logic are seeking to create a public-private partnership to further develop the software and identify applications that are expected to interest the broader tech-transfer community, an NIH official said.
Bonny Harbinger, deputy director of the NIH OTT, demonstrated Synapse at the Association of University Technology Managers’ Eastern Region meeting, held here last week. She said that the NIH OTT began developing the software last year as a way to help it sort through its extensive IP portfolio and to streamline and speed up technology commercialization.
The NIH, which does the patenting and licensing for its intramural program and for the US Food and Drug Administration, “has thousands of technologies that we need to try and move to further R&D and hopefully commercialization,” Harbinger told BTW.
“In order to do that, we needed something that would help us understand what we have in our portfolio across the 20-some licensing specialists and across different kinds of portfolios,” she added. “We decided that we would try to build a text-mining tool that would assist us with that, because it’s impossible for one person to keep all this information in their heads.”
Harbinger said that the NIH started fleshing out what types of capabilities it needed in the software and began developing an early version. To commercialize the technology on a larger scale, it contracted longtime commercial partner Discovery Logic of Rockville, Md., to develop the tool.
“We’ve been doing work for the NIH for a long time,” Mike Pollard, senior vice president at Discovery Logic, told BTW. “We saw that these databases are important to one another, and that a lot of the answers that the NIH and other organizations in the field are looking for are contained in multiple databases, not just one.”
Harbinger said Synapse has so far proven very successful in marketing NIH-developed technologies.
Prior to using the software, the NIH was limited to a “technology push” approach, Harbinger said, because there was no way for the agency to aggregate its relevant IP to fill a specific need for a potential licensee.
With Synapse, however, “we’re now able to do market pull, so we invite companies in to meet with us or we do a webcast, all the way from the largest pharmaceutical companies to smaller biotechs or VCs,” Harbinger said. “Then we can say, ‘Show us what you’re looking for, and we’ll see how we can match it up,’ and we can do it in seconds for them — kind of fill their shopping cart when they come in.”
“I guess it was serendipity that we found a business problem at the [NIH] OTT that was perfectly suited to this,” Discovery Logic’s Pollard said. “This turned basically from a research project that we knew was important but didn’t know who we were doing it for into solving an actual business problem, in about three months.”
Mine and Yours
At its core, Synapse is a text-mining tool that draws upon several databases of interest to intellectual property managers. These include TechTracs, a database containing the NIH and FDA intramural research portfolios; the US Patent and Trademark Office’s patent applications and issued patents; CRISP (Computer Retrieval of Information on Scientific Projects), a federal internal database of millions of records of federally funded biomedical research; Radius, a research grants and contracts database maintained by the nonprofit company RAND; and the Medline scientific literature database.
Synapse also has access to a collection of biomedical-related news stories, and a database of rare diseases and conditions maintained by the NIH Office of Rare Diseases.
Pollard said that the text-mining methodology sets the software apart from traditional search engines in that “it starts with the full text of the technology, and then it relates the full text of anything else that somebody is looking for. With Google you are expected to type in four or five words, and then you get a massive amount of information that is not necessarily related. Here, you just double click, and you get extremely related information about patents, journal articles, whatever, so it becomes much more relevant much more quickly.”
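Neither NIH nor Discovery Logic described the algorithm behind Synapse, so the following is only a rough sketch of the general idea Pollard outlines: start from the full text of one record and rank other records by how closely their text matches. It assumes a common stand-in technique, TF-IDF weighting with cosine similarity, and every function name and sample record in it is hypothetical rather than drawn from the product.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF keeps every weight positive, even for ubiquitous terms.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_related(query_text, corpus):
    """Return (score, title) pairs sorted from most to least similar."""
    titles = list(corpus)
    vectors = tfidf_vectors([query_text] + [corpus[t] for t in titles])
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scored = [(cosine(query_vec, v), t) for t, v in zip(titles, doc_vecs)]
    return sorted(scored, reverse=True)


# Hypothetical records standing in for entries from different databases.
QUERY = "Monoclonal antibody that binds a tumor antigen for cancer therapy"
CORPUS = {
    "patent: antibody therapy": "Antibody compositions that bind tumor antigen, useful in cancer therapy",
    "article: soil chemistry": "Nitrogen levels measured across agricultural soil samples",
    "grant: vaccine adjuvant": "Adjuvant formulation improving vaccine response; antibody titers measured",
}

for score, title in rank_related(QUERY, CORPUS):
    print(f"{score:.3f}  {title}")
```

In this toy setup the whole technology description acts as the query, so a record that shares many of its terms rises to the top and a record that shares none scores zero. A production tool would presumably add stemming, stop-word handling, and indexing at scale, none of which is shown here.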
As NIH and Discovery Logic have demonstrated Synapse to various interested parties over the past several months, a number of additional useful applications have begun to crystallize, Harbinger and Pollard said. These include identifying experts in a particular field, finding out what universities or companies are working in a specific area, or gathering information about the amount of federal funding that has been awarded to various research projects, and the agencies that administered the grants.
“As we added databases and showed this to people, they said, ‘Oh, I can use it for this or that,’” Harbinger said. “It has kind of endless uses depending on what question the person is looking to answer. As we add databases, the information you can mine from it becomes more and more robust.”
The volume of information, however, has reached the point where a simple spreadsheet presentation of the data is a bit daunting for the average user — hence the development of the second-generation tool, which will be called Catapult.
This version of the software will incorporate data-visualization techniques so that information once presented in a linear fashion will be available in the form of a graph, chart, or map – much as the biotechnology and pharmaceutical research community uses Spotfire’s DecisionSite software to visualize the massive amount of information produced by screening experiments.
“The next step would be synthesis of the data, so you understand meanings within it, and then you visualize it so it doesn’t come out in just a static analytic manner,” Harbinger said. “Instead of just saying some things are related, we can show the user how it is related, and why it is related, and show things that may be fairly attenuated in their relationship, but might be interesting to somebody. It helps us bundle technologies, [and] find gaps and synergies in specific areas and across portfolios.”
Pollard said that Discovery Logic hopes to have a finished product on the market in September. But in the meantime, Harbinger said that the NIH is going to issue a request for information with the intent of creating a broad public-private partnership to further develop Catapult and similar tools.
This would serve to “both make it more robust and bring in new ideas from across industry, non-profits, et cetera, to see what this can do, so it’s not just limited to what I imagine it can do,” Harbinger said. She also said that the RFI is intended “to bring in additional tool makers” to see what they can add to the software package.
But Pollard said that NIH still “holds the torch,” so to speak, when it comes to further developing the software. “The tool was developed for [Harbinger’s] office in particular,” he said. “You could almost say it’s a perfect tech transfer. They are driving requirements, and other people are adding requirements as we go.”
Right now, the NIH is the largest customer of Synapse, although Pollard said that several undisclosed pharmaceutical companies, university tech-transfer offices, and others have either licensed the software or expressed interest in doing so.
Pollard said that Discovery Logic sells subscriptions for $3,000 per year for commercial or academic users, and $2,500 per year for government users. Discovery Logic did not disclose potential pricing for Catapult.