Informatics startup Code-N has launched an early-access program that will allow pharmaceutical companies to test-drive Green Field Finder, its cloud-based semantic software that helps researchers search the patent databases to assess the competitive opportunity for drug candidates.
The early-access program will likely last for two to three months, after which Code-N plans a full launch of its flagship product, Randy Haldeman, the company's CEO, told BioInform this week.
As part of the program, a "handful" of pharmas will get immediate access to GFF and will have the opportunity to test it out on drug leads, he said. Participants will also receive special pricing and support, along with input into and first access to new versions of the software, and will have the option to participate in case studies with Code-N.
GFF's semantic approach to patent searching — based on so-called "concept clouds" that provide quick access to related information — is a new one for the pharma industry, according to Haldeman, and as such, Code-N is looking for companies that are willing to take the time to try something different as well as provide feedback to help optimize the product.
"There are other tools out there that help you search [for] a patent," such as those provided by companies like Thomson Reuters, Elsevier MDL, and Linguamatics, or resources like Google Patents, "but we do it a totally different way and we are the first ones to do it this way," he said.
The company's technology is based on the concept web, a semantic approach that assigns the same universal identifier to all synonymous terms that fall within specific concepts so that any search accounts for the broader concept as well as its synonyms.
Currently, the GFF system lets users check whether promising drug targets have already been patented or are being studied by competing pharmas.
According to Code-N, the concept web approach generates faster and more complete results than keyword-based search systems like Google Patents or Thomson Reuters' Cortellis, a web service that provides access to drug research and development content such as competitive intelligence; patent information; pharmacology-based data; and systems biology and disease data (BI 3/2/2012).
That's because those other offerings rely on "linear searches to discover drug-gene-disease combinations," Haldeman said in a statement. The trouble, however, is that "most of the easy targets have been found" and so these simple searches "don’t work anymore," he said.
For example, a Google search for patents that involve aspirin and hypertension will return different results depending on which keywords are used, he explained to BioInform.
Furthermore, using keyword-based searching, a researcher would need to know all the possible synonyms that exist for aspirin and hypertension, because a competing pharma might use alternate or uncommon terms in its patent filings in order to keep its drug development efforts hidden, he said.
This poses a problem for pharma companies, where researchers and chemists "perform complex searches that take days to complete, whether trying to find freedom-to-operate or traversing dozens of databases," Chris Waller, director of cheminformatics at Merck and a Code-N advisor, said in a statement.
The concept web technology that underlies GFF is an extension of the semantic web that, according to Marketta Silvera, Code-N's founder and executive chairman, had its start in European academic circles but is now gaining traction in American institutions as well.
The "semantic web does a super job connecting the datasets but interpreting the information and drawing intelligence out of it is getting tougher," she explained to BioInform.
Concept web technology, on the other hand, "actually interprets the intelligence beyond just connecting [data]" and lets users get at that information by running a single query, she said. "From the business point of view, for executives of [pharma] companies, it offers technology for rapidly determining the potential intellectual property value of a new idea," she added.
The GFF software connects users to concept clouds, which contain semantic triples that are enriched with provenance, context, and community information. In this system, each concept is assigned a universal identifier that is also applied to all synonymous terms. These terms are then organized into concept clouds and each concept is matched back to all its synonyms when a search is run.
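The identifier scheme described above can be illustrated with a minimal Python sketch. It is not Code-N's implementation: the concept identifiers, the tiny thesaurus, the sample patent texts, and the function names are all hypothetical, and the provenance, context, and community enrichment of the triples is left out. The sketch only shows how mapping every synonym to one shared identifier lets a search match on the concept rather than on the literal keyword.

```python
import re

# Hypothetical mini-thesaurus: every synonym maps to one concept identifier.
# The identifiers and terms are invented for illustration.
SYNONYM_TO_CONCEPT = {
    "aspirin": "C001",
    "acetylsalicylic acid": "C001",
    "asa": "C001",
    "hypertension": "C002",
    "high blood pressure": "C002",
}

def concepts_in(text):
    """Return the set of concept identifiers whose synonyms appear in text."""
    lowered = text.lower()
    found = set()
    for term, concept_id in SYNONYM_TO_CONCEPT.items():
        # Match whole words or phrases only, so "asa" does not match "nasal".
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(concept_id)
    return found

def search(documents, query_terms):
    """Return ids of documents that mention every queried concept."""
    wanted = {SYNONYM_TO_CONCEPT[t.lower()] for t in query_terms}
    return [doc_id for doc_id, text in documents.items()
            if wanted <= concepts_in(text)]

# Invented sample texts standing in for patent filings.
patents = {
    "P1": "Use of acetylsalicylic acid in treating high blood pressure.",
    "P2": "A nasal spray formulation for hypertension.",
    "P3": "Aspirin tablets with improved shelf life.",
}
print(search(patents, ["aspirin", "hypertension"]))  # prints ['P1']
```

In this toy example, the query "aspirin" plus "hypertension" finds the first filing even though it contains neither keyword, because both the query and the text resolve to the same two identifiers.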
Code-N has built a "metathesaurus" of key terms from resources like the National Library of Medicine's Medical Subject Headings and Unified Medical Language System databases, as well as patent information from the United States Patent and Trademark Office's patent database, all of which are housed in Amazon's cloud infrastructure, Haldeman told BioInform.
Using the concept-based approach, GFF's developers have matched words from their metathesaurus with relevant words in patents, so that "when someone types in 'breast cancer' … we grab every synonym and run that through the search and run all the [possible] combinations … within seconds," he said.
So, using the aspirin and hypertension example, when a user types those two keywords into GFF, "we actually grab all these synonyms for each of the terms being searched and apply them immediately to what we call the concept cloud of the patents," he explained.
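The query-side step Haldeman describes, grabbing every synonym for each term and running all the combinations, can be sketched as follows. This is an illustrative assumption about how such an expansion might look, not a description of GFF's internals; the synonym lists and the `expand_query` helper are hypothetical.

```python
from itertools import product

# Hypothetical synonym lists; a real metathesaurus would be far larger.
SYNONYMS = {
    "aspirin": ["aspirin", "acetylsalicylic acid", "ASA"],
    "hypertension": ["hypertension", "high blood pressure"],
}

def expand_query(terms):
    """Return every combination that picks one synonym per query term."""
    # Terms missing from the thesaurus are searched as typed.
    options = [SYNONYMS.get(t.lower(), [t]) for t in terms]
    return list(product(*options))

combos = expand_query(["aspirin", "hypertension"])
for combo in combos:
    print(" AND ".join(combo))
print(len(combos))  # prints 6 (3 aspirin synonyms x 2 hypertension synonyms)
```

The number of combinations is the product of the synonym counts, so it grows quickly as terms are added. With keyword-only tools, a researcher would have to know and enter each of those variants by hand.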
When Code-N launches GFF commercially later this quarter, customers will be charged a yearly subscription fee to access particular GFF concept clouds. Exact prices for each database are still being determined, Haldeman said.
Moving forward, Code-N plans to incorporate additional databases that provide more than patent information.
For example, it plans to add the Kyoto Encyclopedia of Genes and Genomes, portions of PubMed, and concept clouds that contain terms associated with specific diseases, genes, and proteins, Haldeman said.
For an additional fee, the company can also develop private concept clouds using customers' internal databases.
"We’ll basically index their database and keep it behind their firewall and we … link [the] nodes of their concept cloud out with the master … concept cloud," allowing them to "draw inferences and do reasoning across" both resources, Haldeman said.
The yet-to-be-determined fee will cover the cost of indexing and connecting to the private concept cloud, he added.
The company also plans to launch additional products for pharma that will focus on drug surveillance and adverse effects. Haldeman said that the company expects to begin developing those tools in the middle of 2013.