Jarg of Waltham, Mass., is hoping its “shotgun” approach to knowledge extraction will become the standard for peer-to-peer communication in the genomic and bioinformatics markets.
The startup has developed a distributed database architecture that Michael Belanger, president of Jarg, described as “similar to the architecture used by Celera to decode the human genome. It basically takes the representations of knowledge that have been extracted and fragments those and then distributes those fragments in a highly parallel table. When someone queries the engine it looks at what the requirements are, fragments those, distributes those fragments out, does a pattern match on the table and brings back whatever’s got the heaviest fragment overlap.”
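The fragment-and-overlap matching Belanger describes can be illustrated with a toy model. What follows is a minimal sketch, not Jarg's implementation or the patented method. Several choices are assumptions made only for illustration: fragments are taken to be word bigrams, the "highly parallel table" is modeled as hash-partitioned in-memory shards, and a thread pool stands in for parallel lookup. The names `fragment`, `FragmentIndex`, and `NUM_SHARDS` are hypothetical.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4  # hypothetical number of partitions in the "parallel table"


def fragment(text, n=2):
    """Split text into overlapping word n-grams (assumed notion of a 'fragment')."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


class FragmentIndex:
    """Toy distributed index: fragments are spread across shards by hash."""

    def __init__(self, num_shards=NUM_SHARDS):
        # Each shard maps a fragment to the set of document ids containing it.
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def _shard_id(self, frag):
        # Same routing rule for storing and querying, so lookups hit the right shard.
        return hash(frag) % len(self.shards)

    def add(self, doc_id, text):
        # Fragment the stored content and distribute each fragment to its shard.
        for frag in fragment(text):
            self.shards[self._shard_id(frag)][frag].add(doc_id)

    @staticmethod
    def _match_shard(shard, frags):
        # Pattern match within one shard: count fragment hits per document.
        hits = Counter()
        for frag in frags:
            for doc_id in shard.get(frag, ()):
                hits[doc_id] += 1
        return hits

    def query(self, text, top_k=3):
        # Fragment the query and route each fragment to the shard that owns it.
        routed = [[] for _ in self.shards]
        for frag in fragment(text):
            routed[self._shard_id(frag)].append(frag)
        # Search all shards concurrently, then merge the per-shard counts.
        total = Counter()
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            for hits in pool.map(self._match_shard, self.shards, routed):
                total.update(hits)
        # Return the documents with the heaviest fragment overlap first.
        return total.most_common(top_k)


if __name__ == "__main__":
    idx = FragmentIndex()
    idx.add("doc1", "kinase inhibitor binds the ATP pocket")
    idx.add("doc2", "receptor antagonist blocks ligand binding")
    print(idx.query("kinase inhibitor binds"))  # [('doc1', 2)]
```

Ranking by raw overlap count keeps the sketch short. A production system would presumably weight fragments, for example by how rare a term is in the domain vocabulary, rather than treating every match equally.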
The company was recently granted US Patent 6,192,364, “Distributed Computer Database System and Method Employing Intelligent Agents,” which adds the use of intelligent agents to the underlying database architecture in order to search live information feeds as well as stored information.
Jarg’s core business will be licensing the Knowledge Engine, Belanger said. The company plans to market it as a “Jarg inside” technology platform that will sit behind a company’s firewall or underneath an industry portal.
Belanger said that Jarg, short for “jargon,” is focusing on domain-specific knowledge extraction that will let peer groups communicate with each other in the language of their professional domain.
“We examine what’s in the content of digital objects such as text and with the natural language of the trade jargon of the field we can extract what’s being expressed by the professional in that data and represent that in a highly accurate compact unique summary,” Belanger said. The summary is indexed along with the context in which it was originally expressed, so that a researcher can use professional jargon to construct a highly detailed request. Responses are ranked by how closely they fit that context.
The biotech/pharmaceutical industry is the first peer group Jarg is working with. The company received a $100,000 SBIR grant in 1998 to demonstrate that the Jarg Knowledge Engine could run the National Library of Medicine’s Unified Medical Language System, an ontology of over two million biotechnology terms and definitions.
While the ontology is in the public domain, “the problem is there’s never been a database engine architecture that could ever run it,” said Belanger. “We’re converting that semi-useful ontology into something suddenly enormously helpful and valuable in terms of the drug discovery pipeline. You’re able to extract information a lot earlier in the drug discovery process that might give you clues as to which target you want to focus your resources on.”
Jarg raised $1 million in a first round of venture capital funding last year and is now seeking a second round of $3 million. The company is involved in a number of alpha and beta projects with companies in the pharmaceutical, bioinformatics, and genomics markets, but Belanger declined to name the partners.
Future plans include the indexing of all types of digital object content. “In the same index of knowledge you can have text and video and sound and medical images. You’ll get back a collage of information that’s very contextually specific to what you asked for,” said Belanger.
“This is what will replace pretty much all search engine technology in the future,” Belanger added.