HNC Applies Pattern Recognition Toolkit to Bug ID Bacterial DNA Fingerprint Project

Neural net pioneer HNC Software has a lot to offer bioinformatics, according to Joseph Sirosh, executive director of advanced technology solutions.

“We have strong intellectual property in pattern recognition that can be applied to any kind of data, even genome sequence,” Sirosh told BioInform last week. “There is a lot of value we bring to the area of pattern recognition in all life sciences.”

To this end, the San Diego-based company recently signed a three-year contract with SPAWAR (Space and Naval Warfare) Systems Center, San Diego, for a DARPA-sponsored project to develop software to create bacterial DNA “fingerprints” — unique patterns within bacterial DNA codes — by analyzing both genomic databases and unstructured biomedical literature.

The “Bug ID” contract could earn HNC up to $2.45 million over the course of the project, and it also opens the door to commercialization. In addition to the value of the developed software, the catalog of DNA fingerprints that will result from the project “will be very valuable and very unique intellectual property,” said Sirosh. “There are several possible business models and business opportunities here and we would like to explore all of them.”

One possible outcome, Sirosh said, is massively parallel diagnostic tools such as microarrays that could be used to screen for a hundred thousand different bacteria and pathogenic organisms in one step.

The Bug ID project is the first step in creating sensitive tests that are unique to each kind of bacterium. “If we can identify the fingerprints for 100,000 bacteria then we can create a DNA probe test that can be put on microarrays,” Sirosh said.

The Data Mining Duo

While a number of groups are developing techniques for analyzing DNA sequence data or for extracting information from the medical literature, Sirosh said that HNC is the first company to combine the two techniques in an automatic and integrated process.

The primary challenge of creating reliable DNA fingerprints based on sequence data is “picking something that is unique to that particular bacterium and doesn’t change all that much,” Sirosh said. Finding such a pattern, or a “maximally unique and minimally mutating probe,” requires extremely sophisticated pattern recognition techniques. HNC plans to use new developments in stochastic context-free grammars, hidden Markov models, and phylogenetically weighted HMMs to advance the current state of pattern recognition on the sequence analysis side of the project.
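To make the "unique and stable" criterion concrete, here is a toy Python sketch. It is not HNC's method, which the article says relies on stochastic context-free grammars and hidden Markov models. It swaps in a much simpler exact-match approach: keep the fixed-length substrings (k-mers) that appear in every known strain of a target organism, as a crude stand-in for "minimally mutating," and in no other organism, for "maximally unique." The function names, the input layout, and the example sequences are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_probes(genomes, target, k):
    """Return k-mers found in every strain of `target` and in no other organism.

    genomes: dict mapping organism name -> list of strain sequences
             (hypothetical layout, chosen for illustration only).
    """
    strains = genomes[target]
    # Conserved: shared by all strains of the target organism.
    candidates = set.intersection(*(kmers(s, k) for s in strains))
    # Unique: drop anything that also occurs in any other organism.
    for name, seqs in genomes.items():
        if name == target:
            continue
        for s in seqs:
            candidates -= kmers(s, k)
    return sorted(candidates)


# Made-up sequences, far shorter than real genomes.
genomes = {
    "A": ["ACGTACGGA", "TTACGGAC"],
    "B": ["GTACGTT"],
}
probes = candidate_probes(genomes, "A", 4)
print(probes)  # ['ACGG', 'CGGA']
```

Exact matching of this kind ignores near-matches and says nothing about how likely a region is to mutate, which is presumably where the probabilistic models mentioned above would come in.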

In addition, Sirosh said the company would add new entity recognition and information extraction algorithms to the text mining technology it has developed for other purposes — and which has already appeared in a number of commercial implementations, including a search tool on eBay. The text-mining technology will be used to analyze biomedical literature in order to automatically identify properties such as whether a particular strain is pathogenic or not. This information can then be fed back into the pattern recognition process to find, for example, the fingerprint that is unique for a disease-causing bacterium as opposed to a closely related strain that is not pathogenic.

“We’re not doing sequence analysis in isolation,” said Sirosh. “We are analyzing sequences and finding DNA fingerprints in the context of known pathogenicity and known disease-causing potential.”

DARPA Needs Diagnostics

While potential uses for the technology run the gamut from basic genomic sequence analysis through diagnostics, SPAWAR and DARPA are clearly interested in its defense-related applications. “If there is a biowarfare attack you could use a chip out in the field to diagnose what a person is infected with and screen for 100,000 different things at a time instead of doing 100,000 different tests,” said Sirosh.

The company has already discovered DNA probes for some bacteria and is trying to scale the process up to create large numbers of DNA fingerprints. Sirosh estimates this production phase is about a year and a half away. By the end of the three-year project, Sirosh expects to have a catalog of fingerprints for at least 10,000 bacteria and to have begun the process of transferring those DNA fingerprints to substrates such as microarrays for actual diagnosis.

“For that we will look for an industrial partner who can do this on a massive scale,” Sirosh said.

— BT