DNA Mining Wants to Improve Human/Mouse Sequence Comparison

BIOPHYSICIST Bill Bruno thought that his field of study, computational molecular evolution, could use some better tools. He’d already come up with some himself — for example, the Weighted Neighbor Joining method of evolutionary tree reconstruction now used by the Ribosomal Database Project. But he had ideas for a few more ways to improve DNA and protein sequence comparisons.

Bruno was particularly intrigued by the many DNA sequences that are conserved between mouse and human but that aren’t genes. “I can’t single-handedly figure all of those out, but I can provide tools so that when others look at them they’re seeing something real, not just a mistake in the software,” he said.

So Bruno took advantage of an entrepreneurial leave program offered by Los Alamos National Laboratory, where he was a principal investigator in the Theoretical Biology and Biophysics Group. In October he founded DNA Mining Informatics near Santa Fe, in a region of New Mexico dubbed the “InfoMesa” for its high-tech growth.

At the moment, Bruno is the company’s only employee, as well as its sole funding source. He has been engaging contractors to do his software engineering, but hopes to be able to hire a staff of four or five over the coming year. He is also meeting with potential investors.

DNA Mining’s first product, Setter, is a front end for human/mouse nucleotide sequence comparison using Blast. “There are two main things that Setter does for you,” Bruno said. “First, it sets the Blast parameters automatically. Second, we have a proprietary method of choosing the scoring matrix for the particular application and query.”

A key advantage of Setter, Bruno said, is the way it handles GC content in the genome. “The variation in the GC content is 15 times what you’d expect it to be, but the Blast statistics assume it’s random. In most versions, the default assumption is 50 percent GC content across the board.” As a result, users can get false matches between two regions that both happen to have a high GC content, but that are not homologous.
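To see why a uniform 50 percent GC assumption can mislead, consider a toy calculation. This is only a minimal sketch and not Setter's proprietary method: it assumes each position is drawn independently, with G and C equally likely and A and T equally likely, and the function names are hypothetical. It computes how often two unrelated bases would match purely by chance at a given GC fraction.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def chance_match_probability(gc):
    """Probability that two independent random bases are identical.

    Assumes both bases come from the same background, with G and C each
    at frequency gc/2 and A and T each at frequency (1 - gc)/2.
    """
    p_gc = gc / 2          # frequency of G (and of C)
    p_at = (1 - gc) / 2    # frequency of A (and of T)
    return 2 * p_gc ** 2 + 2 * p_at ** 2


if __name__ == "__main__":
    for gc in (0.5, 0.6, 0.7):
        p = chance_match_probability(gc)
        print(f"GC = {gc:.0%}: chance match per position = {p:.3f}")
```

Under these assumptions, the per-position chance of a match rises from 0.25 at 50 percent GC to 0.29 at 70 percent GC. Over a long alignment, that difference compounds, so statistics calibrated for 50 percent GC would tend to overstate the significance of matches between two GC-rich regions.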

Some other tools attempt to correct for this problem by searching for and discarding false positives in the results set, but Setter deals with GC content variation at the query stage. “It tries to find what you want to find in the first place, instead of throwing things away and possibly getting more false negatives,” Bruno said. The overall result is more than twice as many true hits as with any Blast program alone, according to DNA Mining’s test results.

Setter 1.1.0, the current version, was released in May. Its starting price, for a single workstation license, is $7,000. Bruno said he is finalizing his first order: a large international pharmaceutical company plans to try out the software on a 16-processor server.

In addition to pharmaceutical firms, Bruno sees genomics and biotechnology companies as potential customers. “Anywhere on the drug pipeline, people want to compare human to mouse DNA,” he said.

Goals for 2002 include the release of Pro-Map, a program to detect protein sequence similarities. Pro-Map uses proprietary statistical techniques to pick up homologs more distant than can be detected by Psi-Blast, according to company literature. A prototype was successfully tested last year at the Critical Assessment of Techniques for Protein Structure Prediction competition (CASP4).

— SCC
