BIOPHYSICIST Bill Bruno thought that his field of study, computational molecular evolution, could use some better tools. He'd already come up with some himself, for example the Weighted Neighbor Joining method of evolutionary tree reconstruction now used by the Ribosomal Database Project. But he had ideas for a few more ways to improve DNA and protein sequence comparisons.
Bruno was particularly intrigued by the many DNA sequences that are conserved between mouse and human but that aren't genes. "I can't single-handedly figure all of those out, but I can provide tools so that when others look at them they're seeing something real, not just a mistake in the software," he said.
So Bruno took advantage of an entrepreneurial leave program offered by Los Alamos National Laboratory, where he was a principal investigator in the Theoretical Biology and Biophysics Group. In October he founded DNA Mining Informatics near Santa Fe, in a region of New Mexico dubbed the InfoMesa for its high-tech growth. At the moment, Bruno is the company's only employee, as well as its sole funding source. He has been engaging contractors to do his software engineering, but hopes to be able to hire a staff of four or five over the coming year. He is also meeting with potential investors.
DNA Mining's first product, Setter, is a front end for human/mouse nucleotide sequence comparison using Blast. "There are two main things that Setter does for you," Bruno said. "First, it sets the Blast parameters automatically. Second, we have a proprietary method of choosing the scoring matrix for the particular application and query."
A key advantage of Setter, Bruno said, is the way it handles GC content in the genome. "The variation in the GC content is 15 times what you'd expect it to be, but the Blast statistics assume it's random." In most versions, the default assumption is 50 percent GC content across the board. As a result, users can get false matches between two regions that both happen to have a high GC content, but that are not homologous. Some other tools attempt to correct for this problem by searching for and discarding false positives in the results set, but Setter deals with GC content variation at the query stage. "It tries to find what you want to find in the first place, instead of throwing things away and possibly getting more false negatives," Bruno said.
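To see why a blanket 50 percent GC assumption can inflate false matches, consider a rough back-of-the-envelope sketch in Python. It is not Setter's proprietary method, and it is not how Blast computes its statistics. The function names are hypothetical, and the model assumes independent positions, with G and C equally frequent and A and T equally frequent. Under those assumptions, two unrelated regions that are both 70 percent GC match by chance at about 29 percent of positions, compared with the 25 percent implied by a uniform base composition.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def chance_match_probability(gc: float) -> float:
    """Probability that two independent random bases are identical.

    Illustrative assumption: both sequences share the same GC fraction,
    G and C are equally frequent, and A and T are equally frequent.
    """
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be between 0 and 1")
    g = c = gc / 2
    a = t = (1 - gc) / 2
    # Two bases match when both are A, both C, both G, or both T.
    return a * a + c * c + g * g + t * t


if __name__ == "__main__":
    for gc in (0.50, 0.70):
        p = chance_match_probability(gc)
        print(f"GC fraction {gc:.2f}: chance match per position = {p:.3f}")
```

A scoring scheme calibrated for the 25 percent background would treat the extra chance matches in GC-rich regions as evidence of similarity, which is the kind of false hit described above.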
The overall result is more than twice as many true hits as with any Blast program alone, according to DNA Mining's test results. Setter 1.1.0, the current version, was released in May. Its starting price, for a single workstation license, is $7,000. Bruno said he is in the process of finalizing his first order, with a large international pharmaceutical company planning to try out the software on a 16-processor server. In addition to pharmaceutical firms, Bruno sees genomics and biotechnology companies as potential customers. "Anywhere on the drug pipeline, people want to compare human to mouse DNA," he said.
Goals for 2002 include the release of Pro-Map, a program to detect protein sequence similarities. Pro-Map uses proprietary statistical techniques to pick up homologs more distant than can be detected by Psi-Blast, according to company literature. A prototype was successfully tested last year at the Critical Assessment of Techniques for Protein Structure Prediction competition (CASP4). SCC