The University of Washington has granted Geospiza a license to sell RepeatMasker, a popular sequencing analysis program that identifies and masks repeat sequences, or so-called junk DNA.
Geospiza, a Seattle-based bioinformatics software maker, paid the University of Washington a one-time license fee for the program, which was invented by a university scientist. It also agreed to pay the university long-term royalties, according to Geospiza’s president, Todd Smith, although the parties did not disclose the specific financial terms of the agreement.
“It’s an exclusive license,” said Smith. “As new versions come out, we’ll be the only place to get them.”
RepeatMasker, originally developed in 1996, is already being used by 650 academic institutions and about 100 companies, according to its inventor, Arian Smit, now of the Institute for Systems Biology.
“It is one of those standard programs,” said Richard Mural, a bioinformaticist at Celera Genomics. “We use it here.”
Geospiza will market RepeatMasker both as a stand-alone product and as part of Finch-Suite, a “one-stop shopping” DNA sequencing database management system. Finch-Suite incorporates Geospiza’s versions of the Phred and Phrap sequence assembly programs.
The product will be priced at $4,000: $2,500 for the core program and $1,500 for a complementary program, Crossmatch, which conducts the actual comparisons between unknown sequences and repeated sequences.
Smith said he expects additional revenues of $100,000 to $200,000 per year from sales of RepeatMasker.
RepeatMasker works by comparing a given sequence to a database of known “repeats,” meaningless but common repeated sequences that exist in the genome. If a stretch of the input DNA matches one of these repeats, that stretch is masked out. This eliminates the genetic “background noise” and narrows down the area within which researchers can look for meaningful genetic information.
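The masking idea can be illustrated with a toy sketch in Python. This is not RepeatMasker's actual method: the real program relies on alignment-based comparison (performed by Crossmatch) against a curated repeat library, whereas this sketch assumes exact string matches. The function name mask_repeats, the two "repeat" strings, and the query sequence are all made up for illustration.

```python
def mask_repeats(sequence, repeats, mask_char="N"):
    """Replace every exact occurrence of a known repeat with mask_char.

    Toy illustration only: exact matching stands in for the
    alignment-based comparison a real repeat masker would use.
    """
    seq = sequence.upper()
    masked = list(seq)
    for repeat in repeats:
        rep = repeat.upper()
        if not rep:
            continue  # skip empty library entries
        start = seq.find(rep)
        while start != -1:
            # Overwrite the matched stretch with the mask character.
            masked[start:start + len(rep)] = mask_char * len(rep)
            # Step forward by one so overlapping matches are also found.
            start = seq.find(rep, start + 1)
    return "".join(masked)


# Hypothetical repeat "library" and query sequence (invented strings).
toy_repeats = ["GGCCGGGCGC", "CACACACA"]
query = "ATGGCCGGGCGCTTTACACACACAGGT"

masked_query = mask_repeats(query, toy_repeats)
print(masked_query)
```

Replacing matched bases with N keeps the sequence length and coordinates unchanged, so downstream analysis can skip the masked stretches without losing track of position.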
The system conducts comparisons using RepBase, a database of eukaryotic repeat sequences compiled by the private nonprofit Genetics Information Research Institute in Sunnyvale, Calif. GIRI granted Geospiza the distribution rights to this database in September.
Since these databases — especially some of the mammalian ones — are still works in progress, RepeatMasker cannot identify all repeats, Smit acknowledged. But Smit and others are still working on improving the program.
“Under the most conservative estimate, the program cleans up over 90 percent of repeats,” said Smit. As human, mammalian, Drosophila and other repeat databases continue to be refined, the system will become even more effective, he said. Smit has recently added features to detect human and mouse DNA contamination in sequences. The latest version of RepeatMasker can also identify and mask insertion sequence elements and other cloning artifacts.
RepeatMasker has several competitors.
At least two other companies, Compugen and Paracel (the latter recently purchased by Celera Genomics), also have repeat-masking systems in their product portfolios. Paracel’s GeneMatcher bioinformatics unit masks low-complexity or repetitive sequences as one of the scores of algorithms it runs, but not all labs need or can afford this high-throughput sequencing analysis unit.
Some large companies have developed their own bioinformatic methods for eliminating meaningless repeats, and GIRI offers Censor Server for masking. But Censor is “too slow,” Smit said. “I don’t think it has been developed much in the past three or four years.”
Geospiza hopes such labs will instead opt for new updates of RepeatMasker, which the company plans to offer as Smit refines the program. It also plans to add features for comparative analysis of genomes.
Smith is confident that the demand for the program will remain consistent, even as the growth of sequencing efforts slows down in the coming years. “There are a million algorithms for looking at DNA,” said Smith. “But this is an essential one.”
—Marian Moser Jones