While the business arm of IBM may have waited until October 2000 to formally launch a life sciences unit, the research branch of the company has been quietly building a comprehensive bioinformatics toolset since 1996.
Isadore Rigoutsos, who manages the bioinformatics and pattern discovery group at IBM’s computational biology center, told BioInform that the team recently released a new version of its web engines and tools.
Eschewing press releases and product announcements in favor of a word-of-mouth approach, the five-member group working out of the Watson Research Center has nonetheless been able to attract a number of users for its Teiresias pattern discovery algorithm and Bio-Dictionary collection of amino acid patterns. Rigoutsos said usage has increased by a factor of a hundred since a new, more user-friendly graphical interface went online in March.
“People know about it mostly through our publications or because they search Google for pattern discovery,” Rigoutsos said. “We took that approach because there’s only five of us that have to do the research as well as the web design and maintenance, so we figured we’d let word of mouth spread and increase the number of users while we’re debugging it.”
Rigoutsos did not disclose the total number of users for IBM’s tools.
Teiresias is a two-phase combinatorial algorithm for general-purpose pattern discovery. Its speed and its ability to handle very large input datasets and arbitrarily large alphabets have made it well suited to a number of computational biology tasks, including DNA tandem repeat discovery, automated protein functional and structural annotation, and gene discovery.
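For readers unfamiliar with pattern discovery, the general idea can be illustrated with a toy example. The Python sketch below is not IBM's Teiresias implementation and does not reproduce its two phases; it simply enumerates short patterns (with "." standing for any character) and keeps those that occur in a minimum number of input sequences. The function name, parameters, and sample sequences are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def discover_patterns(sequences, literals=2, width=3, min_support=2):
    """Toy pattern discovery (illustrative only, not Teiresias).

    Enumerates every pattern that has exactly `literals` fixed characters
    (literals >= 2) spread over at most `width` positions, with '.' as a
    wildcard, and returns those found in at least `min_support` sequences.
    The result maps each pattern to the set of sequence indices containing it.
    """
    support = defaultdict(set)
    for idx, seq in enumerate(sequences):
        for start in range(len(seq)):
            for span in range(literals, width + 1):
                if start + span > len(seq):
                    break  # window would run past the end of the sequence
                window = seq[start:start + span]
                # The first and last positions are always literal;
                # choose which interior positions also stay literal.
                interior = range(1, span - 1)
                for keep in combinations(interior, literals - 2):
                    kept = {0, span - 1, *keep}
                    pattern = "".join(
                        ch if i in kept else "."
                        for i, ch in enumerate(window)
                    )
                    support[pattern].add(idx)
    # Keep only patterns shared by enough sequences.
    return {p: s for p, s in support.items() if len(s) >= min_support}


# Two made-up peptide fragments that share a K, any residue, then L.
shared = discover_patterns(["AKVLA", "GKTLG"])
```

Exhaustive enumeration like this grows quickly with the window width and the size of the input, which is one reason speed on very large datasets matters for a production algorithm.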
The Teiresias engine currently supports nine discovery, annotation, and analysis options. New options for DNA tandem repeat discovery, gene identification, and irredundant motif discovery are in the works.
The IBM team also released an updated version of its Bio-Dictionary tool — a collection of repeating protein sequence patterns that act as “words.” The first version of the Bio-Dictionary was released in 1999, using public databases such as SwissProt and GenPept as input. Rigoutsos said the newest release uses SwissProt/TrEMBL data from June 2000 and the team is completing a new computation using data from May 2001.
The updated Bio-Dictionary was used to build several complete annotated genomes, which were released with the new interface in March. Annotations for two eukaryotic genomes, three archaeal genomes, and seven bacterial genomes are currently available.
Rigoutsos said more tools would be phased in as they are developed. He is aiming for a September release of some significant new features, including an interactive website that will allow users to process genome annotations using natural language text commands.
Teiresias and its associated tools are freely available for non-profit users at http://cbcsrv.watson.ibm.com/Tspd.html. Licenses are available for commercial use.