Web services entered the scene a few years back amid the IT industry’s typical hype, but the approach may actually deliver on its promise of integrating disparate bioinformatics resources.
An informatics team at Bristol-Myers Squibb (BMS) is using web services to address some of the shortcomings of an increasingly popular method for integrating informatics applications. These are so-called workflow or pipelining applications from vendors such as SciTegic, InforSense, TurboWorx, and Incogen. These systems can effectively link multiple software applications into complex research workflows, but most of them fall short when it comes to integrating with a high-performance computing (HPC) architecture.
But the BMS team has found that web services may be just the thing for bridging workflow tools and HPC. The team is building a multi-tiered architecture that lets bio- and cheminformatics applications run together in complex pipelines on a compute cluster, while keeping the entire integration process invisible to end users. “What we’re working on is actually larger in scope than just the advanced workflow pipelining tools that are now available,” says Nathan Siemers, director of R&D at BMS.
BMS has enlisted several vendors to help develop the system. SciTegic’s Pipeline Pilot, the workflow tool of choice, sits at the top of the stack. Beneath it is a web services layer based on the BioTeam’s iNquiry software, which in turn runs on Platform’s LSF cluster-management software.
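The layering can be sketched as a toy model in Python. This is an illustration under stated assumptions, not BMS’s code. All class, function, and tool names are hypothetical. The "scheduler" only records command strings and does not call LSF. The service templates stand in for the kind of command-line wrappers a web services layer might expose. The sketch shows the separation of concerns: the top tier requests services by name and never sees how or where jobs run.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class ClusterScheduler:
    """Bottom tier: a stand-in for a batch scheduler such as LSF; it only records jobs."""
    submitted: list[str] = field(default_factory=list)

    def submit(self, command: str) -> int:
        """Record a command line and return a hypothetical sequential job ID."""
        self.submitted.append(command)
        return len(self.submitted)


@dataclass
class ServiceLayer:
    """Middle tier: maps service names to command-line templates and submits them."""
    scheduler: ClusterScheduler
    templates: dict[str, str]

    def call(self, service: str, **args: str) -> int:
        # Shell-quote each argument before filling in the template.
        quoted = {name: shlex.quote(value) for name, value in args.items()}
        return self.scheduler.submit(self.templates[service].format(**quoted))


def run_pipeline(layer: ServiceLayer, steps: list[tuple[str, dict[str, str]]]) -> list[int]:
    """Top tier: a workflow that calls services by name and never touches the scheduler."""
    return [layer.call(service, **args) for service, args in steps]


scheduler = ClusterScheduler()
layer = ServiceLayer(scheduler, {
    "search": "seq_search --query {query} --db {db}",  # hypothetical tool
    "align": "multi_align --in {infile}",               # hypothetical tool
})
job_ids = run_pipeline(layer, [
    ("search", {"query": "query.fa", "db": "nt"}),
    ("align", {"infile": "hits.fa"}),
])
print(job_ids)  # [1, 2]
print(scheduler.submitted)
```

Swapping the recording scheduler for a real cluster back end would touch only the bottom tier. That is the property that keeps the integration invisible to whoever builds the workflow.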
Web services (XML-based standards such as SOAP, WSDL, and UDDI) allow applications to communicate with one another regardless of their operating system or programming language. “It’s a way to do distributed computing and remote computing, and it’s platform and language neutral, which means you can write Java, you can write Perl, you can use a pipelining tool, what have you, to solve your problems,” Siemers says.
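The following minimal Python sketch illustrates the language-neutral point. Using only the standard library, it builds and then parses a SOAP 1.1 request envelope. The operation name `runBlast`, its parameters, and the namespace `urn:example:blast` are hypothetical. A real service would publish its operations in a WSDL document and carry the envelope over HTTP, and neither is shown here.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:blast"  # hypothetical service namespace, for illustration only

# Readable prefixes in the serialized XML (instead of ns0, ns1, ...).
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("svc", SERVICE_NS)


def build_request(operation: str, params: dict[str, str]) -> bytes:
    """Wrap an operation call and its string parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(payload: bytes) -> tuple[str, dict[str, str]]:
    """Recover the operation name and parameters from a SOAP 1.1 envelope."""
    envelope = ET.fromstring(payload)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("not a SOAP 1.1 envelope with a non-empty Body")
    call = body[0]
    operation = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text or "" for child in call}
    return operation, params


payload = build_request("runBlast", {"sequence": "ACGTACGT", "database": "nt"})
print(payload.decode("utf-8"))
print(parse_request(payload))  # ('runBlast', {'sequence': 'ACGTACGT', 'database': 'nt'})
```

The payload is plain XML, so a Perl or Java client, or a pipelining tool, could produce or consume the same message without sharing any code with this one.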
— Bernadette Toner
US Patent 6,898,530. Method and apparatus for extracting attributes from sequence strings and biopolymer material. Inventors: Jeffrey Saffer, Augustin Calapristi, Nancy Miller, Randall Scarberry, Heidi Sofia, Lisa Stillwell, Guang Chen, Philip Monroe. Assignee: Battelle Memorial Institute. Issued: May 24, 2005.
Protects systems for creating high-dimensional vectors representing sequence strings and biopolymer materials. A first system divides respective sequence strings into blocks of at least three units to create a vocabulary of blocks; a second system selects predefined domains of many biopolymer materials; a third system defines each item of biopolymer material in a data set of biopolymer materials as a surface using descriptors of at least one of structure and function; and a fourth system compares information regarding each biopolymer material to information regarding other biopolymer materials.
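The first system in the claim divides sequences into blocks to build a vocabulary. That step resembles the familiar k-mer ("word") counting approach. The following Python sketch illustrates that generic technique, plus a simple comparison between the resulting vectors. It is not the patented method. Overlapping blocks, k = 3, raw counts, and cosine similarity are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def kmers(seq: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping length-k blocks (overlap is an assumption)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocabulary(seqs: list[str], k: int = 3) -> list[str]:
    """Collect every distinct block across all sequences, sorted for a stable order."""
    vocab: set[str] = set()
    for s in seqs:
        vocab.update(kmers(s, k))
    return sorted(vocab)


def to_vector(seq: str, vocab: list[str], k: int = 3) -> list[int]:
    """Represent one sequence as the count of each vocabulary block it contains."""
    counts = Counter(kmers(seq, k))
    return [counts[word] for word in vocab]


def cosine(u: list[int], v: list[int]) -> float:
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


seqs = ["ACGTACG", "TTTACG"]
vocab = build_vocabulary(seqs)  # ['ACG', 'CGT', 'GTA', 'TAC', 'TTA', 'TTT']
vectors = [to_vector(s, vocab) for s in seqs]
print(vectors)                  # [[2, 1, 1, 1, 0, 0], [1, 0, 0, 1, 1, 1]]
print(round(cosine(*vectors), 3))  # 0.567
```

With real protein or nucleotide data, the vocabulary grows quickly: up to 20^k possible blocks for proteins. This growth is why such representations end up high-dimensional.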
Agilent Technologies and Rosetta Biosoftware plan to integrate Agilent’s GeneSpring desktop gene-expression analysis package with the Rosetta Resolver enterprise-scale expression analysis system.
NCI has signed institute-wide licenses with GeneGo and Genomatix for their pathway analysis platforms.
IBM and Switzerland’s Ecole Polytechnique Fédérale de Lausanne are collaborating on a two-year research project nicknamed “Blue Brain” that will put IBM’s Blue Gene supercomputer to work on building a detailed model of the circuitry in the human neocortex at the cellular level.
NIH has issued a request for applications for the development of a model organism database for E. coli strain K-12, related strains, and their phages and mobile genetic elements.
Geneva Bioinformatics signed an agreement with Hitachi Software Engineering to distribute its Phenyx mass spec analysis platform in Japan.
Agilent Technologies plans to acquire chromatographic informatics company Scientific Software (SSI) for an undisclosed sum, combining Agilent’s analytical instrumentation, data systems, and services with SSI’s chromatographic data systems and informatics.
Biomax and Softberry will work together to integrate a wide array of Softberry’s genome analysis programs into Biomax’s Pedant-Pro sequence analysis suite and BioRS integration platform.
The Sanger Institute has released the fifth assembly of the zebrafish genome, Zv5. The assembly includes 1,630,306,866 base pairs in 16,214 fragments. A preliminary database of Zv5 is available at http://pre.ensembl.org/Danio_rerio.