Biosof and IBM Collaborate on Proteomics Hardware/Software Platform
Biosof, a Columbia University spin-out devoted to drug discovery, agricultural science, and homeland security, is collaborating with IBM to provide a server platform for PredictProtein, the brainchild of Columbia University researcher Burkhard Rost.
IBM and Columbia are also exploring the option of porting PredictProtein to IBM’s Cell Broadband Engine.
The main goal of the project is to allow users to run PredictProtein jobs on IBM high-performance computing hardware via a virtual private network.
IBM’s Janis Landry-Lane, program manager in the company’s deep computing group, told BioInform that the university had set up a web portal through which users could access PredictProtein, which was running on a shared Linux cluster. PredictProtein has approximately 3,000 users a month.
The scientists essentially “wanted to be able to work on the algorithms and not be bothered with IT infrastructure,” she said, noting that IBM is working with Biosof on the hardware environment as well as specific demands of some clients.
Landry-Lane said that IBM offered the university the opportunity to benchmark its application on different platforms and to develop it for maximum performance within its financial constraints. IBM will do testing on various Linux options, she said.
In order to address the concerns of commercial users such as pharmaceutical firms who might be wary of using the service over the Internet, she said the package will run on IBM’s Deep Computing Capacity on Demand, the firm’s virtual private network that can host services to run applications.
This plan also eliminates the need for Columbia to run a virtual private network while maintaining the hardware and software system, including the need to pay for power and cooling, she said. IBM will be providing power, space, cooling, and IT services to support Biosof.
She added that the partners are also exploring whether IBM’s Cell Broadband Engine can be used to accelerate components of the PredictProtein algorithm.
The Cell Broadband Engine, jointly developed by Sony, Toshiba, and IBM, is a multiprocessor with high-performance features. It combines a two-way simultaneous multithreading PowerPC processor core with eight DSP-like SIMD processing units that can handle compute-intensive operations and a high-bandwidth memory subsystem.
It was initially developed for Sony’s PlayStation 3 game console, but it is being explored for other uses, such as bioinformatics. These processors are at the core of Roadrunner, the IBM machine that recently broke the petaflop barrier and tops the list of the world’s fastest supercomputers.
IBM scientist Kathy Tzeng and her colleagues have been exploring the use of this processor for computationally intensive applications such as hidden Markov model-based protein profile searches. She and her team have modified HMMer, an implementation of HMM-based protein profile search, for the Cell Broadband Engine architecture.
For the Columbia collaboration, Tzeng is looking at the most computationally intensive and time-consuming functions of PredictProtein and exploring putting them on the Cell, said Landry-Lane.
“In an IBM Blade Center you can insert Cell blades and Intel or AMD blades as a single system,” so some of the jobs can be routed to Intel or AMD blades while others could be routed to Cell. “Some of the exploratory work we are looking at for Biosof is, ‘Can the Cell processor be used for parts of their algorithm to be accelerated?’” she said.
Preliminary studies have shown “considerable speed-ups” over other chips, Landry-Lane explained.
Rost told BioInform that he released the first PredictProtein server 16 years ago at the European Molecular Biology Laboratory, “and wondered then about what we might need in 10 years.” In that time frame, he said, hundreds of thousands of users have used the tool.
PredictProtein offers algorithms to help to predict protein structure and function and to study such aspects as binding and active sites, subcellular localization, sequence motifs, fold recognition, secondary structure, and solvent accessibility.
Oxford University Press to Launch New Journal on Biological Databases
Oxford University Press said it plans to launch a new online-only publication that will provide a platform for “novel ideas in database research surrounding biological information” and also “aims to strengthen the bridge between database developers and users.”
The journal, entitled Database: The Journal of Biological Databases and Curation, is scheduled to launch in January 2009.
Computational biologist David Landsman is the journal’s editor-in-chief. He is at NIH’s computational biology branch and studies gene expression, protein modeling, and database design. He stressed that this journal is an outside activity not affiliated with NIH.
“Database is a very exciting new journal project as it will fill a void in the peer-reviewed literature for articles solely about biological databases and the tools associated with them,” he told BioInform in an e-mail.
He said he expects the journal to cover a range of projects, including subsets of chemical databases that contain data about drug tests, or even databases of nanoparticle compounds being tested for drug delivery. “Furthermore, we will also be accepting articles about the content of databases and how to streamline the pipelines for maintaining accurate curation and links for the discovery process,” he said.
Associate editor Francis Ouellette of the Ontario Institute for Cancer Research told BioInform that he is “happy about this new journal, as there [are] just not enough outlets for this subject.”
He noted that the journal will only cover open-access databases.
Oxford University Press bioinformatics journals editor Claire Bird said the idea for the journal originated with Richard Roberts, the editor of Nucleic Acids Research, and Alex Bateman, the editor of Bioinformatics, and that it is partially an outgrowth of the popularity of NAR’s annual database issue.
Papers in Database “will be more comprehensive descriptions of the databases than we publish in the NAR database issue,” she said. That will include methodologies and technical notes on issues surrounding database development.
Typical readers will include developers as well as users in the biological sciences community. “So we will be making sure they will be accessible to biologists,” she said.
The journal will also publish reviews to help biologists decide which resources to use for a given question, as well as tutorials. “We are thinking openly about the structure of articles and the functionality of a journal to serve different communities,” she said, noting that the online-only format removes many of the constraints of a printed journal.
CLC bio Releases Benchmarks for Genomics Workbench
At ISMB, CLC bio released version 1.1 of its software suite Genomics Workbench, as well as a whitepaper outlining benchmarking results for the software suite.
The company said that in a comparison against the Maq and Soap programs, its alignment program used less time and memory on two data sets — 8.5 million reads and 86 million reads — against a whole human genome sequenced on Illumina’s Solexa platform.
According to the company, its assembly algorithm accomplished the calculation in little more than half an hour, which was at least five times faster than the closest competitor.
For the alignment of 86 million reads against the whole human genome, CLC bio’s algorithm was more than 14 times faster than other methods, the company said.
According to the whitepaper, CLC bio’s algorithm delivered results with 85 percent accuracy, compared to around 83 percent for the other algorithms, and never required more than 8 GB of RAM.
CLC bio is in the process of releasing command line assembly programs for the Genomics Workbench, a software suite for analyzing and visualizing second-generation sequencing data. By August the company plans to offer cluster-computing support for command line assembly.
Jan Lomholdt, vice president of global sales at CLC bio, told BioInform that the bioinformatics field has “changed a lot” with the advent of next-generation sequencing. “These are medium-sized labs that before never had the possibility of touching this,” he said.
“It reminds me of the early days of the PCR machine,” added CLC bio’s UK sales manager Darrol Baker.
Academics are still the early adopters in next-gen sequencing, said Lomholdt. “We see the agricultural part of the business is moving fast, while the pharmaceutical companies are not fast-movers, at least not yet.” Biofuel is driving the agricultural segment, he said.
Ariadne Introduces Pathway Studio Upgrade
Ariadne Genomics was demonstrating version 6 of its Pathway Studio platform, which includes an improved user interface and other new features.
David Denny, Ariadne’s product manager, told BioInform that the company improved the interface in order to lower the level of expertise required to operate the software. He added that the desktop version of the software now has many analysis facets that were formerly available only in the enterprise version of the software.
Denny said that academic researchers are an important segment of the firm’s business and that the company “doubled” sales of its enterprise system to commercial customers this year.
The company has also added a gene set enrichment analysis module based on the Broad Institute’s GSEA algorithm that runs against the firm’s database or against data that have been imported into the database.
Denny said the company has created its own gene ontology to support the gene set enrichment analysis to quickly characterize experimental data.
In addition to the software, the company has added content, he said, and “done a lot of curation on our dictionaries and improved the quality, investigated the statistical sources of error and how to correct those.”
The company also announced that the Rat Genome Database at the Medical College of Wisconsin has published a rat signaling pathway diagram collection created with Pathway Studio. The pathways are available on the Rat Genome Database website.
According to a company statement, the rat signaling pathways were curated from the literature by RGD staff. Interactive diagrams linked to gene reports at RGD were created using Ariadne’s ResNet Mammalian database, which has over a million unique interactions between proteins, small molecules, and cell processes, and the Pathway Studio visualization tools.
Denny confirmed the interest of agricultural companies in the bioinformatics market. “Most of the big plant research companies are already our customers,” he said. “They use it as a starting point … and import data into Pathway Studio. It gives them an enduring asset.”
Synamatix Sees Growing Interest in Next-Gen Sequencing
Synamatix general manager Arif Anwar told BioInform that the company has seen the second-generation sequencing analysis space “go crazy in the last 12 months.”
While traditional buyers for the company’s software have been the large sequencing centers, “in the last six to nine months it’s more general institutions, universities, groups buying these sequencers or having access to data from these sequencers, smaller labs who don’t have core informatics expertise,” he said.
The Kuala Lumpur, Malaysia-based firm recently released its SynaWorks software to manage second-generation sequencing data. It helps users to analyze, mine, and visualize their data from all second-generation sequencers.
“That is important because people are using data from different sequencers and often mixing them as well,” said Anwar.
Unlike other software companies targeting the sequencing market, which focus on workflows or organizing the data, “we are at the high-value end,” he said, “analyzing the data to get closer to biological answers.”
ISCB Announces New Software-Access Policy
During the meeting, the International Society for Computational Biology released a new software-sharing policy in an effort to balance the interests of its diverse membership with the need for scientists to gain access to new tools.
The policy is a revision of a statement that ISCB released in 2002 that drew criticism because it was issued without soliciting feedback from the broader bioinformatics community.
Reaching the new statement was a “complicated and lengthy process,” Reinhard Schneider, vice president of the ISCB, told BioInform. “We are very glad to have found a path to this policy.”
The updated statement, which grew out of panel discussions and an open comment period on ISCB’s blog, notes that the availability of bioinformatics software is “extremely important” to the field, and if “a researcher's software is necessary to understand, reproduce and build on scientific results, then the software should be made available.”
The ISCB policy calls for both grantors and publishers to require statements of software availability from researchers and recommends that those statements “be specific about cost, source code availability, redistribution rights (including for derived works), user support, and any discrimination among user types.”
ISCB said in the statement that “it is preferable to make source code available” in most cases, and that developers should release executable versions of their software for academic users.
However, the policy does not offer any specific software-licensing recommendations. While it notes that “open source licenses are one effective way to share software,” it adds that “no single licensing or distribution model” is appropriate for all research projects.
“We cannot mandate making source code available because that would dissolve the business models of many of our members, so finding this policy was very tricky,” Schneider said. He noted that while many academic software developers can easily post algorithms and other tools and data on the web, software vendors or drug developers cannot, in most cases, adhere to that practice.
Schneider is currently a team leader at the European Molecular Biology Laboratory’s Structural and Computational Biology Unit in Heidelberg, Germany, and was previously a co-founder of Lion Bioscience, which gives him experience with this issue from both the academic and commercial perspectives.
Burkhard Rost, ISCB president, said the policy was the result of a “struggling process” that has lasted six years.
The policy draws on the findings and recommendations of a 2003 National Academies of Sciences report called, “Sharing Publication-related Data and Materials — Responsibilities of Authorship in the Life Sciences,” which stated, among other things, that “all information that is either central or integral to the paper should be made available in a manner that enables its use for replication, verification, and furtherance of the published claims.” This guiding principle is often referred to as UPSIDE, or the uniform principle for sharing integral data and materials expeditiously.
Sean Eddy, HHMI investigator at Janelia Farm and a co-author of the NAS study, told BioInform in an e-mail that he has “pushed to get ISCB to reorganize their sharing policy to reflect the consensus of the 2003 NAS report, and to focus specifically upon software release upon scientific publication.”
He added that the society has “partially done that” in the new statement.
“I still think it could be clarified further, but it's a good step in the right direction,” he said.
Jason Stajich, who leads the Open Bioinformatics Foundation, told BioInform in an e-mail that he supports the position “that software developed for scientific research that is publicly funded should be made freely available to support the advancement of science.”
The availability of software code allows improvement and reuse on new data or adaptation to new problems and also allows the actual implementation of published algorithms to be available for inspection or improvement, he said.
ISCB’s Membership, Finances Stabilized
During a business meeting for the International Society for Computational Biology held during the conference, ISCB president Burkhard Rost reported that the financial situation of the society has stabilized.
David Rocke, the society’s treasurer, explained that when ISMB is held in Europe the attendance is generally a little higher than when the conference is in the US, “within a band of 1,200 to 2,000.”
Membership has stabilized independent of ISMB attendance but the society is seeking to add value to expand the membership, officials said.
This year’s meeting is expected to help the society squeak to a break-even point by the end of the year, as it tries to expand its conferences and offer more services and support for the bioinformatics community.
Rost said that in addition to PLoS Computational Biology, ISCB is also adding Oxford University Press’s Bioinformatics as another official journal of the society.