This article has been updated from a version posted July 6 to include additional information from IBM and MU.
The University of Missouri and IBM announced last week that they will develop what they call a “first-of-a-kind” cloud computing environment for use in genomics research and personalized medicine.
The three-year research project was made possible by an IBM Shared University Research Award. The award program is designed to, among other things, increase access to IBM technologies for research and for use in university curricula.
The partners said that the cloud will allow sharing of bioinformatics resources among universities and institutions, which could potentially lead to a "life sciences corridor" across Missouri, Kansas, and throughout the Midwest.
Gordon Springer, an associate professor of computer science at MU, said the process began last fall. “A group of people from the university went to IBM and discussed the kinds of things that the university was doing and the kinds of things that IBM was interested in,” he told BioInform.
Springer said that after the discussion he worked with IBM staff to write a proposal that was submitted in February. IBM approved the proposal in April.
“Basically, [the grant] is for equipment that we would use for bioinformatics activities that are along the lines of what we have been doing at the university and also along the lines of what IBM is interested in doing,” Springer said.
Under the terms of the agreement, IBM is donating the IT infrastructure to the university without cost and for use in the project outlined in the proposal.
Chalapathy Neti, global leader of healthcare transformation at IBM Research, told BioInform that the company chose MU for the project because it “had competence in thinking about genomics analysis, particularly in plant and animal genomes, and we have interest in exploring compute infrastructures.”
He also described the company’s interest in cloud computing.
“IBM Research and IBM from a technology perspective have been driving towards increasing the level of self service, the ease with which people are able to provision the required hardware and software,” he said. “We are looking at extreme virtualization technologies that will allow for not only sharing of the same physical infrastructure in a multi-tenant environment but also allow for greater utilization of the underlying resources.”
Springer said that the partners will seek out bioinformatics applications “that might be out there in the cloud that could be provided as a service.” He noted that the infrastructure will be used mainly to analyze genomic sequence data, but that the partners plan to expand into medical diagnostics and personalized medicine in the future.
In keeping with the terms of the agreement, all the resources and infrastructure will be available to researchers for free for the duration of the project.
Springer said that MU has no direct plans to commercialize the offering, but noted that it’s something that will be up for discussion. “Obviously the university is interested in self-sustaining activities and if we want this to persist, we have to come up with a revenue stream that will allow us to maintain and grow the resources,” he said.
The project follows on the heels of a number of commercial ventures into cloud-based bioinformatics offerings. For example, the UK's Eagle Genomics is providing a bioinformatics service based on cloud computing for computationally demanding projects, while Stanford University spinout DNAnexus offers customers direct access to cloud-based bioinformatics resources for next-gen sequence analysis (BI 4/23/2010). And earlier this year, IT firm Cycle Computing launched a cloud-based bioinformatics service called CycleCloud for Life Sciences (BI 3/19/2010).
A Three-Phase Project
The MU/IBM project will be divided into three phases. In the first phase, IBM will provide MU with an iDataPlex computing system, along with related software. The infrastructure will be integrated with the university's existing computing infrastructure to speed up the process of DNA sequencing and analysis and to collect and store massive amounts of data.
“The first stage is basically developing the sequence analysis capabilities that we are currently using on our existing clusters and implementing them onto the iDataPlex,” Springer said.
Currently, the university has two Illumina Genome Analyzers that send experimental data directly to its network. “We have somewhere in the neighborhood of five terabytes of data sent to us each week from these systems,” Springer said. “We do the initial and secondary data analysis and then subsequently the researchers have access to the data to do as they want.”
For this project, Springer said that the university will receive the fastest of the M2 versions of the iDataPlex, which is what IBM uses primarily for its own cloud computing offering.
In addition to the iDataPlex system, Springer said that IBM will provide 24 terabytes of data storage that will be linked to the university’s existing infrastructure.
In the second phase of the project, the partners plan to create a prototype cloud computing environment for genomics research.
Springer said that the partners plan to bring together bioinformatics resources from other universities and institutions along the I-70 corridor.
“The intent is to be able to pick up resources from St. Louis, Columbia, Kansas, and Kansas State University so that the resources that are available at each of those locations can be married together and we will have a regional computational environment for doing bioinformatics,” he said.
In the final phase of the project, the cloud is expected to be fully operational and managed by MU staff.