Cycle Computing, an IT vendor specializing in cluster, grid, and cloud computing, this week launched CycleCloud for Life Sciences, its first offering targeted at a vertical market.
The platform offers researchers web-based access to cloud-based CPU clusters along with a suite of pre-configured life-science algorithms. Users can sign up for an account through the company's www.CycleCloud.com site, and then "spin up" a cluster in the Amazon Elastic Compute Cloud environment, Jason Stowe, CEO and founder of Cycle Computing, told BioInform.
Once the EC2 cluster is up and running — a process that takes about 10 minutes — users can run a selection of pre-installed bioinformatics, molecular modeling, and proteomics applications.
"A substantial amount of work was involved" in deploying those algorithms on the platform, "and that's partially the point," Stowe said. "Our philosophy is that it should be just like using Gmail — users don't have to know how to make that scale over a large number of servers. So we try to do that effort-intensive work up front on the applications so that the end users don't have to."
The company's business model is based on support fees, so users who don't require support can access any of the applications via CycleCloud for only the cost of Amazon's charges for storage, network, and CPU utilization.
"If you don't care about certain forms of support or certain forms of encryption — and a lot of academic environments and research labs fall into this category — then it's essentially a pass-through," Stowe said. "We make no money there. It's a free service on that basis."
Nevertheless, the company has targeted a large opportunity in the life science market for researchers who would like to run cloud-enabled applications with a bit of support, as well as for groups that might tap Cycle to help them implement new algorithms for the platform.
"We make money off of administration — outsourcing the IT admin that you'd normally need if you bought your own cluster," Stowe said.
The company was founded in 2005 with an initial focus on deploying the Condor parallelization framework on internal clusters and grids. Cycle has a presence in several vertical markets and began working in the life-science sector around two years ago.
Stowe said that Pfizer uses CycleCloud for molecular modeling and that Cycle has worked with Eli Lilly to migrate around 10 life-science applications, including Blast, to EC2. He said that Cycle has also partnered with Schrodinger to cloud-enable some of its molecular modeling applications.
"The life-science market seems to be adopting the cloud at a much faster rate than other industries, and I think part of that is due to the fact that large pharma in particular is comfortable with outsourcing everything they do that is not part of their core business," Stowe said.
He added that the company chose the life-science sector as the first vertical market for the CycleCloud platform "because that's where we think we'll have the most benefit up front, and that's where we were seeing the most demand."
While the firm's life-science work to date has focused on big pharma, it expects CycleCloud for Life Sciences to expand its presence among academic and non-profit users.
"The goal is to enable academics and government researchers to not have to worry about the overhead of administering a cluster, and in many cases those folks don't need support," he said. "That's why we provide the at-cost model, so that if someone is evaluating this, we can give them an option that doesn't require them to do programming and at the same time doesn't charge them anything over what they'd be charged otherwise. We're trying to offer a good value proposition there to grow the community of users."
Ultimately, Stowe said that the company hopes that expanding its at-cost user base among academics will translate into service projects to deploy more life-science tools onto the cloud, which, in turn, should attract even more users.
"We're hoping that by working with universities in that way we can increase the number of open-source apps and workflows and pipelines. It's a great way for us to get a window into what applications we should be supporting, and ensuring that the engineering is done well to make those work inside of an Amazon environment," he said.
Stowe said that the company is currently working with some academic groups on algorithm migration, but declined to identify them because those efforts are still at an early stage.
Why Use a Middleman?
CycleCloud currently includes Blast, GMAP, HMMer, MAQ, Bowtie, RMAP, MrBayes, OMSSA, X! Tandem, Gromacs, and Schrodinger's molecular-modeling stack.
Several research groups are already working on porting some of these algorithms to the cloud. For example, the Medical College of Wisconsin and Insilicos are both working on projects to deploy several proteomics applications, including OMSSA and X! Tandem, to EC2 [BioInform 4/24/2009].
In addition, many bioinformatics groups are working on internal projects to migrate existing tools as well as new algorithms to the cloud environment.
Stowe acknowledged that informatics teams working on these types of projects might not require a third-party portal to serve as an entry to EC2.
"There will be folks who want to do everything on their own, and duplicate the effort, and that’s completely reasonable," he said. However, "we feel that from a convenience perspective we can offer a lot of value … by centralizing. We can have one set of people worry about the administration of these clusters, so in aggregate we can save people a lot of effort because they don't have to worry about that."
Stowe noted that many groups that build their own Amazon Machine Images, or AMIs, to run their tools fail to update those images after the initial deployment.
"There's a common phenomenon that happens, and that's image atrophy, where the AMI … stagnates," he said. "From a logistical standpoint, there are things that need to be done to those images on a periodic basis if you're concerned about operating system security updates and things along that line — essentially patching the environment so it remains up to date with the OS."
In addition, he said that Cycle has automated a number of steps in its bioinformatics workflows to help end users with quality control and performance issues. For example, the company wrote a set of scripts that automatically detect when a Blast run involving a large number of queries crashes. Once the code detects a crash, it performs a binary search over the queries to find out which one or ones caused it, and then notifies the user of the error in a report.
"Normally that's something users would have to do on their own," Stowe said. "They'd have to [have] Blast crash and then figure out how to detect it, and then implement the binary search stuff themselves, and that can be a little bit of an effort in terms of process."
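The article does not publish Cycle's scripts, but the bisection idea itself is easy to sketch. The Python below is a minimal illustration, not Cycle's code. `run_batch` is a hypothetical wrapper (in practice it might write a batch to a FASTA file, invoke the Blast executable, and check the exit status) that returns True when the batch completes and False when it crashes. The sketch also assumes that each crash is triggered by an individual query rather than by a combination of queries.

```python
from typing import Callable, List, Sequence


def find_crashing_queries(
    queries: Sequence[str],
    run_batch: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Return every query that makes run_batch fail, using recursive bisection.

    run_batch is a hypothetical wrapper: True if the batch ran cleanly,
    False if the search program crashed. Assumes each crash is caused by a
    single query, not by an interaction between queries.
    """
    if not queries or run_batch(queries):
        return []  # empty batch, or the batch ran cleanly: nothing to report
    if len(queries) == 1:
        return [queries[0]]  # the crash has been narrowed to one query
    mid = len(queries) // 2
    # The batch failed: split it in half and search each half independently.
    return (find_crashing_queries(queries[:mid], run_batch)
            + find_crashing_queries(queries[mid:], run_batch))


# Usage with a stand-in runner that "crashes" whenever q7 or q42 is present.
BAD = {"q7", "q42"}


def fake_run(batch: Sequence[str]) -> bool:
    return not any(q in BAD for q in batch)


queries = [f"q{i}" for i in range(100)]
print(find_crashing_queries(queries, fake_run))  # ['q7', 'q42']
```

With k offending queries among n, this needs on the order of k·log2(n) reruns instead of n single-query runs, which is why bisection is attractive when a batch holds thousands of queries.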
Not Just Sequencing
The company launched CycleCloud for Life Sciences this week at Cambridge Healthtech Institute's Xgen Congress, a conference on next-generation sequencing.
Stowe said that Cycle views the sequencing market as a prime opportunity for the company and saw "a lot of interest" in its offering at the conference. "There's definitely a lot of growth in this area, no doubt, especially from a cloud-processing perspective."
In addition to running analysis applications through the cloud-based platform, Stowe said that he sees an opportunity to store next-gen sequencing data in the cloud using Amazon's Simple Storage Service, or S3.
"Pushing the results of genome sequencing into S3 makes a lot of sense from a backup point of view. It's very straightforward to access, [and] it gives you a guarantee that this data, which is very expensive to produce, is [secure in another] data center."
In addition, he noted that cloud-based storage can come in handy as new sequence-analysis algorithms come online. "If your data is already up in the cloud, from a backup perspective in particular, doing a re-analysis with a new algorithm is trivial," he said.
Furthermore, a number of sequencing vendors have embraced cloud computing as a promising solution to their customers' data-management challenges. For example, Illumina, Life Technologies, and Pacific Biosciences have all launched informatics partnership programs that entail some level of cloud-based infrastructure [BioInform 2/12/2010, 1/27/2010, 2/19/2010].
Stowe said that Cycle is in "various stages" of working with "several" instrument providers, but declined to name any vendors.
He said that the company has been working with these firms for the last nine months or so on pipelines that will allow customers to "run their own secondary analysis as a service — to do their own assembly and alignment on an infrastructure like this."
But sequencing isn't the only growth area in the life-science market that Cycle has identified for its platform.
"Molecular modeling is another area where you're going to see a lot of use cases around the cloud because it has the right mixture of compute to data," Stowe said. "It's very compute-heavy relative to the amount of data that it takes to run the calculations."
In particular, Stowe said that molecular modeling applications are well suited to the cloud infrastructure because they are linearly scalable. "So if you have millions of compounds that you're throwing at a target to try and find a potential drug candidate, having 10 times as many resources means your job runs 10 times as fast."
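Stowe's "10 times as fast" figure is the ideal case for an embarrassingly parallel screen, in which each compound is scored independently of the others. The sketch below assumes exactly that and ignores cluster start-up, data staging, and result collection. The compound count and the 30-seconds-per-compound cost are made-up illustrative numbers, not figures from Cycle or its customers, and `partition` and `ideal_wall_time` are hypothetical helpers.

```python
from typing import List, Sequence


def partition(items: Sequence, n_workers: int) -> List[Sequence]:
    """Split items into n_workers contiguous chunks whose sizes differ by at most one."""
    if n_workers < 1:
        raise ValueError("need at least one worker")
    q, r = divmod(len(items), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        size = q + (1 if i < r else 0)  # the first r chunks get one extra item
        chunks.append(items[start:start + size])
        start += size
    return chunks


def ideal_wall_time(n_compounds: int, secs_per_compound: float, n_workers: int) -> float:
    """Wall-clock time when every worker runs in parallel with zero overhead.

    The job finishes when the worker holding the largest chunk finishes.
    """
    largest_chunk = max(len(c) for c in partition(range(n_compounds), n_workers))
    return largest_chunk * secs_per_compound


# Hypothetical screen: one million compounds at 30 seconds each.
print(ideal_wall_time(1_000_000, 30.0, 100))   # 300000.0 seconds (about 83 hours)
print(ideal_wall_time(1_000_000, 30.0, 1000))  # 30000.0 seconds (about 8.3 hours)
```

The speedup stops being linear once workers outnumber compounds, or once fixed costs such as the roughly 10-minute cluster start-up mentioned earlier become a significant share of the run.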
He added that proteomics is "another interesting area" because a lot of proteomics groups have old data that needs to be reanalyzed as new algorithms are developed. "We have some folks in that area where we're trying to help them run a couple years of back data and we're doing a combination of Condor as an internal resource harvester [and] cloud computing," he said.
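One common way to combine an internal harvested pool with the cloud is to "burst": fill whatever idle internal slots are available first and send only the overflow to cloud nodes. The Python below is a generic sketch of that placement policy under that assumption. It is not Condor's scheduler or Cycle's implementation, and `Placement`, `place_jobs`, and their parameters are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Placement:
    internal: List[str]  # jobs sent to idle slots on the in-house (Condor-harvested) pool
    cloud: List[str]     # overflow jobs sent to rented cloud nodes
    queued: List[str]    # jobs that must wait because both pools are full


def place_jobs(jobs: Sequence[str], idle_internal_slots: int, max_cloud_slots: int) -> Placement:
    """Prefer free internal capacity, burst the overflow to the cloud, queue the rest."""
    n_internal = max(0, idle_internal_slots)
    n_cloud = max(0, max_cloud_slots)
    return Placement(
        internal=list(jobs[:n_internal]),
        cloud=list(jobs[n_internal:n_internal + n_cloud]),
        queued=list(jobs[n_internal + n_cloud:]),
    )


# Usage: ten hypothetical re-analysis jobs, four idle internal slots, a cap of three cloud nodes.
jobs = [f"reanalysis-{i}" for i in range(10)]
p = place_jobs(jobs, idle_internal_slots=4, max_cloud_slots=3)
print(len(p.internal), len(p.cloud), len(p.queued))  # 4 3 3
```

The rationale for this ordering, assumed here rather than stated by Cycle, is that idle internal machines are already paid for, while cloud nodes are billed by use, so the cloud absorbs only the work the internal pool cannot.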