Cloud computing has become so firmly established in genomics over the last few years that it is safe to say it is here to stay. Increasingly, conferences dedicate entire sessions to cloud computing best practices and to presentations by academic researchers who have built bioinformatics analysis tools specifically for the cloud.
In an effort to stay competitive, sequencing platform vendors have begun offering cloud computing services that are integrated with their platforms. Their marketing pitches promise easy access to the cloud with no IT headaches.
But beyond keeping up with the times, adopting cloud computing could help sequencing technology vendors better secure their market share. Last October, Illumina launched BaseSpace, a cloud service integrated with its MiSeq sequencer through an Ethernet connection. Illumina describes BaseSpace as a simple way to access the cloud: with just a few mouse clicks, users can upload their sequences to Illumina's cloud, which is hosted by Amazon Web Services, and get free access to data management and analysis software.
According to Illumina Senior Vice President Alex Dickinson, there has been growing interest from current and prospective customers since BaseSpace launched. In May, Illumina announced BaseSpace Apps, a bioinformatics applications resource for its cloud service modeled after the ecosystem Apple has built around its devices and App Store. With such an ecosystem, made scalable by the cloud, Illumina hopes to distribute its applications to all of its customers. Developers in the Illumina ecosystem can also create new apps and host them in BaseSpace Apps, where they can be shared with other customers.
"While I can't speak to specific numbers yet, the user uptake has been excellent and a very significant percentage of the MiSeq users are starting to upload to BaseSpace," Dickinson says. "The big steps for us around BaseSpace this year — we are working on integrating applications to the app store. There is a value proposition around doing both storage and sharing and quick access to main informatics tools, which is the next big step."
The motivation for coupling MiSeq to a cloud computing service comes from the platform's target market, which consists primarily of users new to sequencing and small core labs. Unlike large sequencing centers such as the Broad Institute, which runs multiple high-throughput platforms like Illumina's HiSeq that produce hundreds of gigabytes a day and has the IT infrastructure to match, this new group is looking for a plug-and-play solution.
Life Technologies has also launched a cloud computing service, one that hosts the company's LifeScope Genomics analysis software for its 5500 SOLiD sequencing platform. In January, Ion Torrent, a Life Technologies subsidiary, introduced Ion Reporter, a cloud computing service for its customers that will include variant-calling algorithms. Ion Reporter will also be able to handle data generated by the Ion Personal Genome Machine and Proton sequencers.
And in September, Pacific Biosciences teamed up with Cycle Computing, a cloud computing consulting firm, to offer its Single Molecule Real-Time (SMRT) Analysis software as a cloud-based service for its single-molecule sequencing system. As with Illumina, PacBio's target market for this service is the independent investigator or small core lab with little in the way of storage or IT resources.
Instead of establishing its own cloud pipeline on Amazon, PacBio handed the reins over to Cycle Computing. "They've been in the industry for a while and are very good at just hooking things up to Amazon and using all the APIs, and we just didn't want to worry about that," says Edwin Hauw, PacBio's director of software product management. "They've had numerous security audits because they've been working with the pharma industry and other types of institutions and we didn't want to deal with any of that headache."
Bioinformatics service provider Eagle Genomics has also teamed up with Cycle Computing to develop a secure next-generation sequencing analysis pipeline. In February, they announced a collaboration that is being funded with a $50,000 grant from the Pistoia Alliance, a non-profit life sciences group that supports precompetitive collaborations to improve the interoperability of R&D standards.
Cloud computing specialists such as Cycle Computing may become a crucial part of how biotechnology companies integrate the cloud into their offerings. "Because of the way the cloud is built, if sequencing platform vendors can manage to outsource that aspect of the system to have it managed externally, it becomes a very attractive proposition," says William Spooner, Eagle's chief technology officer. "What the cloud gives you is conversion of capital expenditure to operational expenditure, so you're moving things around in your budget sheet. It's really a pay-for-use argument."
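Spooner's pay-for-use argument can be made concrete with a break-even calculation. The sketch below is a deliberately simplified model, assuming straight-line amortization of an on-premise cluster plus a fixed yearly running cost, compared against a flat hourly cloud rate. The function names and every dollar figure in the example are hypothetical, not vendor pricing.

```python
def breakeven_hours_per_year(capex: float, lifetime_years: float,
                             onprem_opex_per_year: float,
                             cloud_rate_per_hour: float) -> float:
    """Yearly cloud usage (hours) at which renting costs as much as owning.

    Simplified model (an assumption, not vendor pricing): the purchase price
    is amortized evenly over the hardware's lifetime, on-premise running
    costs are a fixed yearly amount, and the cloud bills a flat hourly rate.
    """
    annual_cost_of_owning = capex / lifetime_years + onprem_opex_per_year
    return annual_cost_of_owning / cloud_rate_per_hour


def cheaper_option(hours_per_year: float, capex: float, lifetime_years: float,
                   onprem_opex_per_year: float, cloud_rate_per_hour: float) -> str:
    """Return "cloud" if pay-per-use is cheaper at this usage level, else "own"."""
    threshold = breakeven_hours_per_year(capex, lifetime_years,
                                         onprem_opex_per_year, cloud_rate_per_hour)
    return "cloud" if hours_per_year < threshold else "own"


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    costs = dict(capex=100_000, lifetime_years=4,
                 onprem_opex_per_year=5_000, cloud_rate_per_hour=10.0)
    print(breakeven_hours_per_year(**costs))   # 3000.0
    print(cheaper_option(500, **costs))        # cloud
    print(cheaper_option(5_000, **costs))      # own
```

Below the break-even point, a lab with intermittent workloads, like the small core labs these vendors are targeting, pays less by renting; above it, owning wins. A real comparison would also need to account for storage, network transfer, and staff time.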
Eagle Genomics and Cycle Computing will launch an early-adopter program for their solution in July, and some large biotech companies have already shown interest, according to Spooner. "They want to kick the wheels and see whether they can adopt this sort of thing," he says. "So there has been a great deal of interest, and, ultimately, the barrier to entry in financial terms is not so massive."
Illumina, PacBio, and Life Tech all seem to be marketing their cloud integration as a way to lift the computing burden off the user. PacBio, for one, aims to convince customers that the cloud can spare them from buying storage and compute hardware and from maintaining HPC resources in-house.
"We didn't want our customers to have to buy huge clusters and data storage components, so the natural thing was the cloud — people can access it without worrying about hiring an IT administrator," PacBio's Hauw says.
The company plans to keep its cloud solution in beta mode until the end of the year.
The upload issue
But what about the frequently cited criticism that uploading genomes to the cloud is impractical? In about 10 days, a HiSeq can produce upwards of 600 gigabytes of data, which can mean spending another week uploading that data over a standard Internet connection. Illumina, however, is already developing a workaround: enabling its instruments to upload data during a sequencing run rather than waiting until the run has finished.
"People talk about the upload problem, but the great thing about these instruments is that we can do the upload while the sequencing is going on — right now it's just MiSeq but from the fall of this year we're going to have the HiSeqs enabled to work with the cloud — you just click and upload," Illumina's Dickinson says.
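The upload figures above can be checked with simple bandwidth arithmetic. The sketch below assumes a fluid model: data is produced at a constant rate over the run, the link sustains a fixed throughput with no protocol overhead or compression, and 1 GB = 10^9 bytes. Under those assumptions, moving 600 GB in a week needs roughly 8 Mbit/s sustained, while keeping pace with a 10-day run needs only about 5.6 Mbit/s, which is why uploading during the run can largely hide the transfer. The function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(gigabytes: float, mbit_per_s: float) -> float:
    """Days to move `gigabytes` (decimal GB) over a link sustaining `mbit_per_s`."""
    bits = gigabytes * 1e9 * 8
    return bits / (mbit_per_s * 1e6) / SECONDS_PER_DAY


def required_mbit_per_s(gigabytes: float, days: float) -> float:
    """Sustained throughput (Mbit/s) needed to move `gigabytes` within `days`."""
    bits = gigabytes * 1e9 * 8
    return bits / (days * SECONDS_PER_DAY) / 1e6


def days_after_run_until_uploaded(gigabytes: float, run_days: float,
                                  mbit_per_s: float, streaming: bool) -> float:
    """Extra wait after the run ends before all data is in the cloud.

    Without streaming, the upload starts only when the run finishes.
    With streaming (fluid model, constant production rate), a link slower
    than production is busy from the start, so the upload ends at
    max(run end, total transfer time).
    """
    upload = transfer_days(gigabytes, mbit_per_s)
    if streaming:
        return max(0.0, upload - run_days)
    return upload


if __name__ == "__main__":
    print(f"{required_mbit_per_s(600, 7):.2f} Mbit/s to upload 600 GB in a week")
    print(f"{required_mbit_per_s(600, 10):.2f} Mbit/s to keep pace with a 10-day run")
    for streaming in (False, True):
        wait = days_after_run_until_uploaded(600, 10, 8, streaming)
        print(f"streaming={streaming}: {wait:.1f} extra days at 8 Mbit/s")
```

At 8 Mbit/s, waiting for the run to finish adds almost a week, whereas streaming the same data during a 10-day run leaves essentially nothing to upload at the end.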
PacBio's Hauw also says upload times are not an issue, and he is convinced the cloud is in genomics to stay. But he is quick to point out that larger labs and sequencing centers will keep their onsite computational hardware, and that "cloud bursting," spinning up cloud instances on the fly to meet a temporary need for extra storage or analysis capacity, will become the standard.
"It's going to be a hybrid thing — some people that have a data center already are going to still do that, but they still have to handle the peaks and valleys of their computational loads," he says. "Having a hybrid strategy or cloud bursting strategy would be beneficial for those guys, so the future is going to be a bit of both."
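Hauw's hybrid model can be sketched as a simple overflow policy: fill on-premise capacity first and send only the excess to cloud instances. The functions below are a toy illustration, not PacBio's or Cycle Computing's implementation; they assume one job per slot or instance per period, and the demand figures in the example are hypothetical.

```python
def burst_plan(demand: list[int], onprem_slots: int) -> list[tuple[int, int]]:
    """Split each period's job count into (on-premise jobs, cloud-burst jobs).

    Local slots are used first; only the overflow above `onprem_slots`
    is sent to temporary cloud instances for that period.
    """
    plan = []
    for jobs in demand:
        local = min(jobs, onprem_slots)
        plan.append((local, jobs - local))
    return plan


def cloud_instance_periods(plan: list[tuple[int, int]]) -> int:
    """Total cloud capacity rented, in instance-periods (e.g. instance-hours)."""
    return sum(burst for _, burst in plan)


if __name__ == "__main__":
    # Hypothetical jobs per hour with one large peak.
    demand = [40, 55, 30, 120, 200, 60]
    plan = burst_plan(demand, onprem_slots=64)
    print(plan)
    print("cloud instance-hours:", cloud_instance_periods(plan))
    print("slots needed to cover the peak without bursting:", max(demand))
```

Sizing the local cluster for typical load (64 slots here) rather than for the peak (200) and renting the difference only when it is needed is the "peaks and valleys" trade-off Hauw describes.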