During the Pistoia Alliance meeting held in Boston last week, representatives from three groups demonstrated pilot platforms for delivering cloud-based next-generation sequence data services with an eye toward the needs of pharmaceutical researchers.
Teams comprising representatives from Hewlett-Packard and the Swiss Life Science Center; Constellation Technologies and Genestack; and Eagle Genomics and Cycle Computing presented their solutions to delegates at the meeting, which was held a few hours before the start of the Bio-IT World conference.
The three pilots were developed in response to a request for proposals issued last July by the Pistoia Alliance Sequence Services working group. Each team received $50,000 in "shared risk" funding from five pharmaceutical companies — GlaxoSmithKline, AstraZeneca, Roche, Novartis, and Lundbeck.
The pilot platforms were developed for the second phase of the group's sequence services project, which is focused on building a fully functional platform for NGS data storage and analysis that meets the needs of pharmaceutical R&D. In the first phase, completed last year, vendors were tasked with developing a secure infrastructure that could deliver good performance, scalability, and availability.
Under the terms of the RFP for the second phase, each platform had to use open-source tools for NGS data analysis while maintaining the security and scalability of the platform. The systems were required to run Blast queries on datasets and to host secure installations of Ensembl and PlasMapper, an application that generates and annotates plasmid maps from an input plasmid DNA sequence (BI 4/22/2011).
Interested vendors were also expected to provide business models that showed a commitment to support and develop the proposed systems as well as commercialization plans and pricing models (BI 7/29/2011).
The alliance selected the three projects that presented last week out of a total of 10 proposals (BI 2/10/2012).
Focus on Flexibility
Rob Gill, Constellation's chief technology officer, presented his team's platform at the Pistoia meeting. He told BioInform this week that his team focused on developing a platform that was flexible enough to allow users in pharmaceutical companies to incorporate the tools they want to use into the system.
"I don’t want to tell people what bioinformatics they should do," he said. "What we want to do is build a platform that anybody's tool can go on and thus they can do their science ... not our science."
Constellation's platform includes a log-in portal that gives users access to a variety of large-scale bioinformatics applications including Ensembl, the CellProfiler image analysis software package, and the Galaxy workflow system. The platform also includes standard NGS tools like Bowtie and FastQC, Gill said.
In addition to open-source offerings, Constellation included capabilities offered by Genestack, a startup launched earlier this year that offers what it describes as a "secure bioinformatics data hosting and computation environment" optimized for next-generation sequencing results.
"Different pharmas do things differently," Gill said. "With what we've built, they can filter things ... change parameters if they want to, they can link systems and tools together in a variety of ways."
The group also focused on developing a secure, flexible system that runs on scalable cloud infrastructure, giving users the resources they need for analysis, Gill said.
He added that Constellation's platform won't be "beholden" to a specific cloud provider. Instead, the firm will work with multiple vendors including Amazon Web Services, Microsoft Azure, and others.
Gill noted that each of the groups selected for the second phase of the Pistoia project had similar ideas in terms of cloud computing and workflow development.
"It's going to be an interesting market with a lot of very good players with some of them huge, like HP, and then some of the smaller [companies] like us and Eagle," he said.
Standards-Based Approach
For its part, the team of Eagle Genomics and Cycle Computing focused on developing a system with a "loosely coupled architecture" that enables collaboration and spans the entire bioinformatics workflow, from uploading data to analyzing and storing it, through both manual and automated command-line processes, Will Spooner, Eagle's CTO and founder, told BioInform this week.
"We went for very much a standards-based approach, reusing as much [existing] software as possible," he said. "The way we have put it together is very flexible in terms of tying these different components together using industry standard protocols and web services."
Additionally, "we have very powerful command-line access and a command-line interface to the system ... which I am not aware that the other participants did," he said.
Eagle and Cycle's platform gives users the option of sharing their data with other research partners and collaborators, but also gives them control over how much data is shared, Spooner said.
The platform contains all the tools needed to build high-throughput bioinformatics pipelines and is powered by Cycle Computing's Condor compute cluster, which runs on Amazon Web Services. Spooner said that the partners are currently exploring ways to deploy the solution in-house as well as on other cloud infrastructures.
The team is also working on implementing specific analysis tools on the platform in collaboration with an unnamed pharma company and a university collaborator.
In particular, they are working on implementing an analysis pipeline — which will include alignment and de novo assembly software — that is intended to detect gene fusion events in RNA-sequencing data from tumor samples.
A Single Platform
HP's system, meanwhile, has been in use in academic settings for eight years and already included several of the NGS analysis features required by the Pistoia Alliance, such as TopHat, GATK, and the Integrative Genomics Viewer, Joel Jankow, the company's health and life science business development manager for Europe, Asia, and the Middle East, told BioInform this week.
Unlike the offerings from the other teams in the project, HP's system runs on the company's own hardware and compute infrastructure, which means HP has "good control" over the platform, he said.
Additionally, the HP offering comprises a single platform where users "can do everything they want," Jankow said. Further, "there is a very clear delineation between what we do and what our partners do, [and] that didn’t come across as clear with the other [groups]."
The platform lets users upload their data directly into the system, where they can run different analysis algorithms that are part of a workflow engine developed by HP's partner in the project, the Swiss Life Science Center.
Alternatively, users can create their own workflows within the system and import their own annotations, among other capabilities, Jankow said.
While the HP solution has already been implemented for academic research groups, the partners worked on improving its security and its quality control features to ensure that it met industry standards and the aims of the Pistoia RFP, Jankow said.
What's Next?
Moving forward, each of the participating vendors is looking to further prep its solutions for the marketplace.
Constellation's Gill told BioInform that the company expects to make its integrated platform commercially available within the next six months. In the meantime, it will offer access to its implementations of Ensembl, CellProfiler, and other tools under a software-as-a-service model.
The firm is still working out the details of a business model for the integrated platform, including how access to the platform and its tools will be priced, he said.
Eagle Genomics, meanwhile, hopes to kick off an early-adopter program in July during which it will accept applications from "preferred customers," Spooner said.
It plans to launch the full platform by late summer under a pay-as-you-go model with per-gigabyte pricing.
Finally, HP intends to bring its platform to market on its cloud infrastructure in January 2013 under a pay-as-you-go model, Jankow said.
He added that HP also plans to offer bioinformatics consultancy services around the platform and to improve its invoicing and billing features before the release. HP also plans to add functionality for handling other types of data, such as proteomics, transcriptomics, and metabolomics data, he said.