This is the second of three stories looking at the Cancer Genomics Cloud Pilot proposals selected by the NCI.
NEW YORK (GenomeWeb) – Seven Bridges Genomics will use the roughly $5.9 million contract it received from the National Cancer Institute under the Cancer Genomics Cloud Pilot initiative to add cancer-specific applications and content to the Amazon Web Services-based version of its commercial operating system, which offers tools for managing and executing NGS analysis workflows.
The contract gives the relatively small bioinformatics player an opportunity to demonstrate the capabilities of its operating system to the NCI and the cancer research market. It also validates the company's investments in staff and its participation in open-source infrastructure and standards development efforts, James Sietstra, Seven Bridges' president, told BioInform this week.
The company's system, which can also be installed locally (on private clouds, for instance), offers access to 359 open-source and internally built sequence data analysis tools and applications and 33 pre-built NGS analysis workflows. The pre-built pipelines cover RNA-seq alignment and differential expression analysis, fusion transcript detection, whole-genome and whole-exome analysis, and more. Also available are a software development kit (SDK) and an application programming interface (API) that enable third-party developers to create new methods and incorporate them into the platform.
The planned cancer-specific platform will include tools for tasks such as variant detection and annotation, pathway alteration analysis, alternative splicing quantification, and more. It will also include an interactive visualization dashboard and data-mining tools that will help researchers interrogate and explore their data, according to the company.
In addition to these features, Seven Bridges' developers will incorporate content relevant to the cancer research space and develop new pipelines and interfaces that will benefit this particular arm of the biomedical community, Sietstra said.
To help figure out what those new applications should be, the company is asking for the research community's input and has set up a dedicated site through which researchers can have a say in the development of its cloud, Brandi Davis Dusenbery, a senior scientist at the company and one of the lead researchers on the pilot, told BioInform. Individuals who sign up will be able to suggest tools, give feedback on proposed resources, and contribute to efforts to create standardized analysis pipelines. They will also receive updates and can sign up to be alpha users. The company is currently accepting suggestions and will soon begin posting proposed features, based on the ideas it has received, for the community to vote on through the site, she said.
All of the applications, the SDK, and the API developed for the cloud pilot will be open source, according to Seven Bridges. Also, the cancer research-specific version of Seven Bridges' platform will be separate from the company's regular commercial offering. It will include customized security controls so that only researchers with appropriate permissions will be able to access and compute on restricted data, the company said.
For the purposes of the pilots, participants are required to show that their systems can handle 2.5 petabytes of data from The Cancer Genome Atlas (TCGA) plus one orthogonal data type. For the latter, Seven Bridges has chosen to work with bisulfite sequencing data collected as part of TCGA, Dusenbery said. Users will also be able to upload their own internally generated data and analyze it in the context of publicly available data, she said.
Seven Bridges' proposal was one of three selected under the NCI initiative, which aims to provide co-located computing resources and storage space that will make it easier and much less costly for the community to access and use data generated by funded research projects such as TCGA. The other two contracts went to a team led by the Institute for Systems Biology, which received about $6.5 million to build a system based on Google's cloud, and to a team led by the Broad Institute, which, along with collaborators at the University of California, Santa Cruz, was awarded about $7 million to develop its proposed system. Over the next 24 months, these groups will develop their systems and open them up for testing and evaluation by both the NCI and members of the cancer research community.
Seven Bridges is not only the sole commercial company to win a Cancer Genomics Cloud Pilot contract; it is also the smallest of the three awardees and arguably lacks the depth of experience and resources available at more established research organizations such as ISB and the Broad. But the company believes it is equal to the task, saying that over the years it has invested in the human and infrastructure capital needed to deliver a platform well suited to the cancer research community's needs.
"Since 2009 the focus of Seven Bridges has been for web-scale genomics analysis, and this is about as great an opportunity as a company like ours could have to analyze data and facilitate research," Sietstra said. "Cancer researchers need to extract biologically and therapeutically relevant information from a rapidly expanding corpus of data [and] this is an enormous challenge. We are uniquely positioned because of our experience with tools, workflows, standards, and cloud infrastructure to facilitate that research in the cancer community."
For this project, Seven Bridges will rely on the expertise of its 70 in-house software engineers, bioinformaticians, and genomic scientists, as well as on experience gained from active participation in multiple open-source development projects and large-scale community initiatives such as the Rabix Initiative, which offers an open-source framework for wrapping pipelines, and the Global Alliance for Genomics and Health, Sietstra told BioInform. Also, "we're a scrappy company," he said, pointing to the fact that Seven Bridges' proposal had the smallest price tag of the three winners. The lower cost, he said, "is an advantage we can offer because we are focused and highly efficient."
There's a business benefit for the company here, as well; being selected for the pilot is an opportunity to showcase the strengths of its platform to a significant swathe of the biomedical research market. "This is a game-changing opportunity for a company like Seven Bridges because our goal is to work with the world’s largest bioinformatic datasets," Sietstra said.
The NCI has said previously that it could choose one of the three pilot platforms, combine components from more than one system into a final product, or select none of the platforms at all. If Seven Bridges' platform is chosen in its entirety after the evaluation phase, however, it's not clear how the company, as a commercial entity, would handle access to the cloud infrastructure and its tools. In that event, Sietstra said, the firm would work with the NCI to determine the most cost-effective mechanism for providing the resource to the community and scaling up as needed.
"The last thing we want is for it to ever be prohibitively expensive," he said. "Although we are a commercial organization, we want to lower the barriers to entry and for this to be as cost effective as we can possibly make it."