This article has been updated to correct details about features currently available in the DNAstack platform and what will be released in the future.
NEW YORK (GenomeWeb) – Genomics software startup DNAstack this week unveiled its first product, a cloud-based platform that provides tools to help genetics researchers share and analyze biomedical data in the Google cloud.
According to DNAstack CEO and Co-founder Marc Fiume, the platform is designed to provide researchers in hospitals, genome sequencing facilities, research groups, clinical laboratories, pharmaceutical companies, agricultural firms, and direct-to-consumer companies with easy access to foundational bioinformatics solutions as well as mechanisms to share information across sites more easily and effectively.
"We recognize data sharing as one of the most important value generators in genomics and [as] a catalyst for discovery of genotype-phenotype associations [and] for clinical trial recruitment and donor matching," Fiume told GenomeWeb. "We plan to differentiate ourselves by building technologies that enable scientific and medical communities to participate in and extract insights from global genomic networks."
This first iteration of the platform will include a module that will let users set up networks through which they can share information with colleagues. Specifically, customers will be able to share genotyping information from high-throughput sequencing experiments using beacons (local servers to which third-party users can send simple queries about the genomic data available at a given site) or via the application programming interface developed by the Global Alliance for Genomics and Health. Customers also have access to bioinformatics tools and workflows for processing raw sequence into lists of genetic mutations, as well as a consultation service for those who want to migrate their in-house bioinformatics workflows onto the DNAstack platform.
DNAstack is offering its first set of analysis functionalities to customers for free, although users will have to pay for storage and compute costs on the Google cloud, a strategy that is in line with the company's philosophy and vision for the market, according to Fiume. "We are disrupting traditional pricing models of genomics platforms ... because we believe that platforms that enable the adoption of best practices and global standards should be free ... and the resulting data and networks will create value in and of themselves," he told GenomeWeb. "We want to be the simplest, most cost-effective, and powerful way that people can participate in genomics networks."
Next year, the company intends to release tools for running genotype-phenotype searches, additional sharing features, and variant annotation tools that it could introduce as part of upgraded subscription tiers, Fiume said. Essentially, the base platform is intended to "substantially reduce barriers in terms of cost and complexity to bioinformatics and sharing on the cloud ... allowing us to innovate and capitalize downstream on revenue opportunities."
DNAstack, based in Toronto, began building its platform last year in collaboration with Google Genomics and the Global Alliance for Genomics and Health (GA4GH), with the help of seed funding from unnamed investors. Fiume declined to disclose how much funding the company received. When the company initially opened its doors in 2014, its primary aim was to provide a forum that would allow researchers to share information more effectively, in much the same way that social networking sites like Facebook and Twitter work, Fiume said. Social networking sites have successfully "revolutionized" the way information is shared and used on the internet and "we thought it would be a great and tremendous opportunity to apply those same forces to healthcare," he said.
Conversations between the company and researchers in industry and academia highlighted the growing need within the biomedical community for better data sharing mechanisms, as well as for compute resources that could scale to handle increasingly large datasets without a corresponding rise in infrastructure costs, he said. A number of initiatives and products launched in recent years aim to address that need.
For example, Oregon Health & Science University, in collaboration with Intel, launched the Collaborative Cancer Cloud last year, a platform-as-a-service that helps users securely share private genomic, imaging, and clinical datasets without compromising the privacy of contributing patients. The platform uses Intel-developed technologies to remotely query oncology clinical and research datasets held by institutions that have agreed to share their information. This year, researchers at the Dana-Farber Cancer Institute and the Ontario Institute for Cancer Research signed on to participate in pilot projects aimed at testing the efficacy of the cloud infrastructure.
The GA4GH, whose standards and tools DNAstack uses in its product, was itself established to develop and implement a common framework of international technology and practice standards to support the sharing of genomic and clinical data from sources around the world. Fiume chairs the GA4GH's beacon project, which is one of several initiatives that members of the alliance have come up with to support data sharing across institutions. Also last year, Genome Canada and the Canadian Institutes of Health Research announced a C$3.3 million investment in Can-Share, a genomic data-sharing program to create policies and tools for Canadian clinicians and researchers to share data with each other and with partners worldwide. DNAstack is one of the recipients of the Can-Share grant.
Researchers need to be able "to share tens of thousands if not millions of genomes for us to be able to effectively resolve the causes of complex and rare diseases," Fiume said. At DNAstack, "we're trying to do this by writing open standards, developing enterprise-grade software that adopt them."
The decision to build on open standards was one of the reasons that DNAstack chose to build its platform on the Google cloud, according to Fiume. The Google cloud platform provides "us access to cutting-edge infrastructure with storage, machine learning APIs, and search APIs," but also, "they've made a commitment to the open standards developed by the GA4GH," he said. For example, Google Genomics offers an implementation of the GA4GH API that is optimized to run on the Google cloud. Moreover, since the solution is cloud-based, researchers can avoid the considerable upfront costs of purchasing, setting up, and maintaining in-house infrastructure, Fiume noted.
While many informatics companies include data sharing as a key component of their software platforms, Fiume believes that there is still plenty of room for DNAstack to make its mark. "A lot of in-house or commercially developed software products support proprietary APIs," he said. Other products support sharing between groups within a single organization or groups that use the same commercial platform. "This will absolutely lead us towards the existing ecosystem of [electronic health records] most of which are non-interoperable leading to fragmentation of data silos and a very difficult development environment," he said.
In contrast, DNAstack's APIs will work both within and outside of the company's platform. "Beacon is a good example," Fiume said. "Using DNAstack, users have push-button access to share via the Beacon APIs, and by virtue of that being an open standard, [they can] connect that data into the global beacon network."
DNAstack is one of a number of genomics-centric companies to launch products based on the Google cloud. Another example is Dutch bioinformatics firm InsideDNA, which over the summer launched a beta version of a Google-based platform to help life science researchers reproducibly run and share computational tools for genomic analysis. That system offers access to more than 1,000 bioinformatics algorithms and programs for tasks such as RNA-seq analysis, read mapping, and variant calling, as well as to the Google Genomics API.