This article has been updated from a version posted Feb. 2 to clarify pricing information and to include comments from a customer.
DNAnexus this week rolled out a cloud-based workflow to help researchers analyze and manage genomic variation datasets, amid increasing interest in the cloud from pharma and academic circles and lingering doubts about data security, according to the firm.
The variant analysis capabilities expand the functionality of the DNAnexus platform, which the company launched last year with the aim of targeting researchers analyzing next-generation sequencing data.
The company announced the release at the Advances in Genome Biology and Technology conference held this week in Marco Island, Fla., where it was scheduled to present the results of an in-house validation of the variation analysis tools performed with datasets from the 1000 Genomes Project.
DNAnexus' workflow combines a population allele frequency analysis application with nucleotide-level variation analysis to help users identify alleles and their frequencies across different populations. It also includes a query tool that provides further filtering capabilities and "Gene Info" pages that provide an overview of each gene as well as links to third-party data sources.
Furthermore, the workflow provides access to public and commercial datasets from AmiGO, the Biobase Knowledge Library, the Catalogue of Somatic Mutations in Cancer, dbSNP, Entrez Gene, GeneCards, Ingenuity Knowledge Base and Ingenuity Pathway Analysis, the Kyoto Encyclopedia of Genes and Genomes, NextBio, Online Mendelian Inheritance in Man, PharmGKB, and PubMed.
In the validation study, which aimed to find variants whose frequencies differed significantly between two populations, the firm applied its allele frequency tool to identify variants in data from two groups comprising 28 exomes and filtered the results to pare the list down to about 40 SNPs. For each SNP, data from the company's Gene Info pages and from commercial and public datasets provided information about the associated genes and their roles.
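As a rough illustration of this kind of two-population comparison, the sketch below computes per-group allele frequencies and keeps variants whose allele counts differ between the groups under a two-sided Fisher's exact test. The article does not say which statistical test or thresholds DNAnexus used, so the choice of test, the function names, the 0.05 cutoff, and the allele counts are all assumptions made for illustration. No multiple-testing correction is applied.

```python
from math import comb


def allele_frequency(alt_count, total_alleles):
    """Fraction of observed alleles that carry the alternate allele."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return alt_count / total_alleles


def fisher_exact_two_sided(alt_a, ref_a, alt_b, ref_b):
    """Two-sided Fisher's exact test on a 2x2 table of allele counts.

    Rows are the two groups; columns are alternate and reference alleles.
    The p-value sums the probabilities of all tables (with the same
    margins) that are no more likely than the observed one.
    """
    row_a = alt_a + ref_a
    row_b = alt_b + ref_b
    alt_total = alt_a + alt_b
    denom = comb(row_a + row_b, alt_total)

    def prob(x):
        # Hypergeometric probability of x alternate alleles in group A.
        return comb(row_a, x) * comb(row_b, alt_total - x) / denom

    p_observed = prob(alt_a)
    lowest = max(0, alt_total - row_b)
    highest = min(row_a, alt_total)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(p_value, 1.0)


def differing_variants(counts, alpha=0.05):
    """Return (id, freq_a, freq_b, p) for variants with p < alpha, sorted by p."""
    hits = []
    for variant_id, (alt_a, ref_a, alt_b, ref_b) in counts.items():
        p = fisher_exact_two_sided(alt_a, ref_a, alt_b, ref_b)
        if p < alpha:
            freq_a = allele_frequency(alt_a, alt_a + ref_a)
            freq_b = allele_frequency(alt_b, alt_b + ref_b)
            hits.append((variant_id, freq_a, freq_b, p))
    return sorted(hits, key=lambda hit: hit[3])


# Hypothetical allele counts: (alt in A, ref in A, alt in B, ref in B).
example_counts = {
    "variant_1": (20, 8, 4, 24),
    "variant_2": (10, 18, 11, 17),
}

if __name__ == "__main__":
    for variant_id, freq_a, freq_b, p in differing_variants(example_counts):
        print(f"{variant_id}: A={freq_a:.2f} B={freq_b:.2f} p={p:.2e}")
```

With real exome data, the surviving variants would then be annotated with gene-level information, which is the role the article describes for the Gene Info pages and third-party datasets.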
Brigitte Ganter, DNAnexus' product director, told BioInform that the firm focused on applications like read mapping, RNA-seq, and ChIP-seq analysis for the launch of its cloud-based service last April (BI 04/23/2010). It saw a variant analysis tool as the "natural" next step based on feedback from academics, the original target group for the service, and from industry, where she said the firm is noticing a growing interest in its tools, particularly from drug companies.
Although she did not provide specific numbers, Ganter said that since the launch, the firm's client list has grown "substantially."
One user, Anshul Kundaje, a postdoctoral student at Stanford University, told BioInform that the tools in DNAnexus provide a streamlined pipeline for analyzing large datasets. He said the platform offers good visualization capabilities and a read-mapping tool that's "truly probabilistic."
Kundaje is in the computer science and genetics department at Stanford and currently works on integrative analysis of next-generation sequencing data as part of the Encyclopedia of DNA Elements consortium.
Although he hasn’t used the new variant analysis tool extensively, Kundaje said he has run the workflow on about 60 open chromatin datasets to identify variants that are specific to various human cell lines and that the platform was able to process the data "flawlessly."
Ganter believes that pharma's interest stems from the same reasons academic researchers are converting to the cloud — a desire for a more "cost-effective solution" that doesn't require an investment in software and hardware, along with other overhead costs associated with setting up and maintaining in-house infrastructure.
Initial skepticism seems to be on the decline, a trend that Ganter attributed to better education about the cloud and what it does. Still, a few misgivings remain, particularly around data security for things like clinical data.
"It is very secure ... we can [encrypt] it," she emphasized. "But we also tell the researchers [not to] put data and information in the cloud that does not need to be there, such as clinical metadata."
Stanford's Kundaje concurred, adding that switching to the cloud is "inevitable" as the numbers and sizes of datasets continue to increase and also because the maintenance and pricing for local compute clusters make the cloud a better idea.
On the commercial front, Ganter said that, in addition to the cloud's scalability, the company's pay-as-you-go model is a "big differentiator" for the firm. She said the model is more cost-effective than competing web-based offerings because users aren't tied to yearly subscriptions.
More specifically, DNAnexus hasn't seen any competing vendors offering a tool for population allele frequency analysis, and few seem to incorporate third-party content, Ganter said.
However, as companies continue to roll out tools that target different segments of the sequence analysis market, competition is likely to heat up.
For example, last year, Cycle Computing put its oar into the life sciences market by launching the CycleCloud service, which is built on Amazon Web Services and offers users access to popular open source bioinformatics tools like BLAST, GMAP, HMMER, Bowtie, and MrBayes, in addition to proteomics and molecular modeling tools, targeted at the pharmaceutical industry (BI 3/19/2010).
The firm, which also adopts a pay-as-you-go model, charges a one-time setup fee of $500 for individual researchers, $1,000 for groups, and $2,500 for departments, as well as an access fee of $250, $550, and $1,250 per month, respectively. Pricing for commercial vendors is not disclosed.
Pricing for academics using DNAnexus is $20 per gigabase of raw sequence for high-volume users and $30 per gigabase for low-volume users. The firm offers several pricing options for commercial vendors.
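To make the quoted academic prices concrete, here is a small arithmetic sketch of the two fee structures. It uses only the figures given above and makes two assumptions: the CycleCloud access fee is read as a recurring monthly charge, and the usage tier is passed in by the caller, because the article does not state where the line between volume and low-volume use falls. The function and dictionary names are illustrative. The two models bill for different things (sequence processed versus months of access), so the totals are not directly comparable.

```python
# Academic prices quoted in the article, in US dollars.
DNANEXUS_RATE_PER_GIGABASE = {"volume": 20, "low_volume": 30}

CYCLECLOUD_FEES = {
    "individual": {"setup": 500, "monthly": 250},
    "group": {"setup": 1000, "monthly": 550},
    "department": {"setup": 2500, "monthly": 1250},
}


def dnanexus_academic_cost(gigabases, tier):
    """Cost of analyzing `gigabases` of raw sequence at the given tier.

    The tier is chosen by the caller because the volume threshold
    is not given in the article.
    """
    if gigabases < 0:
        raise ValueError("gigabases must be non-negative")
    return DNANEXUS_RATE_PER_GIGABASE[tier] * gigabases


def cyclecloud_academic_cost(plan, months):
    """One-time setup fee plus the access fee for `months` months.

    Assumes the access fee recurs monthly.
    """
    if months < 0:
        raise ValueError("months must be non-negative")
    fees = CYCLECLOUD_FEES[plan]
    return fees["setup"] + fees["monthly"] * months


if __name__ == "__main__":
    print(dnanexus_academic_cost(100, "volume"))
    print(cyclecloud_academic_cost("group", 12))
```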
Admittedly, the firm hasn’t yet tapped into the open source tool space, Ganter said, which seems to be an area of interest particularly among academics, and indeed there have been efforts from some groups to get these open source tools cloud-enabled.
Kundaje added that in the bioinformatics community, not only is there a "real push" toward creating cloud-enabled open source tools, but researchers are also pushing for "good, usable" software.
"I think it's because the earlier tendency was to publish papers and occasionally provide code with it, but it [was] often unsupported and very soon unusable," he explained. "On the other hand now ... [in] big projects, especially from the read-mapping community, you can see the stress towards really usable software that is well maintained."
While the tools in its latest workflow were all developed in house, DNAnexus intends to look into adopting open source tools with an eye for "integrating the best analysis methods and tools in one place in a way that's easy to use," the firm said.
For its next steps, DNAnexus plans to provide access to more publicly available datasets, a move that Ganter said will be a major focus for the company during the first quarter, as well as to create tools for a copy number variant analysis workflow planned for release later this quarter.
Separately this week, DNAnexus was among several bioinformatics companies to join an expanded roster of Pacific Biosciences partners, which will enable it to provide data analysis and cloud computing services to users of PacBio's Single Molecule Real-Time Sequencer. The DNAnexus platform already handles data from Illumina and SOLiD sequencers.
PacBio kicked off its partner program last year in preparation for the launch of its single-molecule sequencer. At the time, the firm said that it targeted bioinformatics as a key focus in its efforts to help smooth any speed bumps that customers may face in implementing the platform (BI 02/19/2010).
In addition to DNAnexus, Biomatters, DNASTAR, Galaxy, and Genomatix Software were tapped to join the PacBio team on the sequence data analysis side this year, joining Amazon Web Services, BioTeam, CLC Bio, GenoLogics, GenomeQuest, and Geospiza, which all signed on with PacBio last year.
Other new members in the partner program include Beckman Coulter, Hamilton Company, and Tecan Group, which signed on to provide automation and microfluidics products; Covaris, Diagenode, Digilab, and Microsonic Systems, which were selected to provide tools for shearing and fragmentation; and Sage Science, which will provide size-selection tools.
PacBio now has a total of 24 companies in its partner program.