Google Invests in DNAnexus, Hosts Company's Cloud-Based Sequence Read Archive

By Uduak Grace Thomas

This story has been updated from a version posted Oct. 12 to include additional comments.

Google has joined several investors in a $15 million funding round for web-based bioinformatics provider DNAnexus and is working with the company to provide a cloud-based version of the Sequence Read Archive that will be available to life science researchers without charge.

DNAnexus said this week that it has raised a total of $15 million in second-round funding from new investors Google Ventures and TPG Biotech with the rest of the funds provided by existing investors First Round Capital, SoftTech VC, K9 Ventures, and Felicis Ventures.

The funds will be used to hire new staff as well as to support product development.

In addition, Google is providing 400 terabytes of space on its Cloud Storage platform that will be used to house data from the SRA, which is currently hosted by the National Center for Biotechnology Information. DNAnexus has developed a new interface for the site and added several capabilities that aren’t in the NCBI version of the resource.

DNAnexus isn't disclosing financial details of the hosting agreement with Google.

NCBI earlier this year decided to phase out the SRA as a result of budget cuts (BI 2/18/2011). The institute later said that it will continue to host a "subset" of sequencing data, such as data from RNA-seq, ChIP-seq, and epigenomic studies that are submitted to Gene Expression Omnibus; genomic and transcriptomic assemblies that are submitted to GenBank; genomic assemblies to GenBank/WGS; and 16S ribosomal RNA data associated with metagenomics that are submitted to GenBank (BI 6/17/2011).

NCBI's David Lipman told BioInform via e-mail this week that NCBI's plans for the SRA haven’t changed and that the institute is "happy that DNAnexus and Google are providing alternative access to the subset of SRA that is available without restrictions."

New features that DNAnexus has added to its version of the SRA include the ability to download FASTQ files; a new web-based interface for searching and accessing datasets; tools for users to import SRA datasets into the company’s commercial platform to access additional functionality such as mapping, RNA-seq, ChIP-seq, variant analysis, and data visualization; as well as tools for integrating SRA data with their own sequence data.

While users will be able to access the SRA data without signing up for an account, DNAnexus CEO Andreas Sundquist said that researchers who want to import data into DNAnexus to make use of the company's online bioinformatics services — which run on Amazon's cloud architecture — would be required to register.

As part of the launch, DNAnexus has reduced its standard academic pricing by half — to $10 per gigabase of raw sequence for volume users from $20 per gigabase. The company will also allow researchers to import SRA data into DNAnexus for free through mid-November, Sundquist said.

Sundquist told BioInform that his company plans to add new capabilities that will allow researchers to submit their sequence data directly to the DNAnexus SRA site.

He explained that the company will be able to maintain SRA's open source model with funds from its current data analysis and management business, which he said has had "good traction" from academic, pharma, and biotech customers.

Additionally, the firm’s partnership with Google will help DNAnexus maintain its version of the SRA at a fraction of what it would cost NCBI to host the data, Sundquist said, although he did not provide specific financial details.

Sundquist said that the company has a headcount of 25 and it hopes to double that number "as quickly as possible."

The company is particularly looking to add a number of software engineers to its employee roster, he said.

Private Sector or Not?

DNAnexus's hosted version of SRA suggests an alternative to publicly funded databases that face an uncertain future amid potential budget cuts, but some researchers questioned the viability of such a model.

In an e-mail to BioInform, Steven Salzberg, a professor of medicine and biostatistics at Johns Hopkins University, wondered how a for-profit company would benefit from hosting a free sequence database, though he admitted that he isn't aware of the specific details of DNAnexus's deal with Google.

Nevertheless, Salzberg said he doesn’t think commercial firms present a viable survival option for large-scale databases that are struggling to stay afloat.

"In general, I don't think we should rely on the private sector to maintain vital scientific resources such as public DNA sequence databases," he said, pointing out that companies may fold or change their business models, leaving users in the lurch.

As an example, he noted that Celera Genomics initially set itself up as a database provider at the height of the Human Genome Project, but soon "got out of the sequencing business entirely, so it's a good thing we weren't depending on them."

David Dooling, assistant director of informatics at Washington University's Genome Institute, told BioInform in an e-mail that if DNAnexus plans to accept submissions eventually, its version of the SRA "certainly appears to be a viable option as long as they adhere to access and usage policies specified by the funders."

However, accepting data on human subjects may be a different matter, Dooling said.

"I would think it will be difficult for them to be approved to host protected-access data," he explained. "Now that the majority of sequencing is on human subjects, this comprises the majority of data that would need to be submitted."

Ultimately, "the long-term viability of any of these data repositories depends on their costs and the service they provide," he said. "Will any of these data repositories last? Simply put, if people are willing to pay for it, yes. If not, no."

He also expressed concerns about DNAnexus's willingness to maintain the resource in the long run if the company does not see a return on its investment.

"It seems to me that DNAnexus is treating this as a loss leader to get people to use their analysis platform," he said. "If that doesn't work, they probably won't do this for long."


