NEW YORK (GenomeWeb) – DNAnexus said today that the Data Coordination Center for the National Institutes of Health-funded Encyclopedia of DNA Elements (ENCODE) project has selected its cloud-based bioinformatics platform to handle data analysis and sharing for the third phase of the effort.
The DCC, which is at Stanford University, serves as the central hub for handling and processing raw sequencing data collected from the 14 biomedical institutes across North America that are involved in the ENCODE project. The project aims to comprehensively catalog all the features of the human genome and provide a foundation for studying the genomic basis of human biology and disease.
Researchers in the DCC chose DNAnexus' platform because it supports collaboration and provides a scalable environment for processing thousands of datasets, the company said. The system was also selected because it supports transparency, reproducibility, and provenance for ENCODE pipelines, which will help ensure clear and consistent results.
The researchers have already implemented and optimized ENCODE's bioinformatics pipelines to run on the cloud and have begun using them to analyze data from the project, according to DNAnexus. The analysis is expected to require 10 million core-hours of compute and to generate nearly 1 petabyte of raw data on the DNAnexus platform over the next 18 months.
The partnership will provide "secure and immediate access and use of ENCODE's results," DNAnexus CEO Richard Daly said in a statement. "We believe the availability of the consortium's gold-standard analysis pipelines and ENCODE data on a single integrated platform will accelerate genomic medicine."
The ENCODE pipelines are available in a public project on the DNAnexus platform and in a GitHub repository.