NEW YORK (GenomeWeb) – DNAnexus has implemented its genome informatics and data management platform and services on Azure, Microsoft's enterprise cloud computing environment, and said that Stanford University's Center for Genomics and Personalized Medicine (SCGPM) is its first customer for the integrated offering.
Integration with Microsoft offers DNAnexus access to a wider pool of global customers who already have relationships with Microsoft and may be looking for infrastructure that can support their genomics needs, Brad Sitko, DNAnexus' vice president of finance and corporate development, told GenomeWeb. On the customer side, besides having a choice of two cloud platforms, Amazon and Microsoft, DNAnexus' users will benefit from established relationships between Microsoft and various healthcare and research enterprises, he said. They will also have access to new solutions, as well as improved versions of existing bioinformatics tools, that were developed within Microsoft Research.
This includes, for example, improved versions of two commonly used bioinformatics tools, the Burrows-Wheeler Aligner (BWA) and the Broad Institute's Genome Analysis Toolkit (GATK). Intel also offers optimized versions of these and other algorithms that are tailored to run on its Xeon processors. The Broad announced partnerships earlier this year with Microsoft and other vendors, under which the current version of the GATK software is to be implemented on their respective cloud platforms later this year. David Heckerman, a distinguished scientist and director of Microsoft Genomics, told GenomeWeb this week that his team has been able to accelerate both BWA and GATK by a factor of seven, slashing the time it takes to run them on standard machines from around 28 hours to under four hours. DNAnexus has been testing the algorithms with Microsoft and plans to make them available for use on its platform, Sitko said.
The companies are also implementing an application programming interface developed by the Global Alliance for Genomics and Health for bulk data streaming on the Azure cloud that will be available to DNAnexus platform users, Heckerman said. Users will also have access to an improved algorithm, dubbed Fast Linear Mixed Models (Fast-LMM), for quickly identifying associations between genetic markers and traits and weeding out confounding variables in very large cohorts, Heckerman said. "What our team did in research a few years back was basically to discover some algebraic tricks that allow this algorithm to run in linear time, so now instead of only being able to do a GWAS of 1000 people ... we can now process a sample size of one million or more."
Details of the Fast-LMM algorithm were published in a Nature Methods paper in 2011. According to the article, the algorithm was able to analyze data from 120,000 individuals in just a few hours. By comparison, some algorithms tested at the time were unable to process data from 20,000 individuals, the researchers claimed. Fast-LMM was also used in a study published earlier this year in collaboration with researchers at the Wellcome Trust Sanger Institute and elsewhere that looked at the heritability of a few dozen traits in a Ugandan cohort of more than 4,700 individuals. For the study, which was published in the Proceedings of the National Academy of Sciences, the researchers considered genomic variants as well as environmental effects in their models to account for missing heritability that creeps into heritability studies and confounds results.
Additional benefits of using the Azure cloud include access to more data centers in more areas than any other cloud vendor provides, Heckerman said, an important consideration for customers who for various reasons prefer that their data physically reside in the country where it was produced. Microsoft also offers better security and local compliance than other vendors and can more readily sign business associate agreements, which protect personal health information in accordance with HIPAA guidelines, than other cloud vendors, Heckerman claimed.
These and other tools will be available to researchers at the SCGPM, as well as to researchers at the University of California, Santa Cruz, who also use Azure for their genomics projects. Initially, the Stanford center, which supports researchers from nearly 80 laboratories, tapped DNAnexus' infrastructure on Amazon Web Services to handle its genomic data analysis and management needs. Separately, Stanford researchers worked with Microsoft Research on projects that explored issues with running genome-wide association studies, and as a result, ran some genomic data processing and management on the Azure cloud. Now that the platforms have been combined, those collaborations will be able to proceed in a more seamless and efficient fashion, Sitko said.
Other companies that have implemented their informatics products on Microsoft Azure include Finnish bioinformatics firm MediSapiens. The company said last week that it will offer its Explorer platform, which provides data curation, management, and analytics tools, on the Azure cloud to healthcare organizations in Finland. Also, South Korea-based MyGenomeBox has launched its business, which provides storage resources and software applications to individuals who have had their genome sequenced, on the Microsoft cloud.
Earlier this year, Spiral Genetics and Curoverse both signed agreements with Microsoft to make their respective bioinformatics platforms available on Azure. Specifically, Spiral Genetics deployed BioGraph, its proprietary method of compressing and querying large quantities of next-generation sequencing data, on the Azure cloud, while Curoverse made Arvados, its open source platform for managing, processing, and sharing genomic and biomedical data, available on the Microsoft infrastructure. Finnish firm BC Platforms also has a deal with Microsoft to use Azure to provide its genomic data management solutions to clinical researchers.
Moving forward, Microsoft intends to forge similar partnerships with other bioinformatics companies. Heckerman told GenomeWeb that the company will announce some of these integration agreements in the near future. For its part, DNAnexus is open to implementing its platform on other cloud infrastructures, Sitko said. "We will certainly follow customers [in terms of] what their specific needs are [and] we'll make the decisions that make the most commercial sense for us when presented."