This story has been updated to incorporate new information from Amazon and to clarify which AWS service follows the FHIR standard.
CHICAGO – Amazon Web Services (AWS) on Tuesday introduced Amazon Omics, which the cloud computing giant described as a "purpose-built managed service" intended to help bioinformaticians and biomedical researchers store, query, and analyze genomic, transcriptomic, proteomic, and other omics data in order to advance scientific discovery and develop new diagnostics and therapeutics.
Announced at the annual AWS re:Invent developer conference in Las Vegas, Amazon Omics manages the cloud infrastructure behind bioinformatics workflows and data pipelines. The service provides access control, audit trails, and other security-related elements of compliance with regulations such as HIPAA in the US and the General Data Protection Regulation (GDPR) in the EU.
In a video, Amazon said that the goal of the service is to "enable large-scale analysis and collaborative research for organizations to analyze omics data with purpose-built data stores, scalable workflows, and multimodal analytics."
The firm said that the new offering will save time by relieving end users of the burden of setting up cloud infrastructure and of building and running extract-transform-load (ETL) data pipelines. In the context of a genome sequence, the extraction is the string of A, C, G, and T that comes off the sequencer, and the transformation is the product of a workflow engine that translates those letters into information that can be used to identify mutations.
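To make the ETL idea concrete, the following is a toy sketch in Python, not a description of how Amazon Omics itself works. It assumes a single made-up FASTQ record and a made-up reference string, and it swaps real alignment and variant calling for a naive position-by-position comparison. The function names, the sample data, and the SQLite table are all hypothetical.

```python
import sqlite3

def extract(fastq_text):
    """Extract: pull the base string (2nd line) out of each 4-line FASTQ record."""
    lines = fastq_text.strip().splitlines()
    return [lines[i + 1] for i in range(0, len(lines), 4)]

def transform(read, reference):
    """Transform: list (position, reference base, read base) for each mismatch.

    This is a naive comparison that assumes the read starts at position 0 of
    the reference; real pipelines align reads and call variants statistically.
    """
    return [
        (i, ref_base, read_base)
        for i, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base != read_base
    ]

def load(conn, mismatches):
    """Load: write the mismatches into a queryable table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mismatches (pos INTEGER, ref TEXT, alt TEXT)"
    )
    conn.executemany("INSERT INTO mismatches VALUES (?, ?, ?)", mismatches)
    conn.commit()

# Made-up inputs for illustration only.
REFERENCE = "ACGTACGTAC"
FASTQ = "@read1\nACGTTCGTAC\n+\nIIIIIIIIII\n"

conn = sqlite3.connect(":memory:")
for read in extract(FASTQ):
    load(conn, transform(read, REFERENCE))

rows = conn.execute("SELECT pos, ref, alt FROM mismatches").fetchall()
print(rows)  # [(4, 'A', 'T')]
```

Once the data is loaded, a question such as "which positions differ from the reference?" becomes an ordinary database query rather than a custom script over raw sequencer output.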
In an interview, Taha Kass-Hout, chief medical officer of AWS, frequently alluded to taking the "heavy lift" of technical processes off the backs of bioinformaticians and researchers.
"We take on the onus of provisioning, managing, scaling, and securing the entire infrastructure," said Kass-Hout, who also serves as VP of machine learning for AWS. "Taking all that out of the equation lets scientists be scientists and work on and focus on what they [know] best, such as scientific discovery and delivering better care for patients and innovating on therapeutics and diagnostics."
Amazon Omics is made up of three components: "omics-aware" object storage for raw sequencing data; Amazon Omics Workflows for processing raw sequences from FASTA, FASTQ, BAM, and CRAM files; and Amazon Omics Analytics, which adds structure and annotations to plain text formats like VCF to simplify variant and mutation queries, according to Kass-Hout. Customers can use the pieces either individually or all together.
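As a rough illustration of what adding structure to a plain text format like VCF can mean, the Python sketch below parses the eight fixed VCF columns into typed records that can then be filtered directly. It is a simplified stand-in, not the Amazon Omics Analytics implementation; the sample variants, the `GENE` annotation key, and the `parse_vcf` helper are hypothetical.

```python
def parse_vcf(text):
    """Turn VCF data lines into dicts, skipping '#' header lines."""
    records = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        chrom, pos, vid, ref, alt, qual, filt, info = line.split("\t")[:8]
        records.append({
            "chrom": chrom,
            "pos": int(pos),
            "id": vid,
            "ref": ref,
            "alt": alt.split(","),  # multiple alternate alleles are comma-separated
            "filter": filt,
            # INFO is a ';'-separated list of key=value pairs or bare flags
            "info": dict(
                item.split("=", 1) if "=" in item else (item, True)
                for item in info.split(";")
            ),
        })
    return records

# Made-up variants for illustration only.
SAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\tvar1\tA\tG\t50\tPASS\tDP=30;GENE=EXAMPLE1",
    "chr2\t67890\t.\tC\tT,G\t20\tq10\tDP=8",
])

records = parse_vcf(SAMPLE_VCF)

# A structured query: passing variants annotated with a given gene.
hits = [
    r for r in records
    if r["filter"] == "PASS" and r["info"].get("GENE") == "EXAMPLE1"
]
print([(r["chrom"], r["pos"]) for r in hits])  # [('chr1', 12345)]
```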
"That gives you the ability to do multimodal analysis of this data" through services such as Amazon Athena, an AWS service that supports querying Amazon Simple Storage Services (S3) files, he said.
Amazon Omics supports workflows written in Nextflow and Workflow Description Language (WDL). It also helps manage the provenance of workflows for regulatory compliance.
Users can combine their own data with dozens of publicly available datasets, including those from the 1000 Genomes Project and the Genome Aggregation Database (gnomAD).
The public launch follows a beta period that ran for about three and a half months, according to Kass-Hout, and had several dozen users. Beta testing confirmed to AWS that users like being able to bring their own data into the cloud, then use Amazon's services to query large sets of omics and phenotypic data in one place, he said. The company also learned that the Amazon Omics workflow execution engine took another major burden off life sciences organizations.
To help customers save money, Amazon Omics supports two classes of data storage. The default setting is an "active" class on the AWS cloud, but the service automatically moves data that has not been active for 30 days to a lower-cost archival storage class similar to S3. Users can customize these parameters.
Amazon Omics stores data — and thus bills customers — according to the number of gigabases the platform ingests. This offers "price predictability" regardless of whether the sequences are from short-read or long-read instruments, according to AWS.
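The storage tiering and per-gigabase billing described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not AWS's billing logic: the 30-day threshold comes from the article, while the function names and the per-gigabase rates (in arbitrary units) are hypothetical placeholders, since the article gives no prices.

```python
from datetime import date

ARCHIVE_AFTER_DAYS = 30  # default cited in the article; described as customizable

def storage_class(last_access, today, archive_after_days=ARCHIVE_AFTER_DAYS):
    """Return 'archive' once data has been idle for the threshold, else 'active'."""
    idle_days = (today - last_access).days
    return "archive" if idle_days >= archive_after_days else "active"

def gigabases(read_count, read_length):
    """Total sequenced bases, expressed in gigabases (1e9 bases)."""
    return read_count * read_length / 1e9

def storage_cost(gb, storage_cls, rates):
    """Cost = gigabases x rate for the storage class (rates are hypothetical)."""
    return gb * rates[storage_cls]

# Hypothetical rates in arbitrary units per gigabase; not AWS prices.
RATES = {"active": 1.0, "archive": 0.25}

short_read = gigabases(read_count=400_000_000, read_length=150)   # 60.0
long_read = gigabases(read_count=4_000_000, read_length=15_000)   # 60.0

cls = storage_class(last_access=date(2022, 10, 1), today=date(2022, 11, 29))
print(short_read, long_read, cls, storage_cost(short_read, cls, RATES))
```

In this sketch the short-read and long-read runs contain the same number of bases, so they cost the same to store, which is the sense in which billing by gigabase is independent of the instrument type.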
Amazon Omics works with HealthLake, a service the company launched in 2021 to provide secure, large-scale cloud storage of structured and unstructured biomedical data. Earlier this month, AWS announced HealthLake Analytics, which normalizes multimodal health data to enable machine learning-based analytics. The firm also said that HealthLake now supports the Digital Imaging and Communications in Medicine (DICOM) standard for communication of medical images and related data.
AWS is able to separate images themselves from metadata so users can query the metadata. HealthLake also follows the Fast Healthcare Interoperability Resources (FHIR) standard for managing and moving patient records, making it compatible with electronic health records and clinical decision support systems, at least in theory.
"Now you can bring your omics data, you can bring your imaging data, you can bring your medical record data, and be able to piece together an entire [360-degree view] on a patient's entire medical history [and] be able to aggregate that information and analyze it at the population level," Kass-Hout said.
Amazon Omics also integrates with Amazon SageMaker, a machine learning service platform, and supplies application programming interfaces to connect with third-party applications and vendors.
Vendor customers named by AWS include G42 Healthcare, C2i Genomics, biomedical software company Lifebit Biotech, and laboratory informatics firm Ovation. AWS said that Lifebit will have a "significantly lower" storage cost per gigabase of data, while Ovation will benefit from variant stores that will accelerate the delivery of genomic insights to researchers.
Consulting partners of Amazon Omics include BioTeam, Cloud303, Diamond Age Digital Science, Tennex, machine learning-focused Loka, and cloud management specialist PTP. Another key beta tester and now user is Children's Hospital of Philadelphia.
The new AWS service will help C2i Genomics save time and money on managing computational pipelines and securing data, according to the Cambridge, Massachusetts-based cancer diagnostics firm. "Amazon Omics allows researchers to use tools and languages from their own domain, and considerably reduces the engineering maintenance effort while taking care of cost and resource allocation considerations, which in turn reduce time-to-market and [non-recurring engineering] costs of new features and algorithmic improvements," C2i VP of Engineering Ury Alon said in a statement.
Despite Amazon's standing in cloud computing, Big Tech does not exactly have a stellar record in healthcare and life sciences. While Google and its parent company, Alphabet, have found a modicum of success with Verily Life Sciences, Google, Microsoft, and Apple have all had high-profile failed ventures that involved healthcare data. Amazon itself is shutting down its Amazon Care telemedicine service before the end of the year.
Kass-Hout struck an optimistic tone by noting that AWS has been supporting large-scale genomics for a decade for customers such as Genomics England, Illumina, and DNAnexus. He said that the next decade will be about making sense of all the unstructured omics data that is now available.
"Now you truly can have a multiomic, incredibly large-scale service that can scale as your needs vary while the infrastructure remains safe and secure," Kass-Hout said.
He added that Amazon Omics will "introduce this democratization" of data to genomics, though democratization is a common aspiration in bioinformatics. 2bPrecise (now owned by AccessDx), Sophia Genetics, and Congenica are among those that have promoted the idea, as have more than a few startups.
When asked what scientists can do with Amazon Omics that they cannot do with other, existing technologies, Kass-Hout said, "That's the challenge." He expects the differentiating factor to be Amazon's ability to take on the "heavy lifting" associated with extracting data from its raw forms, transforming it into something that will fit in a database, then loading it into the database. This is possible, according to the company, because Amazon Omics is a managed service rather than just a collection of open-source software.
"Now you [will] have all that data ready to query right from the output of it, so you don't have to spend thousands of engineering hours and months of work on these ETL processes," he said.
"We're on a mission [to make] machine learning and analytics as boring as possible," Kass-Hout added. "We're really opening it up to data scientists, business owners, and software developers to start really innovating in this space by giving them the right set of tools to solve … particular problems."