BOSTON – Amazon Web Services this week introduced new features for its nascent Amazon Omics service, including a collection of "pre-built" workflows, support for graphics processing units (GPUs), direct data uploads through an application programming interface (API), and streamlined variant querying and analysis.
AWS launched Amazon Omics last November. The company announced the new capabilities here Monday at the annual AWS Life Sciences Executive Symposium and has a team of AWS and Amazon Omics leaders and customers in Boston to present during multiple sessions at the annual Bio-IT World Conference & Expo this week.
The set of workflows, collectively called Ready2Run, includes pipelines from commercial partners such as Element Biosciences, Nvidia Parabricks, Sentieon, and Google-affiliated DeepMind (specifically, the AlphaFold protein structure prediction software), as well as open-source applications including the Genome Analysis Toolkit (GATK), ESMFold, and nf-core. The offering also includes pipelines for single-cell RNA sequencing.
Amazon Omics hosts the Ready2Run workflows so users can perform primary, secondary, and tertiary genomic analyses with a single API call. Customers pay per run, making analysis costs "predictable," according to Tehsin Syed, general manager for health artificial intelligence at AWS.
The single API call also makes it easy to embed the workflows into user processes, either with AWS-stored or locally hosted data.
"You can just call this API and directly put the data into the sequence store from that, versus going through a staging area and then doing batch ingestion," Syed explained.
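As a rough illustration of the single-call launch described above, the Python sketch below assembles the arguments for one workflow run. It assumes the AWS SDK for Python (boto3) and the `start_run` operation of its `omics` client; the key names reflect that assumption, and the workflow ID, role ARN, S3 locations, and the `input_uri` parameter name are hypothetical placeholders rather than values from AWS or its partners.

```python
def build_run_request(workflow_id, role_arn, input_uri, output_uri, name="example-run"):
    """Assemble keyword arguments for launching one hosted workflow.

    Key names are assumed to match boto3's omics `start_run` operation.
    """
    if not output_uri.startswith("s3://"):
        raise ValueError("output_uri must be an S3 URI")
    return {
        "workflowId": workflow_id,
        "workflowType": "READY2RUN",  # hosted workflow rather than a private one
        "roleArn": role_arn,
        "name": name,
        "parameters": {"input_uri": input_uri},  # hypothetical parameter name
        "outputUri": output_uri,
    }


# All values below are placeholders for illustration only.
request = build_run_request(
    workflow_id="WORKFLOW_ID_PLACEHOLDER",
    role_arn="arn:aws:iam::000000000000:role/ExampleOmicsRole",
    input_uri="s3://example-bucket/run-folder/",
    output_uri="s3://example-bucket/results/",
)

# With AWS credentials configured, the launch itself would be one call:
#   import boto3
#   run = boto3.client("omics").start_run(**request)
```

Keeping the request construction separate from the network call makes it easy to check the arguments locally before any billable run is started.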
Element Biosciences, which launched its Aviti sequencing system last year, is supplying its Bases2fastq workflow through the Ready2Run service.
"[Ready2Run] helps the customer not have to do so much stuff," said Rosi Bajari, staff engineer at Element.
"Certainly, every customer can set up their own infrastructure. It takes time, though, and it can take effort. And sometimes, also, it takes expertise that you may or may not have," Bajari added. "This is an easy way with a transparent cost for a user to be able to launch a workflow … in an easy, intuitive way with a little bit less work on the setup side."
Element is not using Amazon Omics directly, but rather making it available to users of its Aviti sequencing instruments and related services through the forthcoming Elembio Cloud, a hosting platform currently in beta testing.
"This capability is not really intended for internal use, but to enable our customers to generate as much meaningful information from our sequencing runs with as little effort as possible," explained Francisco Garcia, Element's senior VP for software and informatics.
Element does not currently offer managed services for sequencing data. "When Amazon came up with this genomics managed service, it was a godsend for us because now we're accomplishing both things that customers really need," namely, services that allow them to maintain custody of their sequencing data and the ability to process the data within the Amazon accounts they already have, Garcia said.
Another part of the new Amazon Omics capabilities is the integration of AWS sequence stores with Amazon EventBridge, cloud-based software that lets users move workflow events, such as the creation of new sequences, between AWS services and third-party applications. "You could get a new sequence that you store and then you want to do secondary analysis on it, [so] you can chain all those together based on an event that's emitted from the sequence store," Syed explained.
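The event-driven chaining Syed describes might look roughly like the Python sketch below: a handler inspects an incoming event and, if it reports a newly available sequence, returns a request for a follow-on analysis step. The event field names and values (`source`, `detail-type`, `detail.status`, `detail.id`) are assumptions made for illustration and may not match what the sequence store actually emits; the workflow ID and the `read_set_id` parameter name are placeholders.

```python
def is_new_sequence_event(event):
    """Return True if the event looks like a newly stored sequence.

    The field names and values checked here are illustrative assumptions.
    """
    detail = event.get("detail", {})
    return (
        event.get("source") == "aws.omics"
        and event.get("detail-type") == "Read Set Status Change"
        and detail.get("status") == "ACTIVE"
    )


def next_step(event, workflow_id="WORKFLOW_ID_PLACEHOLDER"):
    """Map a matching event to a follow-on analysis request, else None."""
    if not is_new_sequence_event(event):
        return None
    return {
        "workflowId": workflow_id,
        "parameters": {"read_set_id": event["detail"]["id"]},  # hypothetical name
    }


# A made-up event used only to exercise the handler.
sample_event = {
    "source": "aws.omics",
    "detail-type": "Read Set Status Change",
    "detail": {"id": "1234567890", "status": "ACTIVE"},
}

print(next_step(sample_event))
```

Events that do not match are ignored by returning `None`, so the same handler can safely receive unrelated traffic from the event bus.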
Amazon Omics also integrates with Amazon SageMaker, a machine learning service platform.
Amazon Omics debuted at the annual AWS re:Invent developer conference last November and began rolling out in December. The cloud computing giant described it as a "purpose-built managed service" intended to help bioinformaticians and biomedical researchers store, query, and analyze genomic, transcriptomic, proteomic, and other omics data in order to advance scientific discovery and develop new diagnostics and therapeutics.
Since the launch, Amazon Omics has stored data and billed customers according to the number of gigabases the platform ingests, offering predictable pricing.
Originally, Amazon Omics customers could only bring their own "private" workflows written in Nextflow and Workflow Description Language (WDL). Based on customer feedback, AWS incorporated some of these workflows into Ready2Run, Syed said.
Companies already using Ready2Run include Gilead Sciences-owned Kite Pharma, which adopted the technology for scRNA-seq. Columbia University Medical Center and Fyr Diagnostics are using it, as well, according to Amazon.
Another key innovation is Amazon Omics' support for GPUs, specifically Nvidia T4 and A10G hardware. "You can use them for any [Amazon Omics] workflows," Syed said, adding that customers can also ask for GPUs to run their own workflows.
An early beta user of Amazon Omics — though not of the new features just yet — is Children's Hospital of Philadelphia (CHOP), which has been testing the service for about a year and is serving as a codeveloper of sorts.
CHOP Chief Research Informatics Officer Jeff Pennington said that Amazon Omics represents the evolution of what started out as a set of utilities driven by command-line interfaces that were time-consuming to generate.
The pediatric hospital had a "twofold requirement for big genomics," he said, namely data harmonization and accurate annotation.
"Underlying all of that, we're very invested in raw data management," he added, because raw FASTQ files are "very easy to end up losing track of and/or in some disorganized state."
"Amazon Omics gives us some good tools for managing those files" without losing them, Pennington said. While it does not completely automate the process into something resembling the Digital Imaging and Communications in Medicine (DICOM) standard in medical imaging, he said, this represents a step forward.
CHOP has installed, configured, and maintained workflows like GATK on its own in the past, a process that has been labor-intensive and not always reliable or scalable. "Having GATK as a managed service is a big deal, being able to pay as we go rather than having to maintain infrastructure and a configuration that can be scaled," Pennington said.