Genomics data analysis provider Agilent recently announced that it has significantly improved processing speeds for variant-calling workflows in its software-as-a-service (SaaS) product, Alissa Reporter.
The announcement comes as genomics leaders around the world have been innovating at the intersection of biology and technology, using the cloud to deliver a step change in their capabilities — in particular, significantly reducing the time and cost associated with secondary analysis.
“Our goal is to leverage the AWS cloud and Nvidia’s GPU-accelerated Clara Parabricks software to enable faster panel-specific data analysis, resulting in shorter turnaround times and reduced costs for the markets we serve — both clinical labs and researchers,” said Kevin Meldrum, vice president and general manager of Agilent’s Integrated Genomics Division. “This will ultimately help them advance towards scientific breakthroughs that create life-changing and life-saving drugs,” he added.
The speed increase comes from integrating Nvidia’s GPU-optimized Clara Parabricks genomics analysis capabilities into Alissa Reporter and running the platform on Amazon Web Services (AWS).
Unlocking New Opportunities
The move to use Clara Parabricks on AWS was driven by Agilent’s desire to widen the use cases supported by its Alissa Reporter software, to go beyond small-panel amplicon analysis and offer whole-exome analysis.
“With small panels, analysis speeds aren’t really an issue, because the files are relatively small,” said Meldrum. “However, whole-exome FASTQ files can be between 5 GB and 10 GB each. A plain GATK workflow on a single sample could take five hours using our previous, non-GPU-optimized model. When you add in the pre- and post-processing, you’d be looking at runtimes of up to 10 hours. When it comes to delivering life-impacting results, speed matters, and we needed a way of accelerating this significantly if we were to expand the software to support whole-exome analysis.”
“Moreover, when your infrastructure bill is based on time spent, as it is in the cloud, the quicker we could run our secondary analysis, the lower the cost. Being able to offer whole-exome analysis, while keeping our pricing competitive, was another reason we needed to significantly reduce these secondary analysis runtimes.”
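The link between runtime and cost under time-based billing can be illustrated with a minimal sketch. The hourly rates below are hypothetical placeholders, not actual AWS prices or Agilent's costs; only the five-hour and nine-minute runtimes are taken from the figures quoted in this article.

```python
def run_cost(hourly_rate_usd: float, runtime_hours: float) -> float:
    """Cost of one analysis run when compute is billed by time used."""
    return hourly_rate_usd * runtime_hours


# Hypothetical hourly rates, for illustration only (not real AWS pricing).
CPU_RATE_USD = 1.50
GPU_RATE_USD = 15.00

# Runtimes quoted in the article for a plain GATK workflow.
CPU_RUNTIME_HOURS = 5.0
GPU_RUNTIME_HOURS = 9 / 60

cpu_cost = run_cost(CPU_RATE_USD, CPU_RUNTIME_HOURS)
gpu_cost = run_cost(GPU_RATE_USD, GPU_RUNTIME_HOURS)

# A pricier instance is still cheaper per run whenever the speedup
# exceeds the ratio of the hourly rates.
speedup = CPU_RUNTIME_HOURS / GPU_RUNTIME_HOURS
price_ratio = GPU_RATE_USD / CPU_RATE_USD

print(f"CPU run: ${cpu_cost:.2f}, GPU run: ${gpu_cost:.2f}")
print(f"Speedup {speedup:.1f}x vs. price ratio {price_ratio:.1f}x")
```

Under these assumed rates, the GPU instance costs ten times more per hour but finishes roughly 33 times sooner, so the per-run cost falls. With different rates the break-even point moves accordingly.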
Agilent faced other challenges that will be familiar to many in the genomics world. It needs to deploy its application close to customers in order to meet data residency requirements, minimize the networking costs associated with transferring large files, and support fast data transfers. For that reason, the application was originally built to run in any cloud environment, giving Agilent the flexibility to deploy anywhere. However, this meant its infrastructure teams spent significant time configuring and maintaining multiple environments around the world. It also meant Agilent’s developers couldn’t use many of the tools offered natively by AWS. Focusing on AWS specifically eliminates much of this overhead.
Accelerating Secondary Analysis
Meldrum continued: “We looked at a number of ways to meet our customers’ needs, and ultimately chose the combination of Clara Parabricks on Nvidia-GPU-powered Amazon EC2 instances.”
“One of the big attractions of Parabricks for us was that it optimizes the industry-standard open-source tools that we were already using in our product to leverage the capabilities of GPUs. This made the transition relatively straightforward because we didn’t need to make significant changes to our codebase or upskill our teams in whole new toolsets.”
“AWS had also significantly expanded its global infrastructure footprint since we designed our previous deployment model. This meant that going all-in on AWS was now an option for us: we could deploy virtually anywhere in the world and be close to our customers. This will significantly reduce the burden on our infrastructure teams, who’ll no longer need to spend a lot of time on configuring and maintaining hosted environments. We can set up our storage, database and other necessities in a few clicks.”
Flexibility as Needed
One of the other reasons many organizations opt to run workloads in the cloud is the added agility and flexibility. With virtually limitless capacity, and the ability to add or remove resources almost instantaneously, businesses large and small have been able to all but eliminate the opportunity cost associated with having traditional IT infrastructure sitting idle, as well as the need to queue up jobs to ‘wait their turn’ at peak times.
Agilent has been using auto-scaling in AWS extensively, at times bursting from a single machine up to many hundreds of machines and then back again, both for the GPU-optimized workflows and for those running on CPUs. “This is an important way of keeping our costs down, while also being able to respond quickly to customers’ analysis requirements,” said Meldrum.
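The bursting behavior described above can be pictured as a simple capacity calculation. This is a generic sketch of queue-based scaling, not Agilent's implementation and not the AWS Auto Scaling API; the function name, the jobs-per-instance figure, and the fleet limits are all hypothetical.

```python
import math


def desired_instances(pending_jobs: int, jobs_per_instance: int,
                      min_instances: int = 1, max_instances: int = 500) -> int:
    """Instances needed to work through the queue, clamped to fleet limits."""
    if jobs_per_instance <= 0:
        raise ValueError("jobs_per_instance must be positive")
    needed = math.ceil(pending_jobs / jobs_per_instance)
    return max(min_instances, min(needed, max_instances))


# Hypothetical queue depths: the fleet idles at one machine, grows with
# demand, and is capped at the assumed maximum.
for queue_depth in (0, 3, 250, 10_000):
    print(queue_depth, desired_instances(queue_depth, jobs_per_instance=2))
```

Keeping a floor of one machine and an explicit ceiling mirrors the pattern in the text: capacity follows the queue, and resources are paid for only while there is work to do.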
Cutting Secondary Analysis Times From Five Hours to Nine Minutes
The main aim was to reduce both secondary analysis times and cost per sample. Has Agilent’s move to Nvidia Clara Parabricks on AWS achieved this?
“A plain GATK secondary analysis workflow, which could have taken up to five hours under our old model, now completes in nine minutes — that’s over a 96 percent reduction in processing time,” Meldrum revealed. “For Agilent, the true value of running Nvidia on AWS is the end results we are able to provide our customers. We’ve been able to significantly reduce end-to-end analysis times for whole exomes from around 10 hours to less than half that time. And the operational analysis cost per sample (excluding storage and development costs), which could have been as high as $10, is now just a few dollars for a standard sample. When you’re running 10,000 samples a year, that’s an enormous saving.”
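The figures quoted above can be checked with a few lines of arithmetic. The runtimes, the $10 starting cost, and the 10,000-sample volume come from the quote; the $3 post-migration cost is an assumption standing in for "a few dollars", since no exact figure is given.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


def annual_saving(cost_before_usd: float, cost_after_usd: float,
                  samples_per_year: int) -> float:
    """Yearly saving from a lower per-sample operational cost."""
    return (cost_before_usd - cost_after_usd) * samples_per_year


# Quoted runtimes: up to five hours before, nine minutes after.
gatk_reduction = percent_reduction(5 * 60, 9)
speedup = (5 * 60) / 9

# ASSUMPTION: "a few dollars" is modeled as $3; the article gives no exact value.
ASSUMED_COST_AFTER_USD = 3.0
saving = annual_saving(10.0, ASSUMED_COST_AFTER_USD, 10_000)

print(f"Runtime reduction: {gatk_reduction:.1f}% ({speedup:.1f}x faster)")
print(f"Annual saving at 10,000 samples: ${saving:,.0f}")
```

The runtime reduction works out to 97 percent, consistent with the "over 96 percent" in the quote. The annual saving scales linearly with whatever per-sample cost is actually achieved.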
“For our customers, being able to run their secondary analysis so quickly, and at a highly competitive price, means they have the flexibility to broaden the scope of their research or run more analyses than they’d otherwise be able to, which can ultimately help them achieve better outcomes for patients and populations.”
Agilent has also found that its application development teams benefit from the new model. Because secondary analysis runs so quickly, the teams can maintain rapid development cycles while keeping costs down: infrastructure usage is up to 10 times lower than it would otherwise have been. This means Agilent can be highly responsive in delivering new capabilities to customers, while maximizing the proportion of its development budget spent directly on the product rather than on infrastructure.
AWS and Nvidia offer a suite of infrastructure and services for genomics organizations looking to follow Agilent’s lead and run their analysis workloads in the cloud. AWS provides the largest global infrastructure footprint of any cloud provider and a choice of GPUs to suit different price-performance requirements, and Nvidia’s Clara Parabricks framework is available on the AWS Marketplace, so clinical and research organizations can begin accelerating their genomics analysis today.
Find out more:
Learn more about the AWS/NVIDIA Solution.
Attend the GenomeWebinar, “Improving NGS Data Analysis Speed and Scalability Using GPUs” on Tuesday, Sept. 27.
For Research Use Only. Not for use in diagnostic procedures.