NEW YORK – Look for a recent effort to improve cloud-based omics analysis and Nvidia is probably involved.
The Santa Clara, California-based company, widely known for its graphics processing units (GPUs) and other high-performance computing hardware, became a key player in bioinformatics software when it acquired University of Michigan sequencing analysis spinout Parabricks in 2019. The startup had been a partner of Nvidia since its 2015 inception.
This year, Nvidia released version 4.1 of Parabricks, which features a new retraining tool for DeepVariant, Google's deep-learning-based variant caller for germline variants. Nvidia has also accelerated DeepVariant on its GPUs, making it both faster and easier to incorporate new medical knowledge into variant calling, according to Jason Fenwick, an Nvidia business development specialist in genomics.
Nvidia has built a notebook to walk users through the retraining process, provisioning the proper cloud hardware for the specific task. "That ability to retrain makes it more accurate on specific data, whether that's different sequencing technology or it's lab-specific protocols," Fenwick explained.
The GPU acceleration allows researchers to test out new iterations faster. "You're more likely to get to a higher accuracy model quicker, which at the end of the day means you'll most likely end up with a more accurate variant caller," he said.
Customers including Regeneron Pharmaceuticals have been asking for DeepVariant retraining capabilities "for a while," said Rory Kelleher, Nvidia's director of healthcare and life sciences sales for the Americas.
The next step with DeepVariant retraining is to incorporate additional Nvidia artificial intelligence training and expertise into the process for further acceleration, according to Fenwick.
DeepVariant is not the only area in which Nvidia and Google are collaborating. Nvidia was also a launch partner for the Google Cloud Multiomics Suite, which debuted in May.
Both Nvidia and Google are partners with Form Bio, and both Nvidia and Form Bio work with Colossal Biosciences, the George Church-founded company known for its efforts to "de-extinct" long-gone species like the woolly mammoth and the Tasmanian tiger. Dallas-based Form Bio, a 2022 spinoff from Colossal Biosciences, is providing access to its computational life sciences platform via Google's Multiomics Suite.
Colossal Biosciences is using Nvidia GPUs to accelerate alignment and variant calling as well as Parabricks workflows for genomic analysis on top of Form Bio's platform, according to Fenwick. The GPU infrastructure sits on Google Cloud.
On Amazon Omics, Ready2Run is a collection of about 35 workflows meant to make it easier for people without in-house bioinformatics expertise to get started with genome analysis.
The collection includes 13 Parabricks workflows, among them workflows for germline and somatic analysis, alignment from FASTQ to BAM files, one that replicates Genome Analysis Toolkit results on GPUs, and another that accelerates Google's DeepVariant variant caller with GPUs.
"Part of the challenge with bioinformatics tools in the past was, you needed a particular set of infrastructure. You needed software. Sometimes it would take weeks to organize all these things," said Kelleher. Ready2Run, as the name suggests, can be deployed with minimal additional configuration.
Through Parabricks, Nvidia also works with developers who write their own workflows with tools such as Nextflow, and supports customization of Ready2Run workflows.
"It's modularized. You can use these ready-to-go workflows that we've built in Parabricks that live on Amazon Omics, or you can disaggregate these different modular components of Parabricks and stitch it together via Nextflow for unique use cases," Kelleher said.
Basic Parabricks workflows are available for free, though Nvidia also sells enterprise licenses for organizations that need help and support with custom workflows.
Amazon Omics is also now supporting two Nvidia GPUs, specifically the T4 and A10G models. Prior to the Nvidia agreement, Amazon Omics only supported slower central processing units, and Amazon said that Nvidia is the only GPU supplier that is compatible with this AWS offering so far.
AWS and Google Cloud are not the only cloud platforms that Nvidia is accelerating in life sciences.
In April, Oracle Cloud Infrastructure said that it completed a standard benchmark of Parabricks by running an entire germline pipeline on a cluster of eight A100 GPUs in a single node in just 19.2 minutes. A setup of four Nvidia A100 GPUs completed the benchmark in 32.9 minutes, while the test on four lower-cost A10 GPUs took 33.1 minutes.
This is the fastest the Parabricks genomic analysis benchmarks have been completed on any cloud, according to Nvidia.
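To put the reported Oracle Cloud figures in perspective, the following is a minimal Python sketch of the arithmetic. It uses only the three run times quoted above; the function names and the "GPU-minutes" measure are illustrative assumptions, not part of any Nvidia or Oracle tooling, and the calculation says nothing about actual cloud pricing.

```python
# Benchmark run times in minutes, as reported by Oracle Cloud Infrastructure.
# The dictionary layout and helper names are hypothetical, for illustration only.
RUNS = {
    "8x A100": {"gpus": 8, "minutes": 19.2},
    "4x A100": {"gpus": 4, "minutes": 32.9},
    "4x A10": {"gpus": 4, "minutes": 33.1},
}


def speedup(baseline, scaled):
    """How many times faster the scaled run finished than the baseline run."""
    return baseline["minutes"] / scaled["minutes"]


def scaling_efficiency(baseline, scaled):
    """Speedup divided by the increase in GPU count (1.0 would be perfect scaling)."""
    gpu_ratio = scaled["gpus"] / baseline["gpus"]
    return speedup(baseline, scaled) / gpu_ratio


def gpu_minutes(run):
    """Total GPU time consumed: number of GPUs multiplied by wall-clock minutes."""
    return run["gpus"] * run["minutes"]


if __name__ == "__main__":
    base, scaled = RUNS["4x A100"], RUNS["8x A100"]
    print(f"Speedup from 4 to 8 A100s: {speedup(base, scaled):.2f}x")
    print(f"Scaling efficiency: {scaling_efficiency(base, scaled):.0%}")
    for name, run in RUNS.items():
        print(f"{name}: {gpu_minutes(run):.1f} GPU-minutes")
```

By this reckoning, doubling from four to eight A100s cut the run time by a factor of about 1.7, or roughly 86 percent of ideal scaling, while the four-GPU A10 run finished within a fraction of a minute of the four-GPU A100 run.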
Additionally, Parabricks 4.1 added a first-ever workflow for long-read data from Pacific Biosciences. Earlier this year, Nvidia was among the first batch of companies to earn "PacBio Compatible" status, given to technologies that have been validated to work with that firm's sequencing platforms.
Since the Parabricks acquisition in 2019, Kelleher said Nvidia's genomics team has been focusing on working with sequencing instrument manufacturers and sample preparation firms, including PacBio, Ultima Genomics, and Roche. "As they're building better and better chemistry and building better and better sensor technology, they need a compute platform that is going to keep up with the massive throughput that they're driving," Kelleher said.
Pipeline acceleration will help manage the next data tsunami on the horizon, he said, as the industry approaches the age of the $100 genome. "A lot more of the cost of the genome analysis is going to be the [computing] associated with the analysis of this data, so it's crucial for us to accelerate these pipelines in every way that we can," he said.
"We think that now is the time for genomics applications to be living on GPUs, both in the primary analysis and secondary analysis," Kelleher said at the 2023 Bio-IT World Conference & Expo in May.
Pharma firms are interested in applying GPUs to single-cell sequencing data as well as spatial genomics to help build large language models to streamline their pipelines, he told GenomeWeb.
Nvidia is also trying to harness generative AI for biomolecular applications with its BioNeMo AI application framework, he said. The firm is helping customers build large language models for improving the prediction of molecular structures and functions based on DNA and RNA sequences.
Early work in this area has been for protein engineering and small molecule generation, though Kelleher said that one group he met at Bio-IT World inquired about applying this technology to single-cell datasets. "That is something that we'll have to consider," he said.