CHICAGO – In retrospect, the news last week that Nvidia has acquired sequencing analysis software startup Parabricks should not have come as a surprise.
Ann Arbor, Michigan-based Parabricks was founded in 2015 and has been working with Nvidia from the beginning. Shortly after Parabricks spun out of the University of Michigan, the company was part of Nvidia Inception, a "virtual accelerator" for artificial intelligence-focused startups that rely on graphics processing units (GPUs).
Nvidia makes GPUs, which accelerate computations. Genomics is a terrific candidate for acceleration because the datasets are so large, according to Kimberly Powell, Nvidia's vice president of healthcare.
"When we started, we realized that in genomics, many of these applications were taking a really long time," said Parabricks Cofounder and CEO Mehrzad Samadi.
Genomics analysis is a "mixture of traditional programs and deep-learning applications," according to Samadi, making it ideal for GPUs.
Powell said that Nvidia was particularly attracted to the fact that Parabricks has traditional genomics analysis side-by-side with machine learning. The acquired company developed its own technology.
Parabricks now has more than two dozen pipelines, covering somatic, germline, copy-number, joint variant calling, and deep-learning applications, and that number is growing. Parabricks is on course to release a single-cell RNA analysis product next year.
The company won a Phase 2 Small Business Innovation Research grant about a year ago from the National Science Foundation to advance R&D on secondary genomic analysis. The grant, totaling about $875,000 including a related award from the state of Michigan, is helping Parabricks create an end-to-end platform for researchers and clinicians alike. The company said that the platform will reduce secondary data analysis processing time for next-generation sequencing to less than an hour using the Broad Institute's GATK4 pipeline.
Samadi said that Parabricks has started generating revenue from that platform.
A collaboration with storage provider DataDirect Networks (DDN) resulted in the May 2018 release of a jointly integrated technology platform that accelerates human genome analysis by as much as 100x.
It just so happens that Nvidia has its own relationships with each and every one of Parabricks' partners. "All of their partners are completely synergistic in overlap," Powell said.
The name will live on, as the technology will be branded Nvidia Parabricks, according to Samadi. With Nvidia in charge, customers should expect better support, he added.
"We will be more focused on making better and more innovative products and then we rely on the rest of Nvidia's infrastructure for us to run the rest of the business. That helps us a lot to focus on our customers' needs," Samadi said.
"We think with Nvidia's help, we can scale what we are doing right now and help more people," Samadi said.
Parabricks started talking with Santa Clara, California-based Nvidia about a year ago, though serious conversations heated up in the last few months, according to Samadi. "It's aligned pretty well, so we didn't need to do much to convince each other, actually," he said.
Nvidia has made job offers to the entire Parabricks team. While Samadi and one of his other cofounders, CTO Ankit Sethia, have agreed to stay on board, the third cofounder, University of Michigan computer scientist Scott Mahlke, is "in conversation," Samadi said. Mahlke has never been a full-time employee of Parabricks.
This acquisition actually is part of Nvidia's second major push to tackle genomics.
Powell said that the company first took on genomics about six years ago, developing an open-source project called NVBIO, which featured GPU-accelerated implementations of the Burrows-Wheeler Aligner, as well as some other algorithms related to secondary analysis.
"I think we internalized that genomics is going to be an absolute critical element of cracking this personalized medicine code, that it has implications for drug discovery and for clinical applications, so we wanted to make a contribution," Powell recalled.
But there was not enough sequencing happening around 2013 to support the business. "The market didn't feel quite ready," Powell said.
The company did continue its work around the edges of genomics, though. In 2014, the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium and the Nvidia Foundation announced that they would provide up to $2 million in funding for the development of omics-based, data-intensive scientific tools to treat cancer.
In 2016, Nvidia joined with the National Cancer Institute, the US Department of Energy, and several national laboratories to build an AI network to support then-Vice President Joe Biden's Cancer Moonshot.
The outgrowth of that effort, the Biden Cancer Initiative, shut down this year as Biden turned to his pursuit of the presidency in 2020, but Nvidia has continued to run with the technology it developed.
In this current era of high-throughput sequencers, the time was right to try again, Powell said.
Nvidia began taking a second look at genomics about two years ago, when nanopore sequencing was catching on, and established an applied research team in genomics.
"It started in long-read sequencing with nanopore," Powell said. "It was AI by definition and de novo assembly is something that you could essentially achieve now with accelerated computing. Everything was a perfect fit for Nvidia to make a contribution.
"We're working on everything in the long-read sequencing space and de novo assembly and artificial intelligence."
She said that AI is "starting to get quickly infused into workflows" for base calling and variant calling.
"It's essentially AI by definition," Powell said of this third generation of sequencing. "These nanopore sequencers, in order to do accurate base calling, they have to use AI approaches because the signal that's coming out of these devices, the signal to noise is tricky to work with," she said.
To a computer, that output looks similar to a speech signal, according to Powell, in that it is essentially a wavy line. Nvidia was able to take deep-learning architectures from speech recognition and apply them to nanopore outputs.
"If you think about what the [nanopore] sequencers are doing, they're pushing a lot more data through, so they needed an accelerated computing platform," Powell said.
Meanwhile, the cost of sequencing has broken the once-magical $1,000 barrier, and countries all over the world have launched national sequencing programs.
"The clinical applications are now affordable and very desirable. This is the inflection point for us," Powell said.
A year ago, Nvidia introduced an accelerated data science platform called Rapids that Powell said can support very large-scale tertiary genomic analysis as well as personalized medicine.
"You're not only looking at the variant annotations, but you want to look at them in the context of electronic health records, medical imaging and otherwise. That's a huge machine-learning problem," Powell said.
This is another area where GPUs can shine, she said.
"Our roadmap, it's going to be everything that Parabricks has been working on, really leaning into tertiary analysis."
Parabricks historically has served research customers, but is starting to move into clinical deployments, according to Powell. Neither she nor Samadi would discuss details of the clinical work just yet.
Powell did say that speed gives Parabricks an advantage in this realm.
"Time is of the essence in the clinical world. If you want to have a turnaround time of 45 minutes versus 30 hours versus a day-plus, you're going to choose Parabricks," she said. "It's a natural thing for the clinical world to want to have a GPU-accelerated solution because our turnaround time can be less than one hour."
"Nvidia believes that genomics is going to be one of the most computationally demanding areas of healthcare," Powell said.
"The bioinformatics community has been way ahead in terms of statistics," Powell said, contending that machine learning actually is a subset of statistics.
"This is exactly the right time because sequencing has come down so much in cost and the processing and analysis and analytics on this is going to become the bottleneck, and we have the opportunity to introduce modern artificial intelligence approaches and machine learning all the way out to a tertiary analysis. I think 2020 is going to be an absolute breakout year for genomics."