Nvidia, Broad Institute Team on Deep Learning, Natural Language Processing in GATK

CHICAGO – Nvidia said Tuesday that it is partnering with the Broad Institute to make its Clara Parabricks GPU-accelerated software for secondary analysis of sequencing data available to the 25,000 users of the Broad's Terra data platform. Nvidia, maker of graphics processing units (GPUs) and other high-performance computing technology, also said that it would contribute a new Parabricks-based deep learning model to the Broad's Genome Analysis Toolkit (GATK) for genetic variant analysis.

Additionally, Broad researchers will develop foundational language-based deep learning models for DNA and RNA on a newly released artificial intelligence application framework called Nvidia BioNeMo. That technology is based on an Nvidia natural language processing platform called NeMo Megatron.

The company made the announcements at Nvidia GTC, a webcast event in Santa Clara, California, whose name formerly stood for GPU Technology Conference.

Terra — codeveloped by the Broad, Microsoft, and Alphabet's Verily — offers researchers a way to connect to datasets, bioinformatics tools, and each other through cloud computing. 

Kimberly Powell, Nvidia VP of healthcare, said during a briefing with reporters that Parabricks will be available in the next update of GATK, due out next month, to help improve the accuracy of variant calling.

"With sequencing costs [per genome] now dropping from many hundreds of dollars to $100, we need to be sure that the analysis of this genomic data is as performant and as efficient as possible, and GPUs are powering that next wave of genomics," Powell said.

"Our roots are really with genomics data, but over time we've added a lot of different data types to the [Terra] platform," Clare Bernard, senior director of the data sciences platform at the Broad, said in a prerecorded message shown at GTC. "This partnership with Nvidia will create greater access to different types of analysis and bring that to a wider group of people who wouldn't necessarily have access to those sophisticated technology services."

With BioNeMo, Nvidia is bringing to biology a deep learning strategy called large language models (LLMs), in which algorithms are trained on massive text-based datasets.

"The BioNeMo framework is for researchers and developers who want to develop new, pretrained large language models at any scale and with any type of biological sequence, be it chemistry, protein, DNA, or RNA," Powell explained. She said that BioNeMo has been built specifically for life sciences to help researchers better understand molecular data.

"[In] the area of DNA, we are just at the beginning," Powell noted.

"We are entering the next wave of AI, where large language models can understand the language of chemistry and biology, and in the future, will be run end to end as a drug discovery pipeline, turning drug discovery into an information and computer science," she added. "LLMs give us a new tool to explore the infinite world of biomolecules and chemistry."

Anthony Philippakis, the Broad's chief data officer, said that LLMs help researchers and clinicians make sense of human language in medical records. "Similarly, in biology, there's another set of languages, the language of DNA, RNA, and proteins," he said. "Just as we train large language models to analyze human text, we can train these same models to analyze the language of life."

BioNeMo, part of Nvidia's Clara Discovery family of computing frameworks, already has a pretrained model for generative chemistry called MegaMolBART, the product of a collaboration between Nvidia and AstraZeneca developed on the Cambridge-1 supercomputer. Nvidia will release two protein models: ESM-1nv to predict properties of amino acid sequences and ProtT5 for sequence generation. In the future, researchers will be able to customize LLMs, the company said.

"The output of BioNeMo can be used for downstream tasks such as generating new proteins and chemicals or predicting structure, function, or reaction properties," Powell said.

Tuesday also marked the general release of version 1.0 of Monai, AI-based medical imaging software that Nvidia has been developing with King's College London for the past two years and plans to add to the Broad's Terra cloud.

Powell said that Monai is "purpose-built" for radiology, pathology, and surgical data, images, and video. "The next frontier for imaging is contributing to the innovations in minimally invasive surgery and robotic surgery," she said.