CHICAGO – Nvidia is teaming with Oxford Nanopore Technologies, a branch of the UK's National Health Service, and two major pharmaceutical companies to develop what the technology company said will be Britain's most powerful supercomputer.
Called Cambridge-1, the supercomputer will feature Nvidia DGX SuperPod supercomputing infrastructure, based on the company's recently introduced A100 graphics processing unit (GPU) and related DGX A100, a box that holds eight A100 processors and related server hardware to increase computational speed.
Cambridge-1 will feature 80 DGX A100 units and have 400 petaflops of computing power, making it among the top 30 supercomputers in the world, according to Nvidia officials. The Santa Clara, California-based company pegged initial costs at about $50 million.
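As a rough back-of-the-envelope check on those figures, the short Python sketch below works out the implied GPU count and per-unit throughput. It uses only the numbers quoted above (80 DGX A100 units, eight A100 GPUs per unit, 400 petaflops in total) and assumes, purely for illustration, that the quoted total divides evenly across the units. The function name `cluster_summary` is hypothetical and is not part of any Nvidia tool.

```python
# Figures quoted by Nvidia in the article.
DGX_UNITS = 80          # DGX A100 units in Cambridge-1
GPUS_PER_DGX = 8        # A100 GPUs per DGX A100 unit
TOTAL_PETAFLOPS = 400   # Nvidia's stated total computing power


def cluster_summary(units, gpus_per_unit, total_petaflops):
    """Derive GPU count and average throughput per unit and per GPU.

    Assumes the quoted total is spread evenly across all units,
    which is an illustrative simplification.
    """
    total_gpus = units * gpus_per_unit
    return {
        "total_gpus": total_gpus,
        "petaflops_per_unit": total_petaflops / units,
        "petaflops_per_gpu": total_petaflops / total_gpus,
    }


summary = cluster_summary(DGX_UNITS, GPUS_PER_DGX, TOTAL_PETAFLOPS)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under those assumptions, the system would contain 640 A100 GPUs, averaging 5 petaflops per DGX A100 unit.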
"The Cambridge-1 supercomputer will serve as a hub of innovation for the UK and further the groundbreaking work being done by the nation's researchers in critical healthcare and drug discovery," Nvidia Founder and CEO Jensen Huang said Monday in a keynote address at the company's annual GPU Technology Conference (GTC), being held online this week.
Nvidia is expediting construction of the supercomputer in the face of the COVID-19 pandemic and expects to have Cambridge-1 online by the end of the year. The supercomputer is part of a center of excellence for artificial intelligence that the firm is building in Cambridge, UK.
Nvidia has previously highlighted how its servers were helping large-scale programs in the US and the Middle East understand the novel coronavirus SARS-CoV-2 and accelerate research into potential therapies for COVID-19.
The initial Cambridge-1 partners, including Oxford Nanopore, Guy's and St Thomas' NHS Foundation Trust in London, King's College London, GlaxoSmithKline, and AstraZeneca, should start moving onto the platform in early 2021. "I think we have a ton of potential opportunities to do very large-scale industrial research, and we'll kick it off in the first half of next year," Kimberly Powell, Nvidia VP of healthcare, said in a videoconference with reporters.
"We want to give these AI researchers and scientist pioneers a supercomputer so they can do very large-scale research and we can have incredible medical breakthroughs," Powell said.
Nvidia will be working with these partners on very large-scale research. It will also invite startups and university researchers to access Cambridge-1 in an attempt to build a bioinformatics ecosystem. "Now we can offer hands-on GPU bootcamps and hackathons and competitions, so we can bring more and more practitioners, especially from the AI healthcare research space, to understand and solve problems with artificial intelligence," Powell said.
Nvidia and Oxford Nanopore have already been working together for several years. Notably, the sequencing firm's GridIon Mk1 instrument features an Nvidia Volta GV100 card, so basecalling only requires 10 percent of the GPU resource. Other Oxford Nanopore devices, including the handheld MinIon Mk1C and the larger PromethIon model, also contain Nvidia GPUs.
Powell explained that AI is at the "very core" of basecalling for nanopore sequencing. The open-source Bonito basecaller for Oxford Nanopore reads, released in February, runs on Nvidia technology, she said.
With Cambridge-1, Oxford Nanopore expects to be able to train new analytics algorithms in pursuit of continuous performance improvement.
"Many of our performance improvements over the years have been as a result of improving the analysis methods rather than improving the original data generated by a nanopore device," an Oxford Nanopore spokesperson said via email. "In short, the information is already in the signal."
As computing power grows and computational time shrinks, researchers should be able to go back to earlier nanopore sequence data and reanalyze it with the improved algorithms, the spokesperson said, adding that the supercomputer should also support an ever-greater data scale.
"Using the additional analysis power provided by supercomputers and AI to deploy to the largest datasets is particularly useful for delving into the rich data provided by nanopore sequencing, for large-scale matching of variants and phenotypes," according to the spokesperson.
In the future, this type of computing platform, coupled with new AI algorithms, should also make protein sequencing and analysis accessible to more researchers, the company said.
Also on Monday at GTC 2020, Nvidia said that it would partner with GSK to deploy the new Nvidia Clara Discovery software suite for pipeline development and computational drug discovery at the AI hub that the pharma giant is building in London. GSK is investing in an unspecified number of DGX A100 units and also will have access to Cambridge-1 to power its AI capabilities.
"We're building new algorithms and approaches in addition to bringing together the best minds at the intersection of medicine, genetics, and artificial intelligence in the UK's rich ecosystem," Kim Branson, senior VP and global head of AI and machine learning at GSK, said in a statement.
Specifically, Clara Discovery is meant to address the high cost, slow process, and high failure rate of attempts to bring new drugs to market. "We are working on how to extract information out of the new biomedical datasets that GSK is generating, things like medical imaging and genomics," Powell said.
As the cost of sequencing falls, Powell expects Nvidia customers to generate many exabytes of data in the next few years. "Genomics is an incredibly powerful data source for a very important stage of drug discovery, which is selecting your biological target," she explained. "If you don't select the right biological target, you're setting up the entire rest of the process for failure, so improving that capability is incredibly important."
Powell said that Nvidia is still trying to learn how much of the GSK partnership will focus on genomics. She said that the two companies recently ran a demonstration of interactive analysis and clustering of single-cell genomic data.
"This is the kind of new workloads that we're going to be working on together," Powell said. "We're excited to explore all the areas of new biomedical data and artificial intelligence techniques to extract value out of it."
Oxford Nanopore is not the only genomics firm working with Nvidia. In a separate announcement on Tuesday, PetaGene, maker of genomic data compression software, said that it has integrated its flagship PetaSuite bioinformatics analysis platform into Nvidia's Clara Parabricks Pipelines. Cambridge, UK-based PetaGene has claimed that its PetaSuite lossless compression software can reduce storage costs and data transfer times by 60 percent to 90 percent over BAM and gzipped FASTQ files.
This integrated software accelerates genomic analysis on Parabricks Pipelines by 29 percent, according to new data PetaGene presented at GTC. The Translational Genomics Research Institute (TGen) performed this analysis and validation on behalf of PetaGene.
"At TGen, we have a long history of working with a large number of bulky genomic files. As our workflows mature and scale, we have been keen to build our genomics infrastructure from the ground up with the most efficient tools and systems available," Glen Otero, TGen's VP of scientific computing, said in a statement. "PetaGene and Nvidia Clara Parabricks Pipelines were independently clear choices for us. Having them interoperable like this is important, and the fact that the combination further accelerates the aggregate performance is fantastic."
The Clara Parabricks Pipelines result from Nvidia's acquisition of sequencing analysis software developer Parabricks late last year. Parabricks, a University of Michigan spinout, had developed technology that leans on GPUs to accelerate the analysis of whole genomes to less than one hour.