CHICAGO – Database management giant Oracle is making a bid to compete with heavy hitters in the genomic analysis space after recently announcing that it has teamed with the University of Oxford to create a technology system to accelerate the identification of SARS-CoV-2 variants.
Called the Global Pathogen Analysis System, or GPAS, the Oxford technology combines the UK university's Scalable Pathogen Pipeline Platform, or SP3, with the Oracle Cloud Infrastructure to standardize, analyze, compare, and annotate SARS-CoV-2 sequencing data in search of novel variants that could undermine vaccine efficacy and prolong the COVID-19 pandemic.
The partnership builds on earlier work, funded by the Wellcome Trust, that involved Public Health Wales, the University of Cardiff, and Public Health England, and finds a new use for SP3, which had been developed to track tuberculosis outbreaks. The addition of the Oracle Cloud enhances SP3 processing power, security, and global collaboration, according to the Austin, Texas-based firm.
About three months ago, Public Health England — which is changing its name to the UK Health Security Agency as a direct result of the COVID-19 pandemic — asked Oxford to help expand the agency's scale of SARS-CoV-2 genetic testing, according to Derrick Crook, professor of microbiology in Oxford's Nuffield Department of Medicine and leader of the GPAS initiative.
"[Public Health England] wished to take advantage of these cloud services that we were able to access through this platform as a means for achieving that scale, and therefore improve the throughput that they needed to meet the national needs," Crook said. "We were quite happy that our service … could be repurposed for COVID."
The SP3/GPAS platform can be adapted for the genome of any infectious disease, according to Crook.
In announcing the Oxford partnership, Larry Ellison, Oracle's chairman and chief technology officer, stated a desire for the repurposed SP3 to set a "global standard for pathogen data gathering and analysis" to improve understanding of not only the virus that causes COVID-19 but other microbes that threaten public health.
"It's focused on exploiting whole-genome sequencing," Crook said. "We aim to translate genome sequencing as a diagnostic tool, even maybe a replacement for a lot of the manual processes" that go back to the early 20th century.
Crook and colleagues around the world are looking for ways to replace "traditional" microbiology with genome sequencing of microorganisms including bacteria, viruses, and fungi for diagnostic purposes. His group at Oxford has not worked directly with parasites, but other SP3 participants have.
"Over the last 10 to 15 years, we've developed an understanding of the informatics tools that you need to have to process the data," Crook said. "As important … is how you analyze the data, interpret it, and recover all the characteristics that you wish to exploit for diagnostic purposes."
Expertise developed over that time has helped improve genome sequencing analysis for detecting microorganisms.
Oracle is hardly a household name in supporting big data in life sciences, and the Oracle Cloud Infrastructure, or OCI, is rarely mentioned among the major cloud service providers, which usually include the "big three" of Amazon Web Services, Google Cloud Platform, and Microsoft Azure. Indeed, Crook said that he was unaware that Oracle was deeply involved in bioinformatics until the firm reached out to him several months ago.
Crook said that Oracle approached him when the company decided to make a major philanthropic contribution to the COVID-19 response effort and to future efforts against infectious diseases. Oracle donated 10 years' worth of cloud access with no limit on usage for GPAS, according to Crook.
"They want Oxford to be the founding enterprise that would then create the environments that people would wish to use the services across the globe with infectious diseases, starting with COVID," Crook said.
GPAS will be available for free to researchers and nonprofits worldwide. Oxford and Oracle said that they eventually plan on extending the platform to all pathogens.
Mike Sicilia, group vice president for Oracle who is the company's point person on COVID-19 response as well as the Oxford partnership, noted that Oxford is not the firm's first major partner in high-performance computing for genomic and molecular diagnostics applications but said that the deal certainly raises the firm's profile in bioinformatics, including for processing of genomic sequences. "I would expect that becomes a much bigger focus area for us as we go forward," he said.
Oracle announced last week that its cloud platform has been supporting a precision oncology analytics system developed by Australian startup GMDx Genomics.
GMDx uses the OCI platform to host its technology, which measures the "innate immune fitness" of cancer patients, based on an analysis of 40,000 metrics pulled from whole-genome sequences. This analysis then guides clinicians in selection of targeted therapies.
Oracle has an imaging partnership with the University of Bristol in the UK that helped produce a paper in Science last fall detailing work showing that the SARS-CoV-2 spike protein binds with linoleic acid. A partnership with the Wake Forest Institute for Regenerative Medicine is supporting the 3D printing of organoids to test the efficacy of drugs on conditions including COVID-19, cancer, and heart disease.
The tech company also collaborates with the Ellison Institute for Transformative Medicine of the University of Southern California, a major philanthropic effort of Oracle's founder that is not directly related to the company. Among its many endeavors, the Ellison Institute uses the Oracle cloud to integrate genomic data with digital imaging, then applies artificial intelligence to cancer diagnosis and treatment.
AI and predictive analytics require a lot of high-performance computing power, particularly when working with large datasets in the cloud. As with any high-performance cloud platform, OCI provides the flexibility for customers to scale their usage up or down as needs change.
Much of the technology to support sequencing in academic research environments historically "has been home-grown and it's really been kind of cobbled together and assembled by the research teams themselves," Sicilia said. "Obviously, they need some heavy-duty horsepower to be able to run these [experiments]."
And institutions don't want to have to support high-performance computing centers, especially if they expect to have a good amount of downtime between sequencing runs or they want to allow researchers to collaborate with colleagues elsewhere. Sicilia noted that cross-institution collaboration can be difficult for home-grown systems behind a university's firewall, and Oracle would like to support that in genomics.
Prior to moving to the Oracle cloud, SP3 partners in places including the UK, US, India, and China relied on a variety of computing infrastructure at universities and research institutes around the world, ranging from local servers to server clusters.
"The scale of the compute you have available is [only] as big as your data center and it has its origins in a research data center," Crook said. "It's big, but compared to the large commercial cloud suppliers, it's small and therefore it's limited in the extent to which it can scale."
Plus, locally hosted data centers require humans to run them.
"The economies of scale that come from having a large commercial supplier means you can do really quite extraordinary things," Crook said. "It is a completely different scenario."
Oracle has created a high-performance database computing platform on its cloud called the Oracle Exadata Database Machine. Sicilia said that the tech giant has been able to combine its scalable computing infrastructure with intellectual property from its life sciences and healthcare business, as well as some other industry-specific segments.
"We have the right set of horizontal IP in high-performance compute, [namely the] Exadata cloud service, the Oracle cloud in general, and the right set of vertical experts inside our businesses who are clinically focused," Sicilia said.
He said that the company supports clinical trials in the cloud for "most of the major pharmas in the world." These customers tend to use the Oracle Cloud Infrastructure for clinical measurement and drug discovery randomization.
Sicilia said that it is an "added direction" for the Oracle cloud to support sequencing analysis. "While we're focused on COVID right now for obvious reasons, we believe that there's great utility in this becoming a general-purpose sequencing cloud as well," he said.
He said that Oracle is focused on making its cloud a "turnkey service" including operations, maintenance, and security for life sciences companies.
Oracle now is working with Oxford to deploy SP3 in the cloud so the platform can process sequencing data from large numbers of SARS-CoV-2 samples.
"The thinking here is that you would be able to turn around results very rapidly once you had a sequence," ideally within an hour, Crook said. He noted that much of the delay in processing SARS-CoV-2 samples has been due not to computing speed or availability but to sample transportation and preparation for sequencing.
"We want to compress the post-sequencing analytical time down to well under one hour. We want to do that for tens of thousands of samples of [SARS-CoV-2] if there's a need," Crook said.
Crook said that his group is working with Zamin Iqbal, a computational genomics researcher at the European Molecular Biology Laboratory-European Bioinformatics Institute. Iqbal has refined a tool to complete the initial processing of a SARS-CoV-2 genome in less than 10 minutes so researchers can at least identify the lineage of the genome, according to Crook.
Other bioinformatics tools available to the project are for annotation and for phylogenetics. Crook noted that building a phylogenetic tree is "computationally very, very demanding, and it reaches its limits of design when you get to hundreds of thousands of genomes."
Part of the migration includes predefining standards for interpretation of bioinformatic results so users are confident that, for example, the right variants are being called, according to Crook. Just testing the quality of those processes is a major effort, he added.
GPAS is targeted at public health agencies conducting virus and variant surveillance, researchers studying longer-term immunity to COVID-19, and companies working on the next generation of vaccines, Crook said.
"Understanding the details, the granular features of genomes, and mining that information has some very practical benefits for particularly the coronavirus at the moment," he said.
Crook noted that variants not only can change transmissibility and possibly vaccine design but also could affect testing by necessitating a change to the PCR primer.
"All these changes are a nuisance, and it's very helpful to track, recognize them, and react to them," Crook said.
Future directions of the GPAS project will depend on global needs, though Crook said that one probable application will be in tracking influenza A mutations because flu vaccines are notoriously ineffective in some years.
"It also helps you spot viruses that become resistant to antiviral drugs," he added.
He said that genomic labs have gotten good at tracking some genetic changes that confer resistance to anti-tuberculosis drugs, but it is difficult to find all such mutations. "The holy grail is that you will be able to do this with the whole-genome sequences at scale one day," Crook said. "We hope to contribute to that."