CHICAGO (GenomeWeb) — Paradigm4, a nine-year-old bioinformatics company, has long offered SciDB, a popular database management system for life sciences developed in the laboratory of Turing Award winner Michael Stonebraker at the Massachusetts Institute of Technology.
Now, the Waltham, Massachusetts-based company that Stonebraker cofounded is branching out into multi-omics translational informatics.
Two weeks ago, Paradigm4 unveiled Reveal, a translational informatics platform that works with SciDB, the product that has powered the National Institutes of Health's 1000 Genomes Project browser since 2013.
Reveal is meant to give bioinformaticians and other scientists a scalable platform for managing large volumes of omics, behavioral, clinical, health outcomes, and environmental data, including data from wearable devices, according to the company. It runs on the SciDB computational and data-management engine, which has evolved to include machine learning.
"Reveal is an application-specific interface that's built on top of this data management and computational engine for scientific data," said Paradigm4 CEO Marilyn Matz. "One thing that's very special about it is it's got a large-scale math engine under the hood."
Reveal also takes advantage of work done by Stanford biomedical data scientist Manuel Rivas, who built a public browser called the Global Biobank Engine on SciDB to help researchers conduct genome-wide association studies (GWAS) using the UK Biobank database. (A paper describing the Global Biobank Engine is available as a preprint on bioRxiv.)
"What we're trying to do is to apply genetic association analysis across the entire cohort to identify alleles that ... confer risk or protection to disease," Rivas said. "And with that we're also supporting the visualization of some of the results with the public with this platform called the Global Biobank Engine," he explained.
"With the UK Biobank data, we're analyzing it across multiple phenotypes," Rivas added.
Rivas said that he has not yet used the new Reveal platform, but his work has informed development at Paradigm4 by making the raw UK Biobank data more accessible for GWAS and related study types, including phenome-wide association studies.
Zachary Pitluk, vice president for life sciences at Paradigm4, said that Reveal's ability to consider multi-omics data as well as other types of health information will open up new research possibilities.
"The hardest disease areas right now are developing therapies for Parkinson's and Alzheimer's and the neurologic diseases," Pitluk said. "Well, you're [also] going to have the benefit of brain imaging data. You have the benefit of the cardiac imaging data ... and you're going to have the benefit of the ECG data, which links to the genetic data from the patient."
Pitluk gave an example of how a user could find new knowledge by looking at two tumor suppressors, ST5 and ST7, in Stanford's Global Biobank Engine. "You plug those in and what you'd see popping up in some of the related phenotypes are brain volumes. Who would have known? The most common loss-of-function variants in those tumor suppressors have something to do with brain volume. It's just phenomenal. It's new biology," he said.
"But in order to get to that new biology, you can't get caught up in all of the mechanics [behind the scenes]," Pitluk continued. "Reveal allows scientists to get away from that so that they can stay at the scientific level."
Matz also highlighted the wearables component of a multimodal approach, which can help researchers and clinicians alike understand physical symptoms and disease progression.
"To develop an understanding of disease and to build models for precision medicine, you need this rich multimodal data. The genomics data isn't enough in and of itself," Matz said.
"And you need to be able to correlate … [and] cross-validate across these modalities, so understanding, for example, features that might be found in brain imaging and relating them to variants that are found in analyzing the genotype data."
Because Reveal can handle all these different types of data, Pitluk called it "future-ready." He noted that the UK Biobank is similar: its value goes well beyond being one of the largest genomic databases in the world, because it also incorporates electronic health records and other phenotypic data, including imaging and wearables data.
SciDB, for its part, has been built for a decade to manage data at the scale that genomics demands, according to Pitluk.
Stonebraker, Pitluk noted, "blurred the line between storage and computation. He took the traditional field of high-performance computing, which is a batch-oriented system, and developed the technology that allows an exploratory and analytical process on a scale of data that was previously undoable."
In addition to the 1000 Genomes Project, SciDB has been used for weather simulation at the National Aeronautics and Space Administration, with a dataset covering 30 years' worth of weather history, including satellite images and readings from wind, temperature, and pressure sensors, Pitluk said.
The first customer of Reveal that Paradigm4 has identified is RNAi therapeutics firm Alnylam Pharmaceuticals. The Cambridge, Massachusetts-based biopharma company will use both Reveal and Paradigm4's PheGe genomics browser to identify patient cohorts based on medical records as well as imaging and wearables data.
Today, Paradigm4 announced that it had signed up an unspecified top-15 pharmaceutical company as a Reveal user.
Alnylam officials were unavailable to discuss the contract because, according to Matz, the company will not receive UK Biobank whole-exome data until later this month.
Pitluk said that Paradigm4 could have new customers up and running on UK Biobank records within a month of signing a contract. "We already have very deep experience loading all of this UK Biobank data," he said. "There is an urgency around mining this really valuable dataset."