CHICAGO (GenomeWeb) – Shortly before he stepped down as CEO of Illumina in 2016, Jay Flatley said that his successor, Francis deSouza, would make software a major focus for the company.
Now, Illumina, famous for its grip on the next-generation sequencing instrumentation market, is trying to raise the profile of and expand the market for its suite of BaseSpace informatics products.
While the flagship BaseSpace Sequence Hub has been on the Amazon Web Services cloud since Illumina introduced the platform in 2011, the San Diego-based company is currently deploying its other informatics products through AWS. By the end of the year, Illumina should be able to have new customers up and running on BaseSpace Cohort Analyzer, BaseSpace Correlation Engine, and BaseSpace Variant Interpreter within a week, company officials said.
BaseSpace Cohort Analyzer allows users to integrate and jointly analyze subject and genomic data for clinical research, while BaseSpace Correlation Engine mines more than 20,000 genomic studies. BaseSpace Variant Interpreter, meanwhile, helps labs annotate and report variants from human genomic data.
The three products now being migrated to AWS are meant to complement and extend the utility of Sequence Hub. "Our [sequencing] instruments digitize biological information into As, Cs, Ts, and Gs. BaseSpace Sequence Hub is where we provide genomic context, and the other products in the suite — Variant Interpreter, Correlation Engine, and Cohort Analyzer — layer on additional context for genomics researchers," Kevin Meldrum, Illumina's senior director for product management, said via email.
The Correlation Engine database has 135,000 gene expression "signatures" and 10 billion data points from more than 500,000 biological samples from public datasets that Illumina curates, according to Meldrum. Each signature represents a curated piece of data, such as gene expression in a specific tissue type, in a specific drug condition, or in a specific disease model.
"This serves to summarize results so that they may be correlated to new data quickly in our web interface," Meldrum said. "By enabling customers to correlate new data with these signatures, customers can then discover new associations that are otherwise inaccessible with conventional search tools like Google and PubMed."
Illumina is also aiming to simplify the view and support more types of results by consolidating the way the various BaseSpace products display results into a single data element called "lab measurements," according to a company representative. For that, Illumina will use the Study Data Tabulation Model (SDTM) standard that the US Food and Drug Administration adopted years ago for submitting clinical trial data.
With the new user interface, set to come online in January, clinicians and researchers will be able to click on a patient and see information in a format that resembles an electronic health record. This interface also will display molecular data, including copy number variations and DNA methylation data, in separate tabs.
The migration to AWS will allow Illumina to increase the amount of curated content it supplies to customers. Currently, the curated database for Variant Interpreter and Cohort Analyzer includes molecular samples of 15,000 cancer patients, but following the move to AWS, the company plans to bring in many thousands more in early 2019.
Germany's Munich Leukemia Laboratory (MLL), a diagnostics laboratory, has been using the entire suite of BaseSpace products, though it mostly uses Sequence Hub. The lab is equipped with five Illumina NovaSeq 6000 instruments, as well as a HiSeq X Ten unit.
At the moment, all of the lab's research sequencing data is streamed into BaseSpace on MLL's private cloud, based in Frankfurt, in order to comply with German data privacy and security regulations. However, all patient data is hosted locally, according to MLL bioinformatics head Niroshan Nadarajah.
Using BaseSpace has allowed the lab to avoid the cost of building a local cluster, Nadarajah said, and allows it to run many analyses in parallel. MLL also uses BaseSpace for its sequencing service business to offer customers access to their data via the cloud.
MLL has also been using BaseSpace Cohort Analyzer in an effort to analyze, at the cohort level, 5,000 genomes it has sequenced. However, that was not entirely successful because manual annotation was missing. Nadarajah said he would like to see better integration of local data with the publicly available datasets collected by Illumina going forward.
Because MLL is using a private cloud, which is hosted by AWS, not all apps within the public version of BaseSpace are currently available, Nadarajah said, but that will change once Illumina moves all parts of BaseSpace to AWS next year.
MLL currently only uses BaseSpace for research but is evaluating using it for routine clinical and diagnostic operations, he said.
Cohort Analyzer and Correlation Engine were developed by NextBio, a company Illumina acquired in 2013. The products were originally called NextBio Clinical and NextBio Research, respectively. Correlation Engine dates to 2006, while what is now known as Cohort Analyzer debuted in 2013.
One of the reasons why Illumina bought NextBio was the automation within Cohort Analyzer. Five years ago, though, it was more research-focused than clinical.
Cohort Analyzer has evolved into a platform for aggregating and interpreting large quantities of genomic data for research and clinical applications. It provides a suite of applications that help researchers identify mechanisms of disease, drug targets, and prognostic or predictive biomarkers by combing through curated and correlated public and private genomic data.
For subject-level clinical summaries, Cohort Analyzer draws on information from public sources, including the Cancer Genome Atlas, the 1000 Genomes Project, and the Gene Expression Omnibus database. Correlation Engine pulls from the same databases but produces reports at the population level.
In recent years, Illumina developed BaseSpace Variant Interpreter for managing clinical reporting. That product was officially launched in 2017 after two years in public beta testing. Illumina also has put the BaseSpace name on Clarity LIMS, a laboratory information management system that has been on the market since 2011 and was originally developed by GenoLogics, a company Illumina acquired in 2015.
Following the NextBio acquisition, Illumina created a business unit called Enterprise Informatics, which had 250 to 300 people, just a small fraction of the company's more than 7,000 employees. That unit did not fare well as a separate entity and was subsequently dissolved, according to an Illumina employee.
Most of those jobs still exist within Illumina but were shifted to other departments to reflect the fact that informatics is a component of every genomic data operation. Now, there are close to 400 people working in bioinformatics across many different projects, the employee said. Other engineers and biologists at Illumina are involved in bioinformatics there as well.
One area Illumina hopes to serve going forward is hospitals. As genomics moves into the clinical realm, hospitals often find themselves unsure of how to proceed, thanks in no small part to a shortage of expertise at provider organizations in harmonizing clinical and genomic data, company officials said.
Hospital CIOs have been struggling with implementing data standards for decades in clinical informatics, and the introduction of genomics has only compounded the problem. Pharmaceutical companies are experiencing similar issues as they get into large-scale sequencing.
Also, many genomic projects require data to be shared between institutions. Cohort Analyzer was built to address this problem, according to the company.
Sequence Hub and related BaseSpace products automatically pull in numerous types of data, including clinical information, RNA expression, copy number variants, and somatic mutations from genome databases and clinical information systems. Illumina is building additional ingestion interfaces for methylation, protein expression, and other variables, the company indicated.
After pulling in data, the BaseSpace platform standardizes the information according to various ontologies, including the Human Phenotype Ontology and SNOMED-CT, a widely adopted set of terminology for EHRs.
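To make the standardization step concrete, the Python sketch below maps free-text clinical terms from incoming records onto shared concept codes and sets aside anything it cannot map. It is only an illustration: the article does not say how BaseSpace performs this mapping, and the `EX:` identifiers are invented placeholders, not real Human Phenotype Ontology or SNOMED-CT codes.

```python
# Placeholder lookup table. The "EX:" identifiers are invented for this
# sketch; a real system would load codes from the ontology itself.
TERM_MAP = {
    "anemia": "EX:0001",
    "anaemia": "EX:0001",            # spelling variant, same concept
    "low platelet count": "EX:0002",
    "thrombocytopenia": "EX:0002",   # synonym, same concept
}


def normalize(term):
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def standardize(records, term_map=TERM_MAP):
    """Attach a concept code to each record whose term is recognized.

    Returns (mapped, unmapped) so that unrecognized terms can be routed
    to manual curation rather than silently dropped.
    """
    mapped, unmapped = [], []
    for rec in records:
        code = term_map.get(normalize(rec["term"]))
        if code is None:
            unmapped.append(rec)
        else:
            mapped.append({**rec, "code": code})
    return mapped, unmapped
```

Once synonyms and spelling variants resolve to the same code, records from different institutions can be grouped and compared on that code rather than on the original wording.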
"In general, Illumina is interested in advancing standards and encouraging open access in the genomics space, because we understand that these activities help the entire industry build better solutions and accelerate industry growth," Meldrum said. He noted that Illumina and the American Society of Clinical Oncology have jointly contributed variant interpretations of thousands of somatic genetic alterations to the Clinical Interpretation of Variants in Cancer (CIViC) database.
Illumina is not unique in trying to harmonize varying types of data for omics interpretation, however. For example, cBioPortal for Cancer Genomics, a Memorial Sloan Kettering Cancer Center-developed web application platform that contains a series of de-identified cancer sequencing datasets, has done this. An Illumina official also referenced MetroNome, a genomic data visualization tool from the New York Genome Center that was updated in June to support RNA data, allowing researchers to see gene expression in addition to genomic variation for both individuals and patient populations.