By Vivien Marx
This article was originally posted on April 20.
NextBio this week launched a new "sequence-centric" module as part of its search and data-mining platform that enables users to integrate and mine high-throughput array and second-generation sequencing data.
The company said that the new module will allow scientists to use the software-as-a-service offering to explore their own data within the context of publicly available datasets.
Saeid Akhtari, NextBio's president and chief executive officer, told BioInform via e-mail this week that the new module is the company's response to the growing number of studies leveraging second-generation sequencing technologies. Accompanying these studies is a "growing demand for tools" to help scientists and clinicians mine both public and private data in real time, he said.
Akhtari said that the sequence-centric module is targeted at scientists who have already run secondary and tertiary analysis on their second-gen sequence data. At this point, he said, the focus shifts from initial data processing to "biological interpretation" through a variety of analysis and visualization techniques.
Researchers can import lists of genetic mutations, copy number variants, epigenetic regions, or RNA-seq expression profiles, such as transcripts, exons, and splice junctions, that are associated with a particular biological condition.
The new platform offers four components: importing, mapping and annotating user data; comparative data analysis; a genome browser; and "processed public data."
The data mapping and annotation takes place "automatically during data import," Akhtari said. The NextBio platform maps any type of genomic entity, such as a list of new mutations, to corresponding genes, transcripts, public SNP identifiers, and any other genomic variants discovered in previous experiments.
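As a rough illustration of what mapping a list of mutations to genes involves, the following Python sketch assigns each variant to every gene whose coordinates contain it. It is not NextBio's implementation, which the article does not describe; the `Gene` and `Variant` types, the `annotate` function, and all coordinates and identifiers are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


class Variant(NamedTuple):
    chrom: str
    pos: int                         # 1-based position
    known_id: Optional[str] = None   # public identifier, if one exists


def annotate(variants: List[Variant], genes: List[Gene]) -> Dict[Variant, List[str]]:
    """Map each variant to the names of all genes that contain its position."""
    # Group genes by chromosome so each variant is compared only
    # against genes on its own chromosome.
    by_chrom: Dict[str, List[Gene]] = {}
    for gene in genes:
        by_chrom.setdefault(gene.chrom, []).append(gene)

    result: Dict[Variant, List[str]] = {}
    for variant in variants:
        candidates = by_chrom.get(variant.chrom, [])
        result[variant] = [
            g.name for g in candidates if g.start <= variant.pos <= g.end
        ]
    return result


# Made-up genes and variants, for illustration only.
genes = [
    Gene("GENE_A", "chr1", 100, 500),
    Gene("GENE_B", "chr1", 450, 900),
    Gene("GENE_C", "chr2", 100, 200),
]
variants = [
    Variant("chr1", 475, "rs_hypothetical_1"),  # falls inside two overlapping genes
    Variant("chr1", 950),                       # intergenic: no gene contains it
    Variant("chr2", 150),
]
annotations = annotate(variants, genes)
for variant, gene_names in annotations.items():
    print(variant.chrom, variant.pos, gene_names or "no overlapping gene")
```

A production system would likely use an interval tree or a sorted index rather than a linear scan, and would map to transcripts and known variant identifiers as well as genes.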
"The comparative analysis component contains a suite of different algorithms — ranking, enrichment analysis, and meta-analysis," he said, which enables scientists to evaluate those mutations across multiple samples and compare data with public datasets from resequencing, gene expression, copy-number, epigenetic, or genome-wide association studies.
This component of the module is set up to help researchers "rank and prioritize SNP, gene, or pathway biomarkers for a given condition," he said.
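The article does not say which statistics NextBio uses for enrichment analysis. One common approach is a one-sided hypergeometric test, which asks how likely it is that a user's gene list overlaps a pathway at least as much as observed by chance alone. The sketch below ranks pathways by that p-value; the function names, gene labels, and pathway names are all hypothetical.

```python
from math import comb
from typing import Dict, List, Set, Tuple


def hypergeom_enrichment_p(universe_size: int, set_size: int,
                           hits_size: int, overlap: int) -> float:
    """P(X >= overlap) when hits_size genes are drawn without replacement
    from a universe in which set_size genes belong to the pathway."""
    max_overlap = min(set_size, hits_size)
    total = comb(universe_size, hits_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, hits_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


def rank_pathways(user_genes: Set[str], pathways: Dict[str, Set[str]],
                  universe: Set[str]) -> List[Tuple[str, int, float]]:
    """Return (pathway, overlap, p_value) tuples, most enriched first."""
    hits = user_genes & universe
    results = []
    for name, members in pathways.items():
        members = members & universe
        overlap = len(hits & members)
        p = hypergeom_enrichment_p(len(universe), len(members), len(hits), overlap)
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Made-up universe of 20 genes and two made-up pathways.
universe = {f"g{i}" for i in range(20)}
pathways = {
    "pathway_x": {"g0", "g1", "g2", "g3", "g4"},
    "pathway_y": {"g10", "g11", "g12", "g13", "g14"},
}
user_genes = {"g0", "g1", "g2", "g3", "g10"}

ranked = rank_pathways(user_genes, pathways, universe)
for name, overlap, p in ranked:
    print(f"{name}: overlap={overlap}, p={p:.4g}")
```

In practice, p-values computed across many pathways would also need correction for multiple testing, which this sketch omits.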
The NextBio Genome Browser integrates a user's private data with public data and lets the user explore the combined data visually, Akhtari said. The browser is "biologist-friendly," he said, adding that it offers an interface through which researchers can visually align and compare their own data to gene structure, miRNA-binding sites, and regions of epigenetic control and linkage disequilibrium, among other features.
Preprocessing for Real Time
The NextBio team has preprocessed and curated public datasets to allow researchers to compare their data with information from the Cancer Genome Atlas, the Encyclopedia of DNA Elements, GlaxoSmithKline's cancer cell line genomic-profiling data, and other publicly available resources.
"We precompute relationships and correlations between different genomic entities, datasets, and associated phenotypes," he said. To allow users to "do work in real time, we have to 'anticipate' and pre-compute a very large number of queries," he said. These computations are centered on the meta-analysis of thousands of large-scale datasets and their associated phenotypes.
Using the example of TCGA data, he said that customers "can combine different subsets of this data (mutation, CNV, gene expression, and methylation results) to identify important genes and pathways associated with a particular type of cancer."
In the case of a customer's own data, NextBio applies enrichment analysis to pre-compute its correlation with all other datasets and associated phenotypes in the system.
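The precompute-at-import pattern described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not NextBio's system: the `CorrelationIndex` class and the dataset names are hypothetical, and a simple Jaccard overlap between gene sets stands in for the enrichment statistics the company says it uses.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class CorrelationIndex:
    """Scores every dataset against all others once, at import time,
    so that later queries read stored scores instead of recomputing them."""

    def __init__(self) -> None:
        self._datasets: Dict[str, FrozenSet[str]] = {}
        self._scores: Dict[str, Dict[str, float]] = {}

    def import_dataset(self, name: str, features: Iterable[str]) -> None:
        new_set = frozenset(features)
        scores: Dict[str, float] = {}
        # The expensive step: compare against every dataset already stored.
        for other, other_set in self._datasets.items():
            score = jaccard(new_set, other_set)
            scores[other] = score
            self._scores[other][name] = score  # keep the table symmetric
        self._datasets[name] = new_set
        self._scores[name] = scores

    def most_similar(self, name: str, top_n: int = 5) -> List[Tuple[str, float]]:
        # The cheap step: sort precomputed scores; no set comparisons here.
        ranked = sorted(self._scores[name].items(),
                        key=lambda kv: kv[1], reverse=True)
        return ranked[:top_n]


# Made-up datasets, for illustration only.
index = CorrelationIndex()
index.import_dataset("public_tumor_study", {"g1", "g2", "g3", "g4"})
index.import_dataset("public_control_study", {"g7", "g8"})
index.import_dataset("my_mutations", {"g1", "g2", "g3", "g9"})

top = index.most_similar("my_mutations")
print(top)
```

The trade-off is that import becomes slower and storage grows with the number of dataset pairs, in exchange for queries that do not depend on how expensive the underlying comparison is.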
"Once your data is imported into NextBio, you can quickly scan your set of mutations, for example, against all other experiments done on the same or related phenotype to prioritize important SNPs and genes," Akhtari explained. "If our users had to perform all these analyses on the fly, queries could take days."
Akhtari said that the privately held firm had a "great year" in 2009. NextBio last fall raised $8 million in a series C financing round, which helped "expedite the R&D" of new applications such as the sequence-centric module (BioInform 9/18/2009).
In addition, he said the funds allowed the firm to accelerate its curation and processing of additional data types, including ChIP-seq, ChIP-chip, microRNA, DNA methylation, and copy number variation data.
Akhtari said that NextBio was cash-flow-positive in the fourth quarter of 2009, "and we expect to be cash flow positive in 2010." The firm has no plans at the moment for another round of financing, a position it might reconsider if it chooses to accelerate its R&D efforts.
The company is currently developing additional modules that it plans to release before the end of the year, including a new application in the area of translational medicine.