Compendia Bioscience announced this week that it will use a $1.3 million Small Business Innovation Research fast-track grant from the National Institutes of Health to incorporate data on microRNAs, which have been linked to cancer development and progression, into the Oncomine platform, its cancer gene expression database.
The company plans to develop bioinformatics pipelines that will gather information on miRNA datasets from the public domain; process, normalize, and analyze the data; and store the information in a format that can be accessed by pharmaceutical and biotechnology companies involved in cancer drug research.
In the grant abstract, company researchers write that they plan to "collect all publicly available cancer-related high-throughput miRNA data" and standardize it at three levels: sample data, expression data, and statistical analyses.
The researchers also write that they plan to "present the data in a consistent, comparable format that is also fully integrated with existing mRNA and DNA copy data in Oncomine."
In spite of increasing amounts of genome-scale miRNA expression profiling data and growing evidence of the important role miRNA has in cancer, Dan Rhodes, Compendia's chief executive officer, told BioInform that it is still "nearly impossible" for cancer researchers to "query the expression of miRNAs across published studies" because, among other challenges, there isn't a "complete set" of annotated miRNAs.
Part of the problem is that miRNA data are generated using several different technologies, such as Luminex beads and Affymetrix GeneChips, and presented in several different formats. As such, one of the challenges for Compendia's developers, according to Rhodes, will be to find ways to integrate the data from these different platforms.
"If an investigator has a microRNA of interest and they want to ask the question 'is my microRNA ever significantly up- or down-regulated in the cancers,' we want to be able to ask [the question] across all of the existing data no matter which platform it was run on," he explained.
In addition, cancer researchers have to take into account gene expression data, copy number variations, and mutation data, among other types of data, in order to have a well-rounded understanding of the mechanism of the disease and to develop appropriate therapies and treatments.
To amass cancer-related data in a single location, Compendia developed its flagship product, Oncomine, which is a database that currently contains nearly 50,000 cancer genomic profiles such as copy number variations and gene expression data. Cancer researchers can use the data to locate and validate drug targets, as well as identify biomarkers for cancer.
In 2008, Compendia released a tool called Meta-COPA, built on an approach the company developed called Cancer Outlier Profile Analysis, or COPA, which identifies genes with high expression levels in a subset of cancer cases and uses the information to predict candidate oncogenes from gene expression data. The tool was used to identify genes expressed in a subset of prostate cancers in a University of Michigan-led study. (BI 06/26/2008)
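As a rough illustration of the outlier idea behind COPA, the Python sketch below follows the transformation commonly described for the method: each gene's expression values are median-centered and scaled by the median absolute deviation, and genes are then ranked by a high percentile of the transformed values. It is a minimal sketch, not Compendia's implementation of COPA or Meta-COPA; the function names, the 90th-percentile default, and the gene labels in the usage note are hypothetical.

```python
from statistics import median


def copa_transform(values):
    """Median-center one gene's values and scale by the median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Assumption: a gene with no spread carries no outlier signal.
        return [0.0 for _ in values]
    return [(v - med) / mad for v in values]


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in 0..100) of an already sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def copa_scores(expression, q=90):
    """Map each gene to the q-th percentile of its transformed values.

    `expression` is {gene_name: [value per sample]} (hypothetical layout).
    """
    return {
        gene: percentile(sorted(copa_transform(vals)), q)
        for gene, vals in expression.items()
    }


def rank_outlier_genes(expression, q=90):
    """Return gene names ordered from strongest to weakest outlier signal."""
    scores = copa_scores(expression, q)
    return sorted(scores, key=scores.get, reverse=True)
```

Called on a toy input such as `{"GENE_A": [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 6.0, 6.5, 7.0], "GENE_B": [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0]}`, `rank_outlier_genes` places `GENE_A` first, because only a subset of its samples shows very high expression.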
To gather and integrate miRNA data in the Oncomine platform, Rhodes said that the development team plans to apply some of the same approaches used in Meta-COPA.
In addition to data integration, Rhodes said that when the tool is released, researchers will be able to perform differential expression analysis, run clustering analysis to locate sets of miRNAs that are consistently co-expressed, and look for miRNA patterns that occur across different subsets of cancer, among other types of analyses.
He pointed out, however, that the key feature of the miRNA pipeline is that researchers will be able to integrate miRNA expression data with mRNA expression data. This is important because miRNAs regulate mRNAs, Rhodes said, and understanding the interactions between the two types of RNA could provide insights into how miRNA expression affects gene expression in cancer.
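One simple way to picture this kind of integration is to correlate a miRNA's expression with that of a candidate target mRNA across the samples both were measured in. The Python sketch below is an assumption-laden illustration rather than a description of Compendia's pipeline: the data layout, the function names, and the choice of Pearson correlation are all hypothetical, and a strongly negative value would only be consistent with, not proof of, repression of the mRNA by the miRNA.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: treat a constant profile as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlate_pairs(mirna, mrna, pairs):
    """Correlate each (miRNA, mRNA) pair over the samples they share.

    `mirna` and `mrna` are {name: {sample_id: value}} (hypothetical layout);
    `pairs` is an iterable of (mirna_name, mrna_name) tuples.
    """
    results = {}
    for mir, gene in pairs:
        # Only samples profiled for both molecules can be compared.
        shared = sorted(set(mirna[mir]) & set(mrna[gene]))
        results[(mir, gene)] = pearson(
            [mirna[mir][s] for s in shared],
            [mrna[gene][s] for s in shared],
        )
    return results
```

Restricting each comparison to shared sample identifiers matters here because miRNA and mRNA profiles for the same study are often generated on different platforms and do not always cover the same patients.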
The two-year project will be divided into two phases. In the first phase, Compendia plans to establish strategies to curate sample metadata, standardize information from different platforms into a single format, and perform differential expression analysis for three miRNA datasets.
During the second phase, the company plans to develop scalable software to capture and curate miRNA data as well as methods to analyze miRNA profiling datasets and integrate the data into the database.
Compendia plans to begin the project with a proof-of-concept study in which it will analyze several miRNA datasets using the same tools used to analyze mRNA data.
Rhodes said that the tool's primary users would be large oncology-focused biotechnology and pharmaceutical companies that are interested in "querying and analyzing" miRNA data from cancer patients.
He added that Compendia plans to stay focused on the applications of genomic data in oncology research and to develop tools and systems that run the "most relevant analysis" for cancer drug discovery and development. That focus, he noted, will enable the company "to stay well ahead of the curve" and have an impact in that area.
Compendia hopes to release the first version of the pipeline by the end of 2011.
Rhodes also said that the company plans to build a suite of tools that can analyze sequence variance, digital gene expression, and gene fusions in cancer from next-generation sequencing data.
Based in Ann Arbor, Michigan, Compendia currently has 28 employees and is looking to hire a senior scientist and a systems administrator. The company plans to hire additional data curators and bioinformaticians next year.