NEW YORK (GenomeWeb News) – A team of investigators at Rice University, Baylor College of Medicine, and the University of Texas at Austin will use a $1.3 million joint grant from the National Science Foundation and the National Institutes of Health to develop statistical tools for analyzing vast amounts of molecular cancer data.
The researchers plan to develop new techniques for sorting, analyzing, and making connections between bits of data gathered via high-throughput omics technologies such as genome sequencing, RNA sequencing, and microarrays, Rice said on Wednesday.
The goal is to address a central challenge to personalized cancer medicine: how to handle and find meaningful connections amid a sea of data and a range of data types, such as microarray data, count data from RNA sequencing, and binary or categorical data composed of SNPs and CNVs.
"The motivation for this is all of these new high-throughput technologies that allow clinicians to produce tons of molecular data about cancer," Genevera Allen, principal investigator on the grant and an assistant professor at both Rice and BCM, said in a statement.
She noted that when researchers scan or sequence a tumor from a patient they can measure "nearly every possible aspect of the tumor," which can lead to "measurements on millions of variables."
The glut of these various types of data can cause two main problems, Allen said. First, researchers or clinicians trying to analyze and compare these data types run into "apples-to-oranges problems," she said. "Second, for scientists to leverage all of these data and better understand the molecular basis of cancer, these varied omics data sets need to be combined into a single multivariate statistical model."
Allen and her partners, BCM Assistant Professor Zhandong Liu and UT-Austin Assistant Professor Pradeep Ravikumar, plan to address these problems by creating a mathematical framework that will enable them to find conditional dependence relationships between any two variables. Such a tool should make it possible to analyze and integrate multiple sets of high-dimensional data that were measured from one group of subjects.
Deciphering which conditional dependencies are at play in a data set could save cancer researchers time and trouble later on by enabling them to rule out, for example, certain relationships between genes and other factors.
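To illustrate what a conditional dependence relationship means in practice, the following is a minimal sketch using simulated data. It is not the researchers' framework, which is meant to handle mixed data types; it assumes purely Gaussian variables and uses partial correlations computed from the inverse covariance matrix. The variable names and the helper `partial_correlations` are hypothetical. Three variables are simulated in a chain (x drives y, y drives z), so x and z are correlated overall but should show almost no dependence once y is accounted for.

```python
import numpy as np


def partial_correlations(data):
    """Partial correlation of each pair of columns, given all other columns.

    Assumes roughly Gaussian data with more samples than variables,
    so that the sample covariance matrix is invertible.
    """
    precision = np.linalg.inv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    partial = -precision / np.outer(scale, scale)
    np.fill_diagonal(partial, 1.0)
    return partial


# Simulated chain: x -> y -> z (illustrative data only, not real omics data).
rng = np.random.default_rng(0)
n_samples = 20000
x = rng.normal(size=n_samples)
y = x + rng.normal(size=n_samples)
z = y + rng.normal(size=n_samples)
data = np.column_stack([x, y, z])

marginal = np.corrcoef(data, rowvar=False)
partial = partial_correlations(data)

# x and z look related on their own, but not once y is taken into account.
print("marginal corr(x, z):", round(marginal[0, 2], 3))
print("partial corr(x, z | y):", round(partial[0, 2], 3))
```

In this toy setting, a near-zero partial correlation between x and z is the kind of evidence that would let an analyst rule out a direct relationship between the two, even though they appear correlated when examined in isolation.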
Allen and Liu began developing their techniques last year after receiving a seed grant from Rice's Ken Kennedy Institute for Information Technology. They have already had some success using the techniques, and have produced a network model for half a million biomarkers related to glioblastoma that researchers may be able to use to find out which relationships between bits of data are most important.
Allen also said the mathematical models they are working with should be useful for big data applications beyond genomics and cancer, such as national security or retail marketing.