NEW YORK (GenomeWeb News) – Seeking to make the masses of cancer sequence data being generated more useful for researchers, investigators at the University of California, Santa Cruz, plan to use a $3.5 million grant from the National Cancer Institute to create a new platform for organizing and accessing these data.
The UCSC group plans to create a method for making the raw sequence information in repositories like the university's Cancer Genomics Hub more useful for investigators seeking to make clinical predictions about how cancer mutations respond to drugs, for example.
The aim of the project will be to develop a new database called the Biomedical Evidence Graph, or BMEG, which will use a graph database structure, as Facebook does, to enable swift access to complex and interconnected datasets.
Principal investigator Joshua Stuart, a UCSC associate professor of engineering, likened the difficulty many investigators face in using raw sequence data to that of average computer users trying to work directly with binary code.
"Your web browser doesn't understand zeros and ones. There are layers and layers of software programs between that and what you see on a web page. We need to do the same thing for DNA sequences to reach the higher levels of interpretation needed for scientific discovery," Stuart said in a statement.
Stuart said that a platform similar to what social networks like Facebook use offers a "natural way" to represent data from tumor samples based upon the connections between their molecular profiles.
CGHub, which launched last year to house data from The Cancer Genome Atlas consortium and similar projects, holds thousands of genome sequences from individual patients. Access is highly controlled and limited to approved projects.
BMEG, however, will not require such security because it will host higher-level data from analyses of the raw genome sequencing. This will enable a broader group of investigators to use and analyze these datasets without having to download massive files to their computers.
"TCGA researchers have built a lot of great tools for data analysis, and we need to get those installed in the BMEG so the rest of the world can engage in that higher level analysis," Stuard said. "The idea is to build a shared knowledge base and create a playground where lots of researchers can interact, test their algorithms, and compare results."
The BMEG will be located with the CGHub servers at the San Diego Supercomputer Center, and investigators will be able to run their analyses as apps on the BMEG, UCSC said.