Researchers at the University of California, Santa Cruz, have developed what they're calling the world's largest database of cancer genomes, reports ScienceInsider's Jocelyn Kaiser. The new Cancer Genomics Hub, or CGHub, will hold raw data from The Cancer Genome Atlas, as well as data from the US National Cancer Institute's childhood- and HIV-associated cancer genome projects, Kaiser says. It will also take over the cancer sequencing data collected by NIH's National Center for Biotechnology Information.
"Physically based at the San Diego Supercomputer Center, the CGHub computer system is ready to store 5 petabytes of DNA and RNA data from cancer patients," Kaiser says. "TCGA is generating 10 terabytes of data a month, and will eventually produce 10 petabytes — 10,000 terabytes — of data."
UCSC bioinformatician David Haussler tells Kaiser that bringing all that data together in one place will help researchers develop more treatments for cancer patients. "What's very important is … make it easy for researchers to do cross-dataset comparisons," he adds. In a UCSC press release, Haussler also compares the project to other massive scientific undertakings of the past decade. "Right now, cancer research needs something on a very large scale, like the Large Hadron Collider in physics," he says. "Instead of bringing subatomic particles together in high-energy collisions and computing their behavior, we're bringing cancer genomes together in a common database and computing the disease drivers."
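The storage figures Kaiser quotes invite a quick back-of-the-envelope check. The sketch below is a rough illustration, not anything taken from CGHub or TCGA. It assumes decimal units (1 petabyte = 1,000 terabytes, matching the quote) and, purely for illustration, a constant rate of 10 terabytes a month, which the article does not claim. The function name `months_to_reach` is hypothetical.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the article.
# Assumption: decimal units, as in the quote ("10 petabytes" = "10,000 terabytes").
TB_PER_PB = 1000


def months_to_reach(target_pb, rate_tb_per_month):
    """Months needed to accumulate target_pb petabytes at a constant rate.

    The constant rate is an illustrative assumption, not a claim from the article.
    """
    return target_pb * TB_PER_PB / rate_tb_per_month


# Figures from the article: 5 PB of ready storage, 10 PB eventual total,
# 10 TB of new TCGA data per month.
capacity_months = months_to_reach(5, 10)
total_months = months_to_reach(10, 10)

print(f"Months to fill 5 PB at 10 TB/month: {capacity_months:.0f}")
print(f"Months to reach 10 PB at 10 TB/month: {total_months:.0f}")
```

Under those assumptions, filling 5 petabytes would take 500 months and reaching 10 petabytes would take 1,000 months, which suggests the eventual total presumes a faster rate of data generation than the one quoted.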