The University of Chicago has launched a cloud-based platform that provides researchers with access to analysis tools and genomic data from The Cancer Genome Atlas.
The so-called Bionimbus Protected Data Cloud is based on Bionimbus, an open source cloud-based system for managing, analyzing, and sharing genomic data that was developed by the University of Chicago’s Institute for Genomics and Systems Biology.
Bionimbus is part of the Open Science Data Cloud, a National Science Foundation-supported project that is managed by the not-for-profit Open Cloud Consortium and provides cloud services for the scientific community. Bionimbus supports projects such as modENCODE, ENCODE, and the T2D-Genes consortia.
"Our hope is that the Bionimbus environment will help democratize access to cancer genomics data so that more researchers can fruitfully work with large datasets to understand genomic variations that seem to be one of the keys to the precise diagnosis and treatment of cancer," Robert Grossman, the principal investigator of the Bionimbus project and professor of medicine at U of Chicago Medicine, said in a statement.
The Protected Data Cloud will contain data from the open access tier of the TCGA – public data that is not unique to any individual – and data from the controlled access tier – which contains data that may be unique to an individual.
To access data in the controlled tier, researchers will need authorization from the National Institutes of Health. They’ll need to complete a data access request, which is available through the Database of Genotypes and Phenotypes, dbGaP. Once the request is approved, the researchers must agree to restrict their use of the information to biomedical research purposes only, and they and their institutions must agree with statements within the TCGA Data Use Certification.
According to its developers, the system is intended to provide a secure, compliant computing environment capable of managing and analyzing terabytes of data so that researchers don’t have to install costly and cumbersome infrastructure locally in order to download, manage, and analyze TCGA data.
Grossman told BioInform that his team beefed up Bionimbus’ security so that the platform complies with requirements of the Federal Information Security Management Act of 2002. This means that the cloud can be used to store the electronic medical records and other protected health information collected from the TCGA individuals.
He said that the Bionimbus cloud currently has 500 terabytes of data from breast, ovarian, and prostate cancers as well as RNA-sequencing data, and that his team plans to purchase additional hardware so that it can upload the remaining TCGA data over the next few months.
They are also installing open source informatics pipelines on the cloud to analyze RNA-seq and ChIP-seq data as well as whole-exome and whole-genome data, along with fusion finders, tools for consensus genotyping, and more, he said.
Researchers can also upload their own data to analyze it in the context of the TCGA data.
“The idea is that all the analysis can easily be done there without moving the data,” Grossman said. “We are trying to simplify access for researchers. Right now, if you are not at a big center, it’s quite challenging.”