Update: this article has been updated to correct previously reported comments on credit allocation and use for the Seven Bridges NCI cloud platform and Cavatica, and to clarify how data will be accessed in both systems.
NEW YORK (GenomeWeb) – The Children's Hospital of Philadelphia is partnering with Seven Bridges Genomics to develop a shared environment for securely storing, sharing, and analyzing large volumes of genomic data from pediatric patients.
The so-called Cavatica platform is one of several initiatives undertaken by the newly minted Center for Data Driven Discovery in Biomedicine (D3b), a joint project of the CHOP Research Institute and CHOP's Department of Biomedical and Health Informatics to gather and share genomic, clinical, and other useful biomedical data for pediatric cancer and rare disease research. The platform will also support CHOP's commitment to the White House Precision Medicine Initiative, according to the partners.
The planned system is built with the same vision of data sharing and technology that underlies the National Cancer Institute's Cancer Genomics Cloud pilot projects. Seven Bridges' proposal was one of three selected by the NCI for that initiative. The company recently opened up its platform for testing as part of a nine-month evaluation phase for all three projects that began in January.
Seven Bridges' selection and involvement with the NCI cloud pilots was one of the reasons CHOP tapped the company to put together a platform for Cavatica, according to Adam Resnick, an expert in brain tumors and one of the CHOP center's founding directors. "A real motivation was to begin intersecting across the adult and pediatric datasets, something that has not been fully empowered because of the way that pediatric cancer data has been generated in the past and because of disease-centric approaches ... that have been employed to date," he told GenomeWeb. "The opportunity to intersect across [The Cancer Genome Atlas] with pediatric data was something that we thought was very valuable, and already had evidence that was very worthwhile."
Shared data and compute resources are crucial for enabling research in the pediatric oncology space, where siloed datasets and limited infrastructure have in the past hampered efforts to some degree. "In the pediatric and rare disease space, the underlying infrastructure for data is oftentimes very different than in an NIH-sponsored initiative where you have already co-located and curated datasets," Resnick told GenomeWeb. "Because much of the pediatric data is not generated under the auspices of the NIH but rather through individual institutions or investigators, the element of curation and data access and empowerment and annotation is very sparse and poorly organized."
Consortia such as the Childhood Brain Tumor Tissue Consortium (CBTTC), a multi-institutional research alliance that studies childhood brain tumors, emerged to solve many of the aforementioned issues. "In part, that is why I think our participation in the PMI summit [is] so rewarding," he said. "We actually see the pediatric community as uniquely positioned to inform this next wave of data empowerment on behalf of patients."
Resnick told GenomeWeb that CHOP approached all three cancer cloud pilot developers — Seven Bridges, the Broad Institute, and the Institute for Systems Biology — who were all willing to collaborate with CHOP on the pediatric cloud. However, "Seven Bridges really took the lead and embraced us as a co-development partner of the cloud environment that would not only leverage what [they were] doing for the NIH but [also] empower the pediatric space and its infrastructure in a way that is necessary," he said.
Cavatica, which is named for the spider Charlotte A. Cavatica in the children's book Charlotte's Web, will be a separate system but will share common infrastructure and could possibly connect to the broader Seven Bridges cancer cloud in the future. Both systems run on Amazon Web Services. Approved pediatric cancer researchers will have access to pediatric cancer datasets collected by CHOP and partners through Cavatica. Also, all of the pipelines that are available in the TCGA cloud will be available in Cavatica as well.
Existing tools in the Seven Bridges cloud include a case explorer, which lets users identify research cases based on genomic, expression, and copy number variation data, and software for querying clinical and biospecimen metadata properties. CHOP researchers will also implement some of their own pipelines on Cavatica, including tools for visualizing data, accessing biospecimens, accessing clinical and phenotype annotations, and more. In addition, they'll work with Seven Bridges on mechanisms for integrating metadata, such as phenotype and platform-specific data, with analysis workflows, Resnick said.
Cavatica's initial focus will be on pediatric brain tumors, one of the leading causes of cancer-related deaths in children, but it will eventually cover all pediatric cancers and rare diseases, Resnick noted. The system will initially host datasets collected by the CBTTC and by the Pacific Pediatric Neuro-oncology Consortium (PNOC), a network of 15 children's hospitals that run clinical trials to test new therapies for children with brain tumors. Datasets from patients participating in the consortia, including the raw sequence files, will be open to all Cavatica users with the appropriate approvals as long as those patients have given consent for their data to be accessed at that level. "For the consortium, we have no mandate on restricting access to that data from any qualified individual," Resnick said.
In addition, Cavatica's developers have begun reaching out to researchers in the broader pediatric community who have internally generated datasets that they might be willing to store in Cavatica and share with others. "We are bringing in published datasets from individuals who have already deposited and made the data available [in resources like dbGaP] but are now essentially giving permission for the data to be stored within Cavatica," Resnick said. Also, any Cavatica user will be able to upload his or her datasets to the system, annotate them, and share them with other investigators. Users who contribute their internally generated datasets will retain the right to approve others' access to their data, but the hope is that they will be willing to share. There will be incentives to encourage them to do so, Resnick said, such as computational credits.
As with the TCGA cancer cloud, Cavatica users will be offered credits to run computations on the cloud. Seven Bridges will supply some of these, and the rest will be provided using funds from various foundations. Researchers can potentially obtain about $175 to $200 worth of computational credits to spend on Cavatica, Resnick said.
Researchers could have credits on the two systems and could perform computation on both. They could even possibly choose where to run their computations if approved data is available on both, Resnick said. The partners also plan to offer an incentive scheme, where users can obtain more credits in exchange for making their data available in Cavatica. The exact details of the scheme are still being worked out, but "there'll be opportunities for grants for users to deposit data and compute or integrate across the data," Resnick said.
Cavatica's developers are also working on mechanisms for allowing patients to access and engage with their data once it's been uploaded to the system. "We don't have the complete infrastructure figured out but we've been talking to other PMI participants, including the Genetic Alliance, for example who have worked very hard to create portals that permit patients to essentially deposit or give access to their data [via] clear and easy interface[s]," Resnick said.
The partners plan to do a soft launch of Cavatica in the next few weeks. James Sietstra, Seven Bridges' president, told GenomeWeb that the partners are still working out the details of the soft launch, including the exact date. The full system will launch in the second quarter of this year.