CHICAGO (GenomeWeb) – The Kids First Data Resource Portal, launched in September by the Gabriella Miller Pediatric Data Resource Center, debuted with one of the largest data releases of its kind.
The initial release included some 8,000 DNA and RNA samples from children and families affected by pediatric cancers and structural birth defects, taken from the Children's Brain Tumor Tissue Consortium (CBTTC), led by the Children's Hospital of Philadelphia. The database has since grown with the October addition of 299 patients with adolescent idiopathic scoliosis.
Organizers of the Kids First Data Resource Portal expect the collection to include 33,000 new pediatric whole-genome sequences by 2022.
Notably, the Gabriella Miller Pediatric Data Resource Center released the data without embargo. "I don't think the value of that can be overstated or overstressed," said Phillip "Jay" Storm, chief of neurosurgery at CHOP and codirector of the hospital's Center for Data Driven Discovery in Biomedicine, which hosts the Data Resource Center.
Storm noted that many of the families he serves are facing the prospect of a dying child. For many pediatric cancer patients, he said, an embargo of six to nine months could come too late.
"These families, they enroll in these studies and they allow these tissues to be collected for research and the data generated, but it's very hard for them to then get the data or get the sample and get it sent somewhere else," he said. Embargoes of months or years mean that their children may have undergone an inappropriate therapy or even died.
"There's a sense of frustration and hopelessness that there's nothing they can do because they can't get access to the data or it's tied up or you can't move it. Even when you can't maybe make a difference in your child, [you want] the ability to at least feel like you're being proactive and helping move their data and share it in a very short order of time," Storm said.
"It's a huge thing that our patients and these [pediatric cancer] foundations want. Fortunately, we have enough people in the CBTTC that believed in that and have allowed that to move forward," he continued.
Kids First Data Resource Principal Investigator Adam Resnick, the other codirector of the Center for Data Driven Discovery, similarly noted that "this is one of the first times that pediatric genomic data is available in such a way that researchers can have immediate access to the data once they are approved and immediately use that data for computation and analysis."
"The hope is that, by providing access in a research environment such as this, we'll accelerate discovery. The whole mandate here is to bring the brainpower of the community across all diseases," Resnick said.
He called this rapid access to data "an entirely new way of doing research."
The portal includes files representing pathology reports, surgical reports, imaging, and pathology slides. "It's really looking at a comprehensive digital footprint of a patient at a specific point in time, and then also longitudinally," Resnick said, capturing changes over time such as cancer progression.
"For many of us in the field, whether it's clinical or on the research side, there was clear recognition that in the pediatric cancer and rare-disease environment, more tools and resources were necessary to interconnect and harness the power of big data in order to change the course of disease therapies and outcomes for patients," Resnick added.
Resnick noted that pediatric cancer researchers have long struggled to create cohorts of rare-disease patients of sufficient size to make a difference in therapy development. "The requirement that we have from the biology perspective is the larger the data set, the more penetration we can get into the biology of that disease," he said.
The Gabriella Miller Pediatric Data Resource Center is an initiative of the Gabriella Miller Kids First Pediatric Research Program, a National Institutes of Health Common Fund-backed effort to advance pediatric cancer research through the creation of a genomic and phenotypic data resource for the medical community. The Kids First Pediatric Research Program started in 2014 with a 10-year funding commitment of $126 million to support large-scale data generation across a wide landscape of diseases.
The Data Resource Center launched in August 2017 to support data-driven research into pediatric cancers and structural birth defects. "This first version of the portal seeks to empower the identification by researchers, clinicians, and even patients of what data are available and how those datasets can be used in an environment that supports the computation around those datasets," Resnick said.
Resnick said that this is among the first places investigators and clinicians can go to identify cohorts of datasets based on a specific rare pediatric disease or diagnosis, including whole-genome sequences, whole-exome sequences, RNA-seq data, imaging data, and pathology data. Outside users access the portal and manage their data in a workspace called Cavatica, a cloud platform jointly developed by CHOP and Seven Bridges to gather and share genomic, clinical, and other useful biomedical data for pediatric cancer and rare disease research.
Since the portal launched four months ago, the partners have added a Kids First harmonization pipeline as a public app available through Cavatica. This allows registered users to save queries, view the interests of other users, and see lists of diagnoses from other queries, according to a CHOP spokesperson.
The initial harmonization efforts are focusing on clinical and phenotypic data to allow for better identification of relevant cohorts for researchers, as well as the alignment of these records with genomic data, Resnick said. He said there is no consensus within the bioinformatics community about the right workflow or analytics pipeline for interpretation of structural variations or other alterations in the context of whole-genome sequencing, so Kids First will continue to run pilots in search of such consensus on large-scale genomic datasets.
Resnick called the portal launch the first step in a long journey.
"By empowering shared access across such datasets with a common platform for analysis, common workflows, and where the datasets are actually harmonized by the Data Resource Center team, it allows investigators immediacy of actually asking questions, empowered by the bioinformatic workflows," he said. "That's really the first step of the Kids First Data Resource Center."
The portal and the Data Resource Center are meant to be points of entry for patients, clinicians, and bioinformaticians alike. Resnick said that patients and their parents can register for the portal to learn about rare diseases and relevant studies, as well as connect with researchers. Terms of service for the Children's Brain Tumor Tissue Consortium — and for all NIH-funded projects — require users to share findings.
Per NIH guidelines, patients consent to the use of their data as part of Kids First. Resnick said that this makes patients and their families partners in research. "They want to empower discovery and they want the broad use of data in ways that researchers can really make an impact," he said.
For this reason, patient and family access will be the focus of the second phase of portal development, according to Angela Waanders, director of clinical research of the Center for Data Driven Discovery in Biomedicine and executive chair of the CBTTC.
"When we're talking about data, we're not just talking about the genetic mutations described in a given patient disorder, but we're talking about complex molecular data combined with phenotypic data and being able to compare datasets together," Waanders said. She said that this would be helpful for clinicians as well as researchers.
"For Dr. Storm and me in the clinic, we often have a sense that there are shared things across patient populations. For example, I may have a patient with a brain tumor and a cardiac heart defect. It seems like the two have to be related somehow in a patient so young," she said.
Pediatric cancers are different from adult cancers in that they are almost always developmental disorders. "They're not traditional malignant cancers like you think of in an adult with lung cancer who has smoked for 20 years," Waanders explained.
Waanders noted that histones are particularly significant in studying pediatric brain tumors. "They actually help inform how DNA is processed and how genes are regulated, but it's not a direct cause and effect," Waanders said.
"This kind of unparalleled release of this type of dataset in a portal or in a platform such as Kids First will allow [participation by] outside scientists who may have no interest in brain tumor research, but may have an interest in a specific gene mutation or a specific developmental area or a specific patient cohort," Waanders added.
She said that this feature of the Kids First portal will drive hypothesis generation while providing scientists with secure access to relevant data. "This is all done in a regulatory-compliant way, with controlled access, but it's open access in that any investigator in any area of research may find this useful," Waanders said.
"The platform allows people to reproduce results through common pipelines that are available for the researchers. Our researchers can share pipelines. People can bring in their own datasets and intersect them with Kids First datasets for their own research," Resnick added.