CHICAGO – A massive new RNA-focused dataset from the Michael J. Fox Foundation for Parkinson's Research promises to guide researchers to new biomarkers for progression of the neurological disorder.
Released in late August, the Parkinson's Progression Markers Initiative (PPMI) RNA Sequencing Project dataset contains 108 terabytes of longitudinal information from more than 4,750 biological samples from 1,589 individuals. Though it is smaller than multi-disease, multi-petabyte databases like the Cancer Genome Atlas, the Michael J. Fox Foundation said that PPMI RNA-seq is the largest disease-specific dataset ever created.
The deidentified cohort includes Parkinson's patients, others who have clinical and genetic risk factors for Parkinson's disease, those with idiopathic indications, and control volunteers.
Launched in 2010 with funding from the Michael J. Fox Foundation, the PPMI is a collaboration among academia, government, and industry aimed at verifying progression markers for Parkinson's disease.
The project has involved numerous forms of data collection, including clinical and imaging data and whole-exome, whole-genome, and transcriptomic sequencing data from blood, plasma, serum, urine, tissue, and cerebrospinal fluid samples. Clinical information may include medical records as well as cognitive assessments, sleep questionnaires, and depression assessments, while imaging data includes DaTscan (ioflupane I123) radiopharmaceutical tests, structural MRI, and, if available, diffusion tensor imaging.
"We have looked at a variety of other components and we distribute the remaining biosamples actively to continue to support biomarker discovery and biomarker validation," said Bradford Casey, associate director of research programs at the Michael J. Fox Foundation.
The PPMI RNA-seq collection will continue to grow as whole-genome and transcriptome sequencing continues on the original samples and on any samples collected in the future. "We do have releases planned for the foreseeable future as long as we keep doing sequencing," Casey said.
The next big release, which likely will happen in the next few weeks, will add noncoding RNA data to the collection.
"In that, you learn so much on gene regulation in Parkinson's disease. We still discover novel noncoding RNAs … and that gets us new insights into biology," Casey said.
Next month, according to Casey, PPMI will contribute this massive RNA-seq dataset to the National Institutes of Health-led Accelerating Medicines Partnership – Parkinson's Disease platform, which also seeks to validate Parkinson's disease (PD) biomarkers. That public-private partnership includes the Foundation for the NIH, Google sister company Verily, and many pharmaceutical companies.
"The goal is to let people bring their ideas to the data," Casey said. "For us bioinformaticians, it's a treasure to have this dataset. I think it would keep my group and many others busy for years to understand what is in this data," he added.
As a longitudinal dataset, PPMI includes multiple samples from each subject, collected as many as five times in a three-year period. "We went really deep and broad, 200 million reads per [sample], which in some cases puts you at 1 billion reads just in RNA-seq," said David Craig, codirector of the Institute of Translational Genomics at the Keck School of Medicine of the University of Southern California. Craig, who focuses on genomics, bioinformatics, and neurogenomics, is coleader of the RNA-seq part of PPMI with Kendall Van Keuren-Jensen of the Translational Genomics Research Institute in Phoenix.
"You tie that to the whole-genome sequencing, you add in hundreds of clinical variables, you add in imaging, and this is really an incredible set," Craig said.
Through a portal, researchers can register to download data for free, giving them access to the raw sequencing data, customizable tables from the full dataset, and a visualization tool to help them follow changes in gene expression for each transcript in the human genome.
Casey said that data sharing and open access are core principles of the RNA-seq project and of PPMI as a whole.
"Not only are we sharing the data at the end of the project, we're actually sharing the raw data essentially at the midpoint of the project. Virtually as soon as it comes off the sequencers and goes through quality control, we've made it accessible as much as possible to any PPMI researcher," Casey said.
Andreas Keller, director of the Center for Bioinformatics at Saarland University in Germany and a visiting professor of neurology and neurobiology at Stanford University this year, is one of those researchers.
Keller is currently testing whether changes in gene expression and noncoding RNA accelerate over time as Parkinson's progresses. Keller said that he and his colleagues on both sides of the Atlantic have seen signs of exactly that in the PPMI dataset.
"We see that the dysregulation gets stronger with the progression of disease, so this is something that gives us a really big hope" for the research leading to more effective treatments, he said.
Keller is also testing the hypothesis that PD might be more age-dependent than previously assumed, and that the biomarkers for early-onset PD are different from those indicating a later onset of the disease.
"This might change our thinking about a biomarker as a static thing," he said.
Keller added that his lab is currently attempting to validate early results showing that gene expression differs between hereditary and nonhereditary forms of Parkinson's.
Keller expects computational biologists to get "excited" about this genomic and transcriptomic dataset. He noted that the raw data is also available to those who want to build their own processing pipelines.
"We see this as our flagship biomarkers observational study," added Casey. "Again, our goal is to empower the field in any way we can because a clinically deployable biomarker remains a pressing, unmet need."