NEW YORK (GenomeWeb) – Investigators involved in the Weill Cornell Medicine-New York Genome Center (WCM-NYGC) Center for Functional and Clinical Interpretation of Tumor Profiles recently received just shy of $490,000 from the National Cancer Institute to support data analysis for the Cancer Genome Atlas project.
The WCM-NYGC center is one of 11 specialized genomic data centers that will be responsible for analyzing genomic, epigenomic, transcriptomic, and other kinds of data for the next phase of the Cancer Genome Atlas. Under the grant, the collaborators will interpret coding mutations in clinical contexts, including their relevance to immunotherapies. They'll also explore the role of non-coding driver mutations in transcriptional regulation, as well as the role of structural variants as drivers.
Investigators at the institutions submitted an application for the center last year in response to an NCI funding opportunity that called for applications to establish up to 14 specialized genomic data centers. The NCI ultimately approved 11 applications to implement computational tools and pipelines for processing, integrating, and visualizing genomic data. Teams were asked to focus on at least one of the following areas: coding mutations, non-coding mutations, expression/mRNA analysis, copy number analysis, miRNA analysis, long non-coding RNA analysis, batch effects, methylation analysis, pathway analysis, and protein expression analysis.
Olivier Elemento, associate director of WCM's Institute for Computational Biomedicine and one of three co-principal investigators on the project, told GenomeWeb that the two institutions chose to submit a joint application in response to the funding opportunity announcement because both saw a chance to bring their experience in clinical variant interpretation and reporting, as well as their computational infrastructure, to bear on a large number of samples across many different tumor types. Researchers from the two institutions have collaborated on several projects in the past and have published a number of papers together, including one published last year in JAMA Oncology that described an assessment of treatment response biomarkers for a range of metastatic cancers.
The sheer volume of data likely to come out of this phase of the TCGA also offered a compelling reason to collaborate. Both institutions have sizable in-house computational infrastructure, but their individual resources may not be sufficient on their own to handle what will likely be petabytes of data generated during this phase of the project. The exact amount is not yet clear, but Elemento expects it to be substantially more than was generated in the previous iteration of the project. "We will need a tremendous amount of resources to be able to do the analysis that we propose to do in the grant," he said. "Working with the NYGC will make us be able to cope with the data analysis challenges that the TCGA is going to have."
The collaborators will make use of the Precision Medicine Knowledgebase (PMKB), a database of clinical-grade tumor mutations, annotations, and interpretations gleaned from patient samples that was developed in Elemento's laboratory. The database supports Weill Cornell's Exome Cancer Test (EXaCT-1), which is used to detect point mutations, insertions and deletions, and copy number variations in patient samples. The investigators plan to use existing information within the PMKB to annotate variants identified in the TCGA samples, but they also plan to develop a new module for the resource through which they intend to crowdsource variant curations.
The way this will work, Elemento explained, is that he and his colleagues will upload variants identified in the TCGA samples to the PMKB and then reach out to experts in the biomedical community, asking them to submit clinical-grade interpretations based on the peer-reviewed literature. A pre-selected team of board-certified pathologists will evaluate the submissions and modify or approve them as they see fit.
"[We realize] that interpretation of mutations in the clinical context is very labor intensive and hard for a single site to do," he said. "So we'll do this across at least [our] two sites but we'll also make it possible for other sites to contribute interpretations." They will also pull in additional data from other repositories, such as mutational frequency data, to provide stronger support for clinical interpretations as needed, he added.
The researchers also plan to apply at least three computational pipelines that they have developed to the TCGA data. One of these is a pipeline for identifying tumor biomarkers that predict response to cancer immunotherapies. It assesses immunological data such as checkpoint expression, mutation burden, and predicted neoepitopes, and it estimates T-cell receptor sequence profiles. According to the grant application, the researchers also plan to develop a so-called immunoscore that uses features from immune landscape analyses to build models predicting patients' likely response to immunotherapies. The researchers will also use the pipeline that supports the EXaCT-1 test, which uses de novo discovery and other detection methods to call clinically relevant variants and annotates them with information from the PMKB. Some details of the pipeline are provided in a paper published over the summer in npj Genomic Medicine.
The third is the Function-based Prioritization of Sequence Variants, or FunSeq, pipeline, which was developed by the functional interpretation team of the 1000 Genomes Project to annotate non-coding variants and prioritize them in terms of the strength of their impact on disease. For the TCGA data, the researchers will use the tool to identify somatic and germline non-coding driver variants from whole-genome and whole-exome sequencing data.
According to their grant proposal, they plan to focus specifically on non-coding variants that overlap promoters, enhancers, transcription-factor binding sites, and DNase I hypersensitive sites. They also plan to assess the effects of germline and somatic variants on gene expression, according to the application. In addition, the researchers plan to expand FunSeq to include methods for identifying structural variants that play a role in regulating gene expression, Elemento said. The researchers plan to share their software and pipelines as Docker containers so that they can be run on both local and cloud computing platforms.
Other activities planned for the grant include developing a clinical report of mutations similar to the one used for the EXaCT-1 test, Elemento said. Like the EXaCT-1 report, it will categorize mutations as clinically actionable or as drivers; driver mutations may be reclassified as actionable as new information becomes available. The report will also list non-driver, or passenger, mutations, which, although not actionable, are becoming increasingly important for immunotherapies, Elemento said. "Many of those mutations ... have the potential to be what we call neoepitopes which can give rise to peptides that are mutated and can be recognized by the immune system as non-self-peptides," he explained. "That's why we want to report them."
The team will also work on ways to include non-coding mutations in the clinical reports. "Right now there are very few non-coding mutations that are actionable or very important ... the hope is that this grant with the NYGC is going to enable identification of more such clinically relevant non-coding mutations," Elemento said. "That's one of the most exciting aspects of this grant."