NEW YORK – Researchers have developed a computational tool to identify and analyze noncoding mutations and their impact on gene expression and used it to establish distinct mutational patterns in five major pediatric cancers.
The technology, called PANGEA (predictive analysis of noncoding genomic enhancer/promoter alterations), could eventually help researchers and clinicians more accurately classify cancer subtypes to better guide prognoses and treatments.
The researchers, primarily affiliated with Children's Hospital of Philadelphia, published their findings in Science Advances in late July. They used PANGEA to examine noncoding mutations in more than 500 patients with five types of pediatric cancer: B cell acute lymphoblastic leukemia (B-ALL), acute myeloid leukemia (AML), neuroblastoma (NBL), Wilms tumor (WT), and osteosarcoma (OS).
"Given the prevalence of noncoding mutations and the drastic increase of whole-genome sequencing data, novel computational methods are critically needed to systematically identify putative causal noncoding mutations," the authors wrote.
Previous research on noncoding mutations has focused primarily on single-nucleotide variants and small indels, while systematic analyses of structural variants (SVs) have been lacking, according to the study. The study found that SVs are the most frequent class of putative causal noncoding mutations.
"We're still learning more and more about this noncoding portion of the genome," said corresponding author Kai Tan, a professor at the University of Pennsylvania.
Prior research has also focused on coding sequences in cancers, which is a problem when trying to diagnose and treat certain subtypes of both pediatric and adult cancer, said Tan. Noncoding mutations are harder to identify because noncoding sequences make up about 98 percent of the human genome, whereas coding sequences account for only about 2 percent.
"One of the main challenges when you try to interpret the function of a noncoding sequence is there's no natural context you can interpret it with," said Tan. "With a gene mutation, if you somewhat know the gene function, you will know what kind of consequences the coding mutation could be. Our knowledge about the function of noncoding sequences, in general, is very limited, so it's hard to actually identify and interpret noncoding mutations."
There are many anecdotal examples of how noncoding mutations affect cancer development, but there has not been a methodical analysis of them, he added. If researchers don't look at noncoding information, they may miss many cancer genes, said Tan. To his knowledge, this study is the first systematic analysis of noncoding mutations in pediatric cancer.
Through joint analysis of patients' mutations and gene expression profiles, PANGEA's algorithm identified all classes of putative causal noncoding mutations in the five pediatric cancers. This pan-cancer analysis covered 501 pediatric cancer patients across five histotypes, with matched whole-genome sequencing (WGS) and RNA-sequencing data generated by the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) Project, which has sequenced more than 1,000 genomes from five common pediatric cancers. TARGET is sponsored by the National Cancer Institute, with the goal of generating genomic sequences of these cancers.
Among the five cancer types in the recently published study, there were 163 patients with B-ALL, 153 patients with AML, 100 patients with NBL, 53 patients with WT, and 32 patients with OS.
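As a quick arithmetic check, the per-cancer counts above can be tallied against the 501-patient cohort reported in the study. The snippet below is only an illustrative sketch; the variable names are ours, not the authors'.

```python
# Patient counts per cancer type, as reported in the article.
cohort = {"B-ALL": 163, "AML": 153, "NBL": 100, "WT": 53, "OS": 32}

# Total cohort size; the article reports 501 patients.
total = sum(cohort.values())

# Share of the cohort contributed by each cancer type, in percent.
shares = {cancer: round(100 * n / total, 1) for cancer, n in cohort.items()}

print(total)
for cancer, pct in shares.items():
    print(f"{cancer}: {pct}%")
```

The two leukemias, B-ALL and AML, together account for 316 of the 501 patients, or roughly 63 percent of the cohort.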
Using PANGEA, the researchers identified different types of mutations that were associated with gene expression changes, including single-nucleotide variants, small indels, copy number variations, and structural variants. In total, the PANGEA analysis of these pediatric cancer types identified 1,175 genes recurrently altered in their coding regions and 2,162 genes recurrently altered in their noncoding regions.
One main finding was that PANGEA can be used to systematically analyze a full range of noncoding mutations in pediatric cancer. The researchers also showed that noncoding mutations can disrupt gene regulation in multiple ways. For example, the authors found that metabolic genes and pathways may be preferentially affected by noncoding mutations and that these mutations tend to affect genes located in regions with early replication timing. Further studies are needed to confirm whether these findings correlate with a novel oncogenic mechanism, the authors noted.
Another key finding was a "very small" overlap between the genes affected by noncoding versus coding mutations.
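One common way to put a number on such an overlap is the Jaccard index, the size of the intersection of two gene sets divided by the size of their union. The sketch below is illustrative only: the article does not say how the authors measured overlap, and the gene names are hypothetical placeholders rather than results from the study.

```python
def overlap_summary(coding_genes, noncoding_genes):
    """Summarize the overlap between two gene sets.

    Returns the shared genes and the Jaccard index
    (intersection size divided by union size).
    """
    coding = set(coding_genes)
    noncoding = set(noncoding_genes)
    shared = coding & noncoding
    union = coding | noncoding
    jaccard = len(shared) / len(union) if union else 0.0
    return {"shared": sorted(shared), "jaccard": jaccard}


# Hypothetical placeholder gene names, NOT data from the study.
coding_hits = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
noncoding_hits = ["GENE_D", "GENE_E", "GENE_F", "GENE_G", "GENE_H"]

summary = overlap_summary(coding_hits, noncoding_hits)
print(summary)
```

A Jaccard index near zero would correspond to the kind of "very small" overlap the authors describe; a value near one would mean the two gene lists are almost identical.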
The results highlight the need for comparative analysis of both coding and noncoding mutations because the latter may reveal novel cancer-related genes and pathways, the authors said.
Elaine Mardis, co-executive director of the Rasmussen Institute for Genomic Medicine at Nationwide Children's Hospital, who was not involved with the study, said that it is "a very interesting, new integrated analysis approach that further supports the importance of complex structural rearrangements of the genome in the etiology of human disease, beyond that of single nucleotide variants, and reinforces the importance of studying RNA in the context of genomic variation."
Based on the findings, she added, the study demonstrates that simple mutations in pediatric cancer may be less important than structural variants and the genomic regions they affect. Identifying these structural variants and genomic regions, "especially in the context of recurrence, may identify new oncogenic drivers or may indicate likelihood of response either to existing or novel therapies," she said.
The next steps for Tan and colleagues include further studies to confirm their observations of noncoding mutations, using a bigger cohort ideally made up of hundreds to more than a thousand patients for each pediatric cancer, he said. The team also plans to generate more sequencing data for other pediatric cancer types, such as brain tumors.
As for clinical applications of PANGEA, Tan sees them as possible within the next five years. Once the technology is more developed and the specific predicted mutations are validated experimentally, he hopes that the team's noncoding mutation data will be added to commercial products that analyze genetic mutations in cancer. Compared with alternative computational tools, PANGEA is the first algorithm to study all types of noncoding mutations in a systematic way, said Tan.
One challenge that PANGEA overcomes compared with other computational tools is linking a particular noncoding sequence with a target gene. Another advantage is that it integrates genome sequence data with RNA sequencing data, giving researchers more evidence about whether a mutation can cause a change in target gene expression.
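To make the idea of pairing mutation calls with expression data concrete, the sketch below compares a gene's expression in patients who carry a nearby noncoding mutation against those who do not, using a simple permutation test. This is not PANGEA's algorithm, which the article does not detail; it is a generic illustration, and the function name and expression values are hypothetical.

```python
import random
from statistics import mean


def permutation_test(mutated, unmutated, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in mean expression.

    mutated / unmutated: expression values of one gene in patients
    with and without the candidate mutation.
    Returns (observed difference in means, permutation p-value).
    """
    rng = random.Random(seed)
    observed = mean(mutated) - mean(unmutated)
    pooled = list(mutated) + list(unmutated)
    k = len(mutated)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign patients to the two groups.
        rng.shuffle(pooled)
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical expression values (e.g. log-scale), NOT data from the study.
with_mutation = [8.1, 7.9, 8.4, 8.0]
without_mutation = [5.2, 5.0, 5.5, 4.9, 5.3, 5.1]

effect, p_value = permutation_test(with_mutation, without_mutation)
print(f"difference in means: {effect:.2f}, p-value: {p_value:.4f}")
```

A small p-value here only indicates that expression differs between the two groups; on its own it does not show that the mutation causes the change, which is why Tan stresses experimental validation of the predictions.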
Adding this information would allow researchers and clinicians to better classify cancer patient subtypes for both prognosis and treatment outcomes, he said, adding that the tool could also help identify noncoding mutations in other human diseases, such as heart disease and certain neurological diseases.
"Moving forward, I would imagine other investigators will be interested in either following up with our predictions or using our tool to analyze additional data sets," said Tan.
Tan aims to make the computational tool a resource for the cancer research community as well as the human genetics community, and he said he has already received various requests from other cancer researchers and bioinformatics tool developers since the study was published.