NEW YORK (GenomeWeb) – A University of California, Los Angeles-led team has come up with a new statistical method for finding survival time-associated messenger RNA isoform ratio signatures in tumor sample sets.
"Survival analysis of mRNA isoform variation" (SURVIV), described in Nature Communications yesterday, uses RNA sequencing reads to estimate mRNA isoform ratios in relation to survival times by calculating both transcript isoform ratios and uncertainty estimates around these ratios.
When Yi Xing, a microbiology, immunology, and molecular genetics researcher at UCLA, and his colleagues applied SURVIV to RNA sequence data for nearly 700 invasive ductal carcinoma breast cancer samples profiled for the Cancer Genome Atlas project, for example, they found around 200 exon-skipping events potentially tied to survival time. Their results in real and simulated breast cancer data suggested the algorithm produced more accurate survival predictions than exon-skipping analyses done without uncertainty measurements, especially when deep sequence data is not available.
The prognostic potential of the approach appeared to be further enhanced when considered alongside survival clues from clinical and gene expression data. But when just one of these sources of information was available, the team found that the splicing-based survival predictions outperformed gene expression-based survival estimates — results it subsequently recapitulated in five other cancer types.
"One of the limitations in past studies was that quantifying splicing using molecular approaches in hundreds of samples is really not that straightforward," Xing, the study's senior author, told GenomeWeb, calling the isoform ratio uncertainty estimate the "major novel feature of this algorithm."
Taking into account the number of times each splice junction has been seen by RNA-seq reads across the tumor samples gives a sense of how strongly the available data support each isoform ratio estimate.
For example, the algorithm gives more weight to an isoform that makes up 1,000 out of 2,000 reads for a given transcript than it would if the isoform represented one out of two transcript reads. Even though both scenarios describe an isoform with the same ratio, Xing said, the "confidence you have in that ratio estimate is very different depending on the [read] counts you have."
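The intuition behind Xing's example can be made concrete with a standard binomial confidence interval. The sketch below is not the SURVIV model itself. It simply assumes that each read is an independent draw that either supports the isoform or does not, and it uses a Wilson score interval, computed by a hypothetical helper named `wilson_interval`, to compare 1,000 of 2,000 reads with one of two.

```python
import math


def wilson_interval(supporting_reads, total_reads, z=1.96):
    """Approximate 95% Wilson score interval for a read-count ratio.

    Illustrative assumption: reads are independent binomial draws.
    This is a textbook interval, not the uncertainty model used by SURVIV.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    ratio = supporting_reads / total_reads
    denom = 1 + z * z / total_reads
    center = (ratio + z * z / (2 * total_reads)) / denom
    half_width = (
        z
        * math.sqrt(
            ratio * (1 - ratio) / total_reads
            + z * z / (4 * total_reads * total_reads)
        )
        / denom
    )
    return center - half_width, center + half_width


# Same point estimate (0.5), very different read support.
deep = wilson_interval(1000, 2000)
shallow = wilson_interval(1, 2)

print(f"1,000 of 2,000 reads: {deep[0]:.3f} to {deep[1]:.3f}")
print(f"1 of 2 reads:         {shallow[0]:.3f} to {shallow[1]:.3f}")
```

Both cases give a point estimate of 0.5, but the interval spans only a few percentage points with 2,000 reads and most of the range between 0 and 1 with two reads.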
Results from past mRNA sequencing studies done across multiple tissue types suggest most human genes containing more than one exon undergo alternative splicing in some circumstances or tissues, he and his colleagues noted. And there is evidence that alternative splicing events contribute to processes along the cancer development spectrum from tumor formation to immune escape and metastasis beyond the primary tumor site.
"The plasticity of alternative splicing is often exploited by cancer cells to produce isoform switches that promote cancer cell survival, proliferation, and metastasis," the SURVIV developers wrote, adding that the wide availability of cancer transcriptome data generated by RNA sequencing offers a potential window into this process.
Rather than searching for cancer-specific splicing events, as has been done in the past, the team set out to find survival time-associated splicing clues in large sets of tumor samples alone.
"We really wanted to do a larger-scale, unbiased analysis," Xing said, "and we realized that there was a need to develop more rigorous and sensitive statistical methods to handle the variation and noise in these larger-scale RNA-seq datasets."
For their proof-of-principle study, the researchers used SURVIV to look at one form of alternative splicing, exon-skipping, first in real and simulated RNA-seq data for invasive ductal carcinoma breast tumors and then in five other cancer types.
With simulated data for tens of thousands of exons in 600 breast cancer tumors that had splice junctions covered by a range of read depths, the team found that SURVIV was more accurate for finding isoform ratios and relating them to survival than Cox regression analyses that did not take into account uncertainty, especially as read depths diminished.
"By accounting for the uncertainty, we can get more reliable estimates, especially for datasets where the [RNA] sequencing coverage is not very high, which is fairly common for clinical studies," Xing said.
"The deepest coverage we simulated had maybe 200 million RNA-seq reads for every sample," he added. "And very few RNA sequencing datasets have that kind of coverage," he added. "The typical [RNA sequencing] datasets in the Cancer Genome Atlas have only a quarter of that."
SURVIV's accuracy continued to edge out the Cox approach even when actual survival times were censored out of the analysis and overall patient survival rates of 85 percent were assumed, he and his colleagues reported.
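The read-depth effect described for the simulated data can be illustrated with a toy simulation. The sketch below does not reproduce the team's simulations or any survival modeling. It assumes, purely for illustration, that each exon has a true inclusion ratio drawn uniformly between 0.1 and 0.9 and that reads are independent draws, and it uses a hypothetical helper named `mean_abs_error` to measure how far the naive ratio estimate drifts from the truth at different depths.

```python
import random


def mean_abs_error(depth, n_exons=500, seed=0):
    """Average |estimated ratio - true ratio| across simulated exons.

    Illustrative assumptions: true ratios are uniform on [0.1, 0.9] and
    each of `depth` reads independently supports exon inclusion with
    probability equal to the true ratio.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total_error = 0.0
    for _ in range(n_exons):
        true_ratio = rng.uniform(0.1, 0.9)
        inclusion_reads = sum(rng.random() < true_ratio for _ in range(depth))
        total_error += abs(inclusion_reads / depth - true_ratio)
    return total_error / n_exons


errors = {depth: mean_abs_error(depth) for depth in (5, 50, 500)}
for depth, err in errors.items():
    print(f"{depth:>3} reads per exon: mean absolute error {err:.3f}")
```

Under these assumptions the naive ratio becomes markedly noisier as depth falls, which is the situation in which a method that tracks uncertainty would be expected to help most.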
In real RNA-seq data from TCGA representing 682 invasive ductal carcinoma cases, the algorithm picked up 229 exon-skipping events that coincided with patient survival times in two or more of the invasive ductal carcinoma subgroups.
The exon-skipping events within that set were enriched for isoforms representing genes with possible roles in cancer, Xing explained, such as transcription factor genes or genes from DNA damage response, oxidative stress, or apoptosis pathways.
While not all of the genes are expected to contribute to cancer directly, the researchers believe altered splicing events will provide clues to the tumor's biology or its potential vulnerabilities.
Jun Yao, a neuro-oncology researcher at the University of Texas MD Anderson Cancer Center who was not involved in the study, noted that it will be useful to begin teasing apart the biological consequences of the altered genes affected by exon-skipping in the analysis.
Yao, who has studied tumor-specific isoforms and alternative splicing in cancer in the past, called SURVIV "an interesting, novel approach" for using sequencing data to predict cancer patient survival. Still, he noted that additional validation and follow-up functional studies are needed to weed out any exon-skipping associations detected by chance.
Along with biological clues, the team hopes that such isoform ratio analyses may yield smaller sets of candidate splicing markers with potential prognostic value. In the breast cancer data, the group found that it could cluster samples into high and low survival groups with different SURVIV-based exon-skipping profiles or use specific exon-skipping events to classify tumors.
Through a network analysis that included splicing regulators, the researchers also identified three splicing factors that appear to regulate at least 84 of the survival-associated exon events identified by SURVIV in breast cancer, including a splicing factor called TRA2B that was previously detected at higher-than-usual levels in some breast cancers. Levels of each splicing factor tended to rise in tumors from patients with poorer survival outcomes.
Somewhat unexpectedly, Xing said, he and his team found that the SURVIV-based exon-skipping analyses predicted survival times more accurately than gene expression data alone, not only in breast cancer but also in five other cancer types with available TCGA RNA-seq data: lower-grade glioma, glioblastoma multiforme, kidney renal clear cell carcinoma, lung squamous cell carcinoma, and ovarian serous cystadenocarcinoma.
The team suspects samples with poor sequence quality or RNA degradation might skew the gene expression patterns slightly, altering survival prediction accuracy, while the inclusion of at least two alternative transcript isoforms from each gene might act as an internal control in the isoform ratio-based analysis. Still, Xing noted that they have not ruled out possible biological reasons for the survival prediction differences.
He said the team is interested in exploring collaborations with other groups to get a hold of samples in additional cancer types, with an eye to ultimately commercializing predictive tests that center on SURVIV.
"One of the outcomes could be what we're looking at in this work, which is the survival time," he explained. "The other thing that interests us is using these kinds of molecular signatures to predict response to therapies. I think that would have a lot of clinical and also commercial interest."
Though the current analysis centered on exon skipping and inclusion, Xing said the SURVIV approach is designed to pick up all types of alternative splicing events — from alternative 5' or 3' splice sites to aberrant intron retention — across a broad range of cancer types assessed by RNA-seq.
"For this work we were only looking at exon skipping and inclusion, but the underlying statistical model actually applies to any type of splicing event. That's what we're looking at now," Xing said. He noted that the group is also working on a new approach to incorporate clinical variables beyond survival time into the model.