Pan-Cancer Patterns Uncovered With Expression-Centered Deep Learning Method

NEW YORK – An international team led by investigators in the US has begun to untangle how gene expression relates to other tumor features, using an unsupervised deep learning strategy.

The approach, dubbed DeepProfile, uses deep neural network models to embed gene expression data in relatively low-dimensional latent spaces, then examines how the resulting latent variables relate to biologically informative cancer features.

"Unsupervised learning projects high-dimensional input variables into a latent space consisting of a smaller set of latent variables, or factors, capable of explaining the variation in the original input space," co-senior and co-corresponding authors Su-In Lee, a researcher at the University of Washington, and Kamila Naxerova, an investigator affiliated with Harvard Medical School and Massachusetts General Hospital, and their colleagues wrote in a paper published in Nature Biomedical Engineering on Tuesday.
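The idea of projecting a high-dimensional expression matrix onto a few latent variables can be illustrated with a minimal sketch. It uses principal component analysis, a linear method, purely as a stand-in; it is not DeepProfile's deep neural network model, and the function name `project_to_latent` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def project_to_latent(expression, n_latent):
    """Project a samples-by-genes matrix onto its top principal components.

    Linear stand-in for a learned latent space (hypothetical helper,
    not part of DeepProfile).
    """
    # Center each gene so the components describe variation, not the mean.
    centered = expression - expression.mean(axis=0)
    # Rows of vt are the principal directions in gene space.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of every sample along the top n_latent directions.
    latent = centered @ vt[:n_latent].T
    # Fraction of total variance captured by the retained components.
    explained = (s[:n_latent] ** 2).sum() / (s ** 2).sum()
    return latent, explained

# Synthetic data: 200 samples, 50 genes driven by 3 hidden factors plus noise.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 3))
loadings = rng.normal(size=(3, 50))
expression = factors @ loadings + 0.1 * rng.normal(size=(200, 50))

latent, explained = project_to_latent(expression, n_latent=3)
print(latent.shape, round(float(explained), 3))
```

Here three latent variables recover nearly all of the variation in 50 synthetic genes because the data were generated from three factors. Real expression data are far noisier, and nonlinear models such as the one the authors describe are intended to capture relationships that a linear projection would miss.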

"Our methodologies to make deep neural network models biologically interpretable allow for complex, nonlinear relationships to be learned while retaining stable models," they explained, adding that "DeepProfile's robustness and interpretability enables the discovery of unique biological patterns in large gene expression datasets."

The researchers trained DeepProfile on array-based gene expression data for 50,211 samples from nearly 1,100 datasets in the Gene Expression Omnibus database. They then applied the tool to 9,079 samples from 18 cancer types that had been profiled for the Cancer Genome Atlas project, bringing together tumor gene expression profiles, normal tissue expression patterns, patient phenotype data, and data from biological databases.

"The application of DeepProfile to a pan-cancer gene expression compendium exposed several intriguing biological patterns," the authors explained, noting that their analyses "were enabled by DeepProfile's integration of the learned model with independent biological databases, including normal tissue expression data, patient-level phenotype data, and protein-protein interaction databases."

Using this approach, the team explored features shared across cancer types as well as patterns within specific cancer types such as breast cancer, acute myeloid leukemia, and colorectal cancer. They also identified tumor pathway and mutational profiles that offer clues to patient survival.

In particular, the team pointed to immune cell activation involving genes with altered expression across cancer types, as well as cancer subtype insights informed by genes and pathways with distinct activity.

When they focused on latent variables linked to secondary tumor characteristics, meanwhile, the investigators identified ties between cell cycle gene expression and tumor mutational burden. They also highlighted cancer patient survival associations involving DNA mismatch repair gene activity and macrophage-related class II major histocompatibility complex antigen presentation.

"Beyond the computational advance represented by this approach," the authors suggested, "DeepProfile provides hundreds of biological insights gleaned from existing compendia that can be mined by researchers to advance our understanding of different human malignancies."