NEW YORK (GenomeWeb) — Researchers from Children's Hospital of Philadelphia (CHOP) have developed a computational tool that uses deep-learning techniques to detect the different ways RNA is spliced when it is copied from DNA.
The researchers described the method in a paper published last week in Nature Methods. According to its developers, the Deep-learning Augmented RNA-seq analysis of Transcript Splicing (DARTS) uses deep-learning methods to harness the wealth of information available in public RNA sequencing datasets to provide insights into alternative splicing between biological samples, and it can help researchers discover new disease biomarkers and therapeutic targets.
According to Yi Xing, study lead and director of the Center for Computational and Genomic Medicine at CHOP, DARTS bridges the gap between large-scale public domain datasets and smaller datasets from focused studies in individual labs. It provides the kind of functionality needed to "transform massive amounts of public RNA-seq data into a knowledge base, represented as a deep neural network, of how splicing is regulated," he said in a statement.
Furthermore, the tool works even if the RNA-seq dataset in question offers only modest coverage. At least one study calls for about 100 million reads per sample to achieve deep coverage. With DARTS, roughly 20 to 30 million RNA-seq reads are sufficient for researchers to "make educated guesses and inferences on things you were never able to see in the past," Xing noted.
Xing's lab studies how RNA-level processes diversify the transcriptome and proteome as well as how to use transcriptome information to guide disease diagnosis and treatment. "Over the years, we have developed a variety of computational methods and software tools to detect RNA splicing using high-throughput sequencing datasets and also to understand the regulation and functional consequences of RNA splicing," he said in an interview.
Variations in RNA splicing may cause disease, modify disease risk, or cause more severe forms of disease. Massively parallel RNA sequencing has emerged as the standard technology for investigating alternative splicing.
In 2017, researchers from Brown University and the University of Utah published details of MaPSy, an assay that uses high-throughput in vitro and in vivo experiments to identify and authenticate splicing mutations. However, costs associated with RNA-seq experiments can put deep sequencing out of reach for individual research labs. Furthermore, "you have a lot of medically important genes that are expressed at moderate or low levels in the cell," Xing said. "Even if you have very deep coverage on your sample, you may not have deep coverage for your genes of interest."
DARTS addresses this need by leveraging public RNA-seq data to build a knowledge base of splicing regulation information. As explained in Nature Methods, DARTS consists of two core components. The first is a deep neural network model that uses exon-specific sequence features and sample-specific regulator features to predict differential alternative splicing between two conditions. Large-scale RNA-seq datasets from public resources such as the Encyclopedia of DNA Elements (ENCODE) are used to train the network to make predictions about splicing activity. The second DARTS component is a statistical model that infers changes in alternative splicing by integrating evidence from individual RNA-seq datasets with information on the prior probabilities of differential alternative splicing from the deep neural network.
For the paper, Xing and his colleagues trained three models using datasets from the ENCODE consortium and the Roadmap Epigenomics Project. The researchers explain in the paper that these datasets are used to generate training labels of high-confidence differential or unchanged splicing events between conditions that are then used to train DARTS's deep neural network. Once trained, the network infers differential splicing in new datasets by combining its predictions, learned from the training datasets, with observed RNA-seq read counts from the new datasets. These new predictions of alternative splicing then become fresh fodder for the model to improve its future predictions of RNA splicing.
"Essentially, there are two key innovations at work," Xing explained. "We demonstrated that we can construct a deep neural network that uses the information of the genome as well as the concentrations of splicing regulatory proteins … to predict between any two conditions, what kind of exons might change." Another key aspect is "that we used a Bayesian statistical framework to couple the deep-learning prediction with what you observe on your specific dataset."
Given an input RNA-seq dataset, "we could look at the sequencing information in [the] data to infer what kind of exons or splicing events might have shifted between different conditions," he said. Additionally, "we basically use Bayesian statistics to build a bridge between the small data from focused biological studies with the big data that are generated and deposited into the public domain coming from very large genome consortium projects."
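To make the idea of coupling a prior with observed data concrete, the following is a minimal Python sketch of Bayes' rule applied to a single exon. It is not the DARTS model described in Nature Methods: the function names, the uniform Beta(1, 1) priors on exon inclusion levels, and the read counts are all assumptions chosen for illustration. The `prior` argument stands in for the probability of differential splicing that a trained neural network might supply.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_bayes_factor(inc1, tot1, inc2, tot2):
    """Log evidence ratio for 'differential' versus 'unchanged' splicing.

    inc*/tot* are exon-inclusion reads and total reads in each condition.
    Assumption (hypothetical, for illustration): inclusion levels have
    uniform Beta(1, 1) priors. Under 'differential' each condition has its
    own level; under 'unchanged' both share one level. The binomial
    coefficients are identical in both hypotheses and cancel.
    """
    exc1, exc2 = tot1 - inc1, tot2 - inc2
    log_h1 = log_beta(inc1 + 1, exc1 + 1) + log_beta(inc2 + 1, exc2 + 1)
    log_h0 = log_beta(inc1 + inc2 + 1, exc1 + exc2 + 1)
    return log_h1 - log_h0


def posterior_differential(prior, inc1, tot1, inc2, tot2):
    """Posterior probability of differential splicing via Bayes' rule.

    `prior` plays the role of a network-predicted probability; here it is
    simply a number strictly between 0 and 1 supplied by the caller.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    log_odds = log(prior) - log(1.0 - prior) + log_bayes_factor(
        inc1, tot1, inc2, tot2
    )
    # Numerically stable logistic transform of the posterior log-odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    e = exp(log_odds)
    return e / (1.0 + e)


# Low coverage (4 reads per condition): the prior largely decides the call.
print(posterior_differential(0.9, 3, 4, 1, 4))
print(posterior_differential(0.1, 3, 4, 1, 4))
# Higher coverage with a clear shift: the data dominate even a weak prior.
print(posterior_differential(0.1, 18, 20, 2, 20))
```

In this toy setting, a handful of reads barely moves the posterior away from the prior, which is why an informative prior matters most for lowly expressed genes, while a strong shift at higher coverage overrides even a skeptical prior.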
In the paper, the researchers applied DARTS to lung and prostate cancer cell lines to test the tool's ability to predict splicing patterns in these cells. Among other findings, they reported that DARTS could identify changes in alternative splicing patterns in genes with significantly lower expression levels and lower sequencing coverage. Specifically, DARTS predicted 53 additional differential splicing events beyond the 77 events identified by analyzing the RNA-seq data alone.
"[It] offers an exciting conceptual framework that we could adapt to other uses," Xing said. "For example, we might create a version that predicts alternative splicing in specific patient tissues," potentially improving the diagnosis of rare diseases from tissue biopsies.
"The use of deep learning for looking at RNA splicing has been a very active research field," Xing said. Earlier applications of deep learning to alternative splicing include work done in the laboratory of Brendan Frey, a professor of engineering and medicine at the University of Toronto and the founder and CEO of Toronto-based bioinformatics company Deep Genomics.
In 2014, Frey's lab published a paper describing a deep-learning computational model for predicting the impact of genetic variants on alternative splicing and showcasing its use to identify variant-driven splice alterations involved in neurological disorders and cancer. His company opened its doors in 2015 to commercialize products based on the technology. "It's still an actively developing field, so there's a lot of possibilities," Xing said. For example, "we know from recent literature that a lot of those undiagnosed diseases could be resolved by looking at RNA splicing."