GLASGOW – Preliminary analysis of transcriptome data from a large patient cohort within the UK 100,000 Genomes Project illustrates the potential of RNA sequencing for improving rare disease diagnostics.
In a talk at the European Society of Human Genetics (ESHG) annual meeting here on Saturday, Jenny Lord, a researcher from the University of Southampton, presented results from a study that analyzed RNA-seq data from more than 4,400 patients with unsolved disorders.
Carried out by Genomics England, the 100,000 Genomes Project aims to sequence the genomes of approximately 85,000 patients with rare diseases or cancer within the UK's National Health Service (NHS) to gain genomic insights into these diseases.
While the project includes a diagnostic pipeline for participants with unsolved disorders, Lord said, the current diagnostic rate is only about 25 percent, leaving a "big room for improvement." This is particularly the case for diseases associated with variants within the noncoding regions of the genome or those that affect RNA splicing, she added.
To help improve the diagnostic yield, Lord and her team turned to patients' multiomics data, which were collected as part of the 100,000 Genomes Project. In particular, her team analyzed blood samples that underwent globin RNA and ribosomal RNA depletion for RNA sequencing using the Illumina 100 bp paired-end chemistry.
According to Lord, 36 percent of the patients in the analysis had neurodevelopmental disorder-like phenotypes, while the rest presented with phenotypes of cardiovascular, renal, and other diseases. Demographically speaking, more than half of the patients were male, and about 70 percent were white British.
"One of the questions that [we] were asked most often is: 'How well are you going to be able to assess these phenotypes in a dataset of whole-blood RNA sequencing?'" Lord noted. "It's not the obvious tissue of choice for the majority of these disorders."
Her team considered 5 transcripts per million (TPM) — a metric for normalizing and quantifying gene expression — to be sufficient for splicing and expression analysis. With that cutoff, she said, about half the genes could be analyzed.
Furthermore, about 73 percent of the genes that were most likely related to disease — for the majority of participants, a neurodevelopmental disorder — had a TPM of 5 or more. "We are really positive that we should be able to assess most disease genes using this data," Lord added.
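For readers unfamiliar with the metric, TPM divides each gene's read count by its length and then rescales the results so that they sum to one million within a sample. The following is a minimal sketch of that calculation and of a TPM cutoff, not the team's pipeline. The gene names, counts, and lengths are invented for illustration; only the cutoff of 5 comes from Lord's talk.

```python
def tpm(counts, lengths_kb):
    """Convert raw read counts to transcripts per million (TPM)."""
    # Reads per kilobase: correct each gene's count for its length.
    rpk = {gene: counts[gene] / lengths_kb[gene] for gene in counts}
    total = sum(rpk.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    # Rescale so that all values in the sample sum to one million.
    return {gene: value / total * 1_000_000 for gene, value in rpk.items()}


def analyzable(tpm_values, cutoff=5.0):
    """Return the genes whose TPM meets the cutoff, sorted by name."""
    return sorted(g for g, v in tpm_values.items() if v >= cutoff)


# Hypothetical toy sample with three genes (not real data).
counts = {"GENE_A": 900, "GENE_B": 100, "GENE_C": 0}
lengths_kb = {"GENE_A": 3.0, "GENE_B": 1.0, "GENE_C": 2.0}

values = tpm(counts, lengths_kb)
passing = analyzable(values)
```

With only three genes the TPM values are far larger than in a real transcriptome, but the unexpressed gene still falls below the cutoff and is excluded.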
In their study, Lord and her team looked into both expression and splicing outliers, and their preliminary analysis suggested diagnostic candidates in 20 percent of probands.
Specifically, the team used a tool called Outlier in RNA-seq Finder (OUTRIDER) to analyze 1,347 probands for expression outliers, and 7.6 percent showed outlier events associated with genes relevant to the patients' phenotype.
In one case, for example, expression analysis uncovered an outlier event in RPL5, a gene associated with Diamond-Blackfan anemia, consistent with the phenotype observed for this patient and providing a candidate diagnosis, Lord said.
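The general idea of an expression outlier can be illustrated with a much simpler stand-in than OUTRIDER: compare one patient's expression of a gene against everyone else in the cohort and flag large deviations. The sketch below uses a leave-one-out z-score on log-transformed TPM, which is an assumption made here for illustration and not the statistical model OUTRIDER uses. The patient labels, TPM values, and the cutoff of 3 are all hypothetical.

```python
import math
import statistics


def expression_outliers(tpm_by_sample, z_cutoff=3.0):
    """Flag samples whose expression of one gene deviates from the rest.

    Each sample is compared with the mean and standard deviation of all
    other samples, so a single extreme value cannot mask itself.
    """
    logged = {s: math.log2(v + 1.0) for s, v in tpm_by_sample.items()}
    flagged = {}
    for sample, value in logged.items():
        others = [v for s, v in logged.items() if s != sample]
        mean = statistics.mean(others)
        sd = statistics.stdev(others)
        if sd == 0:
            continue  # no variation among the others, z-score undefined
        z = (value - mean) / sd
        if abs(z) >= z_cutoff:
            flagged[sample] = z
    return flagged


# Hypothetical TPM values for a single gene across eight patients.
gene_tpm = {
    "P1": 40, "P2": 44, "P3": 38, "P4": 42,
    "P5": 41, "P6": 39, "P7": 43, "P8": 8,
}

flagged = expression_outliers(gene_tpm)
```

In this toy cohort only the patient with markedly reduced expression is flagged, and the negative z-score indicates under-expression.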
The team also profiled splicing outliers in 4,438 probands using a tool called LeafCutterMD. The results showed that about 8 percent of them had splicing events linked to disease-relevant genes. Meanwhile, in a preliminary analysis of 200 probands using a recently released pipeline called FRASER2, which is designed to detect the splicing outliers most likely to be biologically relevant, the team identified outlier events in 13 percent of those probands, Lord said.
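When two outlier-detection tools are run on the same samples, their calls can be compared with simple set operations. The sketch below is purely illustrative: the patient and gene identifiers are invented, and `tool_a` and `tool_b` are placeholders rather than actual output from LeafCutterMD or FRASER2.

```python
def compare_calls(calls_a, calls_b):
    """Summarize agreement between two sets of (patient, gene) calls."""
    shared = calls_a & calls_b    # events reported by both tools
    combined = calls_a | calls_b  # events reported by at least one tool
    # Jaccard index: fraction of all reported events that both tools agree on.
    jaccard = len(shared) / len(combined) if combined else 0.0
    return shared, combined, jaccard


# Hypothetical calls from two tools (not real results).
tool_a = {("P001", "GENE_A"), ("P002", "GENE_B"), ("P003", "GENE_C")}
tool_b = {("P003", "GENE_C"), ("P004", "GENE_D"), ("P005", "GENE_E")}

shared, combined, jaccard = compare_calls(tool_a, tool_b)
```

Here the two placeholder tools agree on one event out of five, so taking the union yields more candidates than either tool alone.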
"It is worth noting that the overlap between the events identified by FRASER2 and by LeafCutter is actually really low," Lord said. "These splicing tools all tend to identify quite different things. That's why it's really important for us to use several different tools, so we can pick up the most amount of stuff."
Of the potentially disease-causing variants identified, Lord said, one was a splicing outlier in PTEN — a gene linked to Cowden syndrome — in a patient with a corresponding phenotype. Since the variant is about 20 bases away from the exon, it would have been annotated as intronic and filtered out by most diagnostic pipelines, including Genomics England's, she noted.
Moving forward, Lord said, there is still "lots of analysis to be done," including running all analysis tools on the full cohort and minimizing false-positive events in the splicing analysis, as well as combining RNA-seq data with whole-genome sequencing results to help detect more variants.
While the study is still in its "early days," based on the preliminary results so far, Lord said her team anticipates being able to find one significant outlier event in a disease-relevant gene in at least 25 percent of the total cohort.
"As we add in additional tools and analysis methods, as well, we would expect [that percentage] to increase further," she added.