NEW YORK (GenomeWeb) – The FANTOM consortium has generated an atlas of human long non-coding RNAs indicating that more than 19,000 lncRNAs could be functional.
The international team has previously mapped transcription start sites, examined transcription factor interactions, identified enhancers, and more. In its latest paper, appearing today in Nature, the Riken-led consortium reported that it built an atlas of nearly 30,000 human lncRNAs with accurate 5' ends, and determined that most lncRNAs were derived from enhancers. In addition, the researchers found that more lncRNAs may have a functional role than widely thought.
"There is strong debate in the scientific community on whether the thousands of long non-coding RNAs generated from our genomes are functional or simply byproducts of a noisy transcriptional machinery," senior author Alistair Forrest from the Riken Center for Life Science Technologies and the University of Western Australia said in a statement. "By integrating the improved gene models with data from gene expression, evolutionary conservation, and genetic studies, we find compelling evidence that the majority of these long non-coding RNAs appear to be functional."
To build this atlas, Forrest and his colleagues combined transcript models from GENCODE, ENCODE, miTranscriptome, and Human BodyMap 2.0 datasets with 70 FANTOM5 samples that had been profiled using their cap analysis of gene expression (CAGE) method. With this approach, the researchers were able to gauge transcriptional start sites (TSSs) based on CAGE clusters and TIEScores, which estimate the likelihood that a CAGE cluster and transcript model truly house a TSS.
The researchers noted that other efforts to map transcription don't always capture accurate 5' ends, but here they reported 27,919 lncRNA genes with high-confidence 5' ends.
Based on the overlap of their TSSs and DNase I hypersensitivity sites (DHSs) thought to be promoter, enhancer, or dyadic regulatory regions, the researchers found that a large number of lncRNAs originated from enhancer DHSs, rather than from promoters. They further reported that lncRNAs are more conserved than previously appreciated.
A number of lncRNAs also appear to be involved in disease. By combining genetic and gene expression data, the researchers found that lncRNAs that overlapped with SNPs associated with disease through genome-wide association studies were expressed in disease-relevant cell types. This suggested to them that the lncRNAs could be involved in multiple diseases.
In addition, they noted that lncRNAs that overlap with eQTL-associated SNPs are co-expressed with the corresponding mRNAs, indicating that these lncRNAs may help regulate transcription.
Altogether, the researchers argued that their analyses suggest that some 69 percent of the FANTOM CAT lncRNAs, or 19,175, could be functional. That, they noted, rivals the number of known protein-coding genes.
"The improved gene models and the broad functional hints of human long non-coding RNAs derived from this atlas could serve as a Rosetta Stone for us to experimentally investigate their functional relevance as part of our ongoing work," Riken's Piero Carninci added.