NEW YORK (GenomeWeb) – Researchers from the University of North Carolina at Chapel Hill have come up with a sequence motif-based strategy for sorting out functional similarities between long non-coding RNAs (lncRNAs).
The authors described their "sequence evaluation from k-mer representation" (SEEKR) method, which quantifies sequence k-mers to compare lncRNAs and classify them functionally, in a paper published in Nature Genetics this week. They explained that SEEKR counts the k-mers of a given length in each lncRNA, normalizes those counts by the lncRNA's overall length, and then compares the resulting k-mer profiles.
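As a rough illustration of the counting and length-normalization step described above, here is a minimal Python sketch. It is not the authors' implementation: the function name `kmer_profile`, the default k-mer length, and the per-kilobase scaling are assumptions made for the example.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k=2, alphabet="ACGT"):
    """Return length-normalized k-mer counts for one sequence.

    Hypothetical helper: counts every overlapping k-mer and scales the
    counts to "per 1,000 nucleotides" (an assumed normalization) so that
    long and short transcripts can be compared.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA letters alike
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")

    # Tally all overlapping windows of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

    # Report every possible k-mer in a fixed order, including absent ones;
    # windows containing other letters (e.g. N) are simply not reported.
    scale = 1000.0 / len(seq)
    return {
        "".join(p): counts.get("".join(p), 0) * scale
        for p in product(alphabet, repeat=k)
    }


profile = kmer_profile("ACGUACGU", k=2)
print(len(profile))   # number of possible 2-mers over a 4-letter alphabet
print(profile["AC"])  # length-normalized count for the k-mer "AC"
```

Reporting every possible k-mer in a fixed order, rather than only those observed, gives each transcript a vector of the same length, which makes profiles from different lncRNAs directly comparable.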
"We found that lncRNAs of related function often had similar k-mer profiles despite lacking linear homology, and that k-mer profiles correlated with protein binding to lncRNAs and with their sub-cellular localization," senior author J. Mauro Calabrese, a pharmacology researcher at UNC Chapel Hill's Lineberger Comprehensive Cancer Center, and his colleagues wrote.
Using this strategy, the team profiled and compared k-mer patterns in 161 lncRNAs known to be conserved between mice and humans, uncovering both new and known functional similarities between lncRNAs. The team also began clustering human and mouse lncRNAs into k-mer-based "communities" in an effort to understand their functions in relation to their cellular localization and other features.
The investigators also combined SEEKR with a "transposable element to test RNA's effect on transcription in cis" (TETRIS) assay to search for lncRNAs with regulatory functions similar to those attributed to Xist, a lncRNA with a documented role in cis-repression of gene expression. In doing so, they detected a cis-repressive function for another lncRNA called Kcnq1ot1, despite pronounced differences between Xist and Kcnq1ot1 at the linear sequence level.
"SEEKR detected significant similarity between the cis-repressive Kcnq1ot1 and Xist lncRNAs where none was found by conventional alignment algorithms," the authors wrote. "We conclude that lncRNAs of related function can have related k-mer profiles even if they lack linear sequence similarity."
Likewise, the researchers saw signs that still other lncRNAs, including NEAT1 and MALAT1, may share previously unappreciated similarities with human and mouse versions of Xist. Repressive activity was more pronounced in lncRNAs whose k-mer profiles were more similar to that of Xist, they reported.
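To make the idea of k-mer profiles being "closer" to one another concrete, the sketch below scores two sequences by the Pearson correlation of their length-normalized k-mer vectors. The article does not specify which similarity measure SEEKR uses, so the choice of Pearson correlation, the function names, and the toy sequences are all assumptions for illustration only.

```python
from itertools import product
from math import sqrt


def kmer_vector(seq, k=3):
    """Length-normalized k-mer counts as a fixed-order list (hypothetical helper)."""
    seq = seq.upper().replace("U", "T")
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    vec = [0.0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # skip windows with letters outside ACGT
        if j is not None:
            vec[j] += 1000.0 / len(seq)
    return vec


def pearson(x, y):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(vx * vy)


# Toy sequences (made up): b contains the same motifs as a in a different
# order, so the two do not line up end to end but share k-mer content;
# c is built from unrelated motifs.
a = "ACGACGACGTTTTTTTT"
b = "TTTTTTTTACGACGACG"
c = "GGGGGGCCCCCCAAAAA"

print(pearson(kmer_vector(a), kmer_vector(b)))  # rearranged motifs
print(pearson(kmer_vector(a), kmer_vector(c)))  # unrelated motifs
```

Because the score depends only on which k-mers occur and how often, not on where they sit, two sequences with rearranged motifs can score as similar even when a linear alignment finds little in common.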
The team noted that although thousands of lncRNAs have been identified in the human genome, most have not been fully characterized functionally or mechanistically, which prompted the group to pursue a method for systematically assessing lncRNA functions.
"A major roadblock to progress remains the inability to detect recurrent relationships between lncRNA sequence and function," the authors wrote, explaining that "[a]n understanding of analogous relationships in proteins has enabled the classification of protein families, functional domains, and mechanisms that, in turn, have led to discoveries that have improved the diagnosis and treatment of disease."
Based on their results so far, the researchers concluded that SEEKR's k-mer-based classification "is a powerful approach to detect recurrent relationships between sequence and function in lncRNAs."