Chimeric RNAs should not be studied in isolation, but together, as an RNA network, according to authors of a recently published paper.
In the paper, the international team of researchers described how they used both tiling microarrays and RNA sequencing to study this little-understood class of RNAs, which contain sequences from different genes.
Based on their findings, the authors argued that these chimeric transcripts are "important," noting phenomena such as the non-random interconnections of genes involved; the greater phylogenetic depth of the genes involved in many chimeric interactions; the coordination of the expression of connected genes; and the close in vivo and three-dimensional proximity of the genomic regions being transcribed and contributing to parts of the chimeric RNAs.
Thomas Gingeras, a coauthor and head of functional genomics at Cold Spring Harbor Laboratory, told BioArray News that the group's findings are "controversial" not only because they highlight the potential biological significance of chimeric RNAs, but also because they provide evidence that they occur in normal cells.
Chimeric RNAs are believed to "have their origins in genetic structural variation, translocations causing fusions of different chromosomes to occur, and then giving rise to chimeric RNAs in that fashion," said Gingeras.
"But the idea that normal cells have … these chimeric RNAs is controversial," he said, as their occurrence is interpreted by some in the research community as being solely an artifact of RT-PCR. "We feel very confident and comfortable that these exist and they are not technical artifacts," said Gingeras. "It remains to be seen what the biological importance is."
The new paper, which appeared in PLoS One earlier this month, is the culmination of half a decade of work using multiple technologies with contributions from CSHL, Affymetrix, the Center for Genomic Regulation at the Universitat Pompeu Fabra in Barcelona, the University of Lausanne in Switzerland, the University of Geneva Medical School, the Wellcome Trust Sanger Institute, the Dana-Farber Cancer Institute, Harvard Medical School, the Spanish National Cancer Research Centre in Madrid, the Universitat de Barcelona, University of Massachusetts Medical School, the University of Washington in Seattle, and New York University.
The data and analyses presented in the paper evolved from an Encyclopedia of DNA Elements, or ENCODE, sub-project entitled "Identify all protein-coding genes: combine computational prediction with experimental RT-PCR confirmation of gene models," according to lead author Sarah Djebali, a bioinformatician at CRG in Barcelona.
Organized by the National Human Genome Research Institute, the ENCODE project aims to identify and characterize all the functional elements in the human genome.
The aim of the recent study was initially "to comprehensively characterize all transcripts of known genes, even less-abundant ones, as well as scale up the efforts already done on the pilot ENCODE regions to human chromosomes 21 and 22," Djebali told BioArray News.
"It is important to note that at the beginning the aim was more general than discovering chimeric transcripts: it was to investigate the extent of transcription of all protein-coding genes on those two chromosomes," she said.
Why chromosomes 21 and 22? "This study goes back to almost 2005," Gingeras noted. "At the time we were using tiling arrays to do the analysis, so doing at that time five-base-resolution analysis of [the] genome … at a reasonable cost required that only certain chromosomes were done at a time," he said. "The chromosomes [that were] best annotated with most sequence at the time were chromosomes 21 and 22."
Gingeras said that the project began while he was still vice president of biological science at Affymetrix. "We had a pilot project where we used a method that copied RNA to the end where the sequence should stop," he said. "When we did that experiment, we saw that for a large percentage of genes on chromosomes 21 and 22 the sequence did not terminate at the annotated 5-prime termini; they jumped into other genes."
In response to these preliminary findings, the authors set out to determine whether they could detect these chimeric transcripts among the genes analyzed on chromosomes 21 and 22, or whether their detection was due to technical artifacts. They also sought to determine whether they could collect evidence supporting the biological importance of any detected chimeric transcripts.
To accomplish this, they interrogated protein-coding genes from human chromosomes 21 and 22 using a combination of methods including rapid amplification of cDNA ends, or RACE, tiling arrays, and RNA-seq.
The authors first selected 1,193 exons from 492 annotated gene loci present on chromosomes 21 and 22 for which they could select highly specific 5′ and 3′ RACE primers. They designed 844 5′-RACE and 824 3′-RACE primers, and carried out the corresponding RACE reactions using polyadenylated RNA isolated from 11 normal human tissues and five transformed cell lines. In total, they performed 26,688 RACE reactions.
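The reaction total is consistent with the other counts quoted above, assuming every primer was run against every RNA sample. The article does not state this explicitly, so the short Python check below is only a sketch of the arithmetic under that assumption, and the variable names are illustrative.

```python
# Counts as reported in the article.
five_prime_primers = 844
three_prime_primers = 824
normal_tissues = 11
transformed_cell_lines = 5

# Assumption (not stated in the article): each primer was used once
# with each RNA sample.
total_primers = five_prime_primers + three_prime_primers
rna_samples = normal_tissues + transformed_cell_lines
race_reactions = total_primers * rna_samples

print(total_primers, rna_samples, race_reactions)
```

Under that assumption, 1,668 primers across 16 RNA samples gives the 26,688 reactions cited.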
Products of the RACE reactions were interrogated using chromosome 21 and 22 tiling arrays to look at non-repeat portions of these chromosomes at 17-nucleotide resolution. Altogether, the researchers conducted 1,020 array hybridizations, according to the paper. They then used software to determine continuous sites of transcription, referred to as RACEfrags.
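The article does not describe the software used to call RACEfrags. As a rough illustration of how continuous sites of transcription might be derived from tiling-array probes, the Python sketch below merges neighboring probes whose signal clears a threshold. The function name `call_fragments`, the threshold, the gap size, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def call_fragments(
    probes: List[Tuple[int, float]], threshold: float, max_gap: int
) -> List[Tuple[int, int]]:
    """Merge above-threshold probes into (first, last) probe-position intervals.

    probes: (genomic position, signal) pairs, in any order.
    threshold: minimum signal for a probe to count as transcribed (hypothetical).
    max_gap: largest distance between positive probes that still joins them.
    """
    fragments: List[Tuple[int, int]] = []
    start = end = None
    for position, signal in sorted(probes):
        if signal < threshold:
            continue  # probe is not considered transcribed
        if start is not None and position - end <= max_gap:
            end = position  # extend the current fragment
        else:
            if start is not None:
                fragments.append((start, end))
            start = end = position  # open a new fragment
    if start is not None:
        fragments.append((start, end))
    return fragments


# Toy data: probes spaced 17 nucleotides apart, one weak probe, one large gap.
toy_probes = [(100, 5.0), (117, 6.1), (134, 0.4), (151, 7.2), (400, 8.0), (417, 9.3)]
fragments = call_fragments(toy_probes, threshold=2.0, max_gap=40)
print(fragments)
```

In this toy run, the single weak probe at position 134 is bridged because its neighbors fall within the allowed gap, while the distant probes at 400 and 417 form a separate fragment.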
The group first characterized and validated array-detected chimeric transcripts by molecular cloning and sequencing. They then selected 200 RACEfrags for RT-PCR amplification and full-length cloning, and used sequencing to confirm a total of 112 chimeric connections.
Among these chimeric transcripts were 27 that included sequences from chromosomes other than the chromosome containing the index gene, leading the authors to hypothesize that a genome-wide analysis for chimeric RNAs "may reveal substantial extensive gene-to-gene connections occurring among all chromosomes and indicating similar functionally related genes involved in this fashion."
Gingeras said that the researchers used tiling arrays because they were the "most practical and highest resolution way to investigate large portions of genome" at the time they initiated the study. As the project developed, the authors "went to look for the same kind of RNAs using alternative technologies, using RNA-seq to establish that independent of the [array-based] technique you could see the same thing."
As the authors noted in the paper, chimeric RNAs have been described previously. Some argue that they are artifacts of biological analysis, originating in the template-switching capabilities of RT, mis-mapping of sequence or tiling array results, and cryptic genomic rearrangements present in the samples analyzed.
According to the authors, these issues are "real" and "serious." They said that in their study they took "considerable effort" to evaluate and estimate the level of false-positive occurrences present in the data. Still, they found that the presence of chimeric RNAs as molecular events in normal tissues and cell lines is "strongly supported," and that while their biological importance is uncertain, a "number of characteristics of the observed RNAs argue for them to be functional."
The authors are now conducting follow-up studies to provide an answer to this question, which CRG's Djebali characterized as "difficult."
"I think we need to differentiate chimeric transcripts found in cancer and in normal cells," Djebali said. "Of course we know that cancer cells are heavily rearranged and that fusion transcripts in those cases could be used as biomarkers for cancer," she said. "However, they also do exist in normal cells, and we found very low evidence of coding potential in them," she added. "So I would say that most of them are non-coding and may regulate other genes, but of course we are still at the beginning of investigating this matter."
Going forward, the CRG will try to understand the mechanisms that could have led to the formation of such transcripts. "Given a chimeric transcript we would like to be able to predict by which biological mechanism it was generated, the most commonly known until now being genome rearrangement and trans-splicing, but we do not want to restrict [our study] to those," said Djebali. She added that the CRG is developing a chimeric junction detection tool from RNA-seq data that it will assess and test. "That would give us cases to start this mechanism investigation," she said.
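The article gives no details of the CRG's junction detection tool. Purely as an illustration of the general idea, the Python sketch below counts RNA-seq reads whose aligned segments fall in two different genes and keeps gene pairs supported by a minimum number of reads. The function `candidate_junctions`, the input layout, the gene names, and the `min_reads` cutoff are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def candidate_junctions(
    split_reads: Dict[str, List[str]], min_reads: int = 2
) -> Dict[Tuple[str, str], int]:
    """Count reads whose consecutive aligned segments hit different genes.

    split_reads maps a read ID to the genes its segments align to, in read order.
    Only gene pairs seen in at least min_reads reads are returned.
    """
    support: Counter = Counter()
    for genes in split_reads.values():
        for left, right in zip(genes, genes[1:]):
            if left != right:
                support[(left, right)] += 1
    return {pair: n for pair, n in support.items() if n >= min_reads}


# Toy input: two reads join GENE_A to GENE_B, one read stays within GENE_A,
# and a single read joins GENE_C to GENE_D.
toy_reads = {
    "read1": ["GENE_A", "GENE_B"],
    "read2": ["GENE_A", "GENE_B"],
    "read3": ["GENE_A", "GENE_A"],
    "read4": ["GENE_C", "GENE_D"],
}
junctions = candidate_junctions(toy_reads, min_reads=2)
print(junctions)
```

A read-count filter of this kind would not by itself rule out RT template switching, which is why the researchers stress independent confirmation.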
Gingeras cautioned that the number of chimeric RNAs in normal cells is "probably pretty low."
"It is not a dominant signal," said Gingeras. If researchers are interested in the occurrence of these RNAs, they should pay attention to the "artifactual nature of what reverse transcriptase can do," he said.
A Future for Tiling Arrays?
Though the researchers used tiling microarrays in the initial phase of the study, neither Djebali nor Gingeras is still using that platform. Instead, both of their labs have since switched to RNA-seq, meaning that they will most likely not use arrays in future studies on chimeric RNAs.
"Our lab is … using RNA-seq rather than tiling arrays now, since, in principle, it enables [us] to differentiate between the different alternative transcripts of a gene, and could in theory provide better quantification of individual transcripts," said Djebali.
Still, she said that both arrays and RNA-seq have their drawbacks, including a "problematic RT-PCR step" that is a "potential source of RT template switching and thus of technically artifactual chimeric transcripts." This means that confirmation by other RT-independent techniques such as RNase protection assays "will always be necessary to really be sure of the existence of a given new chimeric transcript," Djebali said.
The CRG is also looking to compare the performance of arrays and RNA-seq. Djebali said that the CRG, the University of Lausanne, CSHL, and the Sanger Institute are developing a technique called RACEseq, where RNA-seq will be performed on the products of the RACE reactions.
"A fair comparison would not be between RACEarray and RNA-seq but between RACEarray and RACEseq," said Djebali. "This has been tried here in collaboration with the same people, however the RACEseq technique is still too young to make us able to compare the sensitivities of the two techniques," she said.
Though RNA-seq will be the technique of choice for the researchers going forward, this does not mean that tiling arrays are an obsolete technology, Gingeras said. "If you have lots of samples and you are just interested in collecting protein transcripts, using tiling arrays is an inexpensive way to do that," he said.
Jasmine Gruia-Gray, vice president of global marketing at Affymetrix, told BioArray News that the firm's chromosomes 21 and 22 tiling array, a design used at Affymetrix Laboratories, and its ENCODE 2.0 tiling array, a design made as part of the firm's participation in the ENCODE consortium, are still commercially available.
She said that the company's customers have used its tiling arrays for a variety of applications, including gene expression profiling, chromatin immunoprecipitation-on-chip, comparative genomic hybridization, methylation, SNP screening, and origin of replication studies.