Researchers at the Georgia Institute of Technology have developed an RNA-seq analysis pipeline that they claim is faster and more user-friendly than existing software and should be particularly applicable to cancer research.
Among other applications, the pipeline can help biologists and clinicians compare the RNA expression profiles of normal and tumor cells, which could ultimately lead to more personalized therapies based on a better understanding of the variations in an individual's tumor, the researchers said.
The tool, dubbed the RNA-seq Analysis Pipeline, or R-SAP, includes parallel processing capabilities to speed up computation, a feature that many other RNA-seq tools lack.
Furthermore, according to the developers, R-SAP is easy for non-experts to use since it does not require complicated installation procedures or parameter changes and, unlike some similar packages, it runs on multiple operating systems, including Linux, Windows, and Mac.
Georgia Tech's Vinay Mittal and John McDonald, R-SAP's developers, explained the pipeline in a paper published in a recent issue of Nucleic Acids Research.
In the paper, the researchers compared R-SAP’s runtime with that of Trans-ABySS, using RNA-seq data from the MicroArray Quality Control Consortium, and found that R-SAP processed the data more than twice as fast: about 319 minutes for R-SAP compared to 729 minutes for Trans-ABySS.
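As a quick check of the runtime figures quoted above, the arithmetic can be sketched in a few lines of Python. Only the two runtimes come from the article; the variable names are illustrative.

```python
# Runtimes reported in the article, in minutes.
rsap_minutes = 319
trans_abyss_minutes = 729

# Speedup is the ratio of the slower runtime to the faster one.
speedup = trans_abyss_minutes / rsap_minutes
minutes_saved = trans_abyss_minutes - rsap_minutes

print(f"Speedup: {speedup:.2f}x")  # Speedup: 2.29x
print(f"Time saved: {minutes_saved} minutes ({minutes_saved / 60:.1f} hours)")
# Time saved: 410 minutes (6.8 hours)
```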
A separate comparison with Cufflinks, using data from the Encyclopedia of DNA Elements (ENCODE), found that R-SAP was slower, a difference attributed to the fact that R-SAP is implemented in Perl while Cufflinks is written in C, which tends to run faster than Perl. R-SAP's performance improved, however, as the number of processors increased, the paper states.
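The article does not describe how R-SAP divides its work across processors, so the following is only a generic illustration of the split, process, and merge pattern that lets this kind of analysis scale with processor count. It is written in Python rather than R-SAP's Perl, and every function name and the record layout are assumptions made for the example. Threads are used to keep the sketch portable; in standard CPython, CPU-bound work would generally need a process pool (for example `concurrent.futures.ProcessPoolExecutor`) to benefit from multiple cores.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal pieces."""
    size, extra = divmod(len(records), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty pieces when there are few records
            chunks.append(records[start:end])
        start = end
    return chunks


def count_genes(chunk):
    """Per-chunk work: tally how many records name each gene."""
    return Counter(gene for _read_id, gene in chunk)


def count_in_parallel(records, workers=4):
    """Process chunks concurrently, then merge the partial tallies."""
    totals = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_genes, split_into_chunks(records, workers)):
            totals += partial
    return totals


# Hypothetical (read ID, gene) records standing in for alignment results.
records = [("r1", "GENE_A"), ("r2", "GENE_B"), ("r3", "GENE_A"),
           ("r4", "GENE_A"), ("r5", "GENE_B")]
totals = count_in_parallel(records, workers=2)
print(dict(totals))  # {'GENE_A': 3, 'GENE_B': 2}
```

Because each chunk is tallied independently and the partial counts are simply added together, the result does not depend on how many workers are used.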
The authors noted that R-SAP demonstrated a "significant improvement" over both Trans-ABySS and Cufflinks in terms of categorizing transcripts.
The developers expect the tool to help biologists and clinicians process large quantities of RNA sequencing data and pull out information useful for clinical and research purposes. While they believe it should be particularly useful for cancer, it could also be used to find variants in other diseases.
The paper explains that R-SAP uses a hierarchical decision-making approach to group transcripts into classes and then generates files that contain gene-expression levels as well as information about splice variants, biomarkers, and chimeric RNAs, all of which can be viewed in online genome browsers.
It works by aligning input RNA sequences to reference genomes and looking for points of mismatch that could indicate new isoforms as well as fragmented alignments that indicate chimeric transcripts such as fusion genes. It also quantifies gene expression, the paper states.
As R-SAP finds differences, such as splice variants, it lumps them into one “bin,” and it does the same for chimeric or translocation events, in which reads map to multiple locations in the reference, McDonald, associate dean of biology at Georgia Tech, explained to BioInform.
Additionally, R-SAP captures information about the reads that map to expected locations in the genome, which are then used for gene-expression quantification, he said.
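The binning described above can be pictured as a series of ordered checks applied to each aligned read. The Python sketch below is not R-SAP's code or its actual decision rules, which the article does not spell out. The record types, the bin names, the `max_gap` threshold, and the rule that a read touching two genes counts as chimeric are all assumptions chosen to make the idea concrete.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One aligned piece of a read (hypothetical, simplified record)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


@dataclass(frozen=True)
class Exon:
    """One annotated exon of a reference gene (hypothetical record)."""
    gene: str
    chrom: str
    start: int
    end: int


def overlapping_exons(frag, exons):
    """Return the annotated exons that overlap a fragment."""
    return [e for e in exons
            if e.chrom == frag.chrom and frag.start < e.end and e.start < frag.end]


def classify_read(fragments, exons, max_gap=100_000):
    """Assign a read to one bin using ordered checks (assumed rules)."""
    # 1. Pieces on different chromosomes, or very far apart: chimeric.
    if len({f.chrom for f in fragments}) > 1:
        return "chimeric", None
    ordered = sorted(fragments, key=lambda f: f.start)
    if any(b.start - a.end > max_gap for a, b in zip(ordered, ordered[1:])):
        return "chimeric", None
    # 2. No annotated exon touched: unannotated. Two genes touched: chimeric.
    hits = [overlapping_exons(f, exons) for f in fragments]
    genes = {e.gene for hit in hits for e in hit}
    if not genes:
        return "unannotated", None
    if len(genes) > 1:
        return "chimeric", None
    gene = next(iter(genes))
    # 3. Every piece lies inside an annotated exon: the expected location.
    if all(any(e.start <= f.start and f.end <= e.end for e in hit)
           for f, hit in zip(fragments, hits)):
        return "expected", gene
    # 4. Otherwise the read disagrees with the annotation: splice variant.
    return "splice_variant", gene


def bin_reads(reads, exons):
    """Count reads per bin; only 'expected' reads count toward expression."""
    bins, expression = Counter(), Counter()
    for fragments in reads.values():
        label, gene = classify_read(fragments, exons)
        bins[label] += 1
        if label == "expected":
            expression[gene] += 1
    return bins, expression


# Toy annotation and reads, invented for illustration only.
exons = [Exon("GENE_A", "chr1", 100, 200), Exon("GENE_A", "chr1", 300, 400),
         Exon("GENE_B", "chr2", 1000, 1100)]
reads = {
    "r1": [Fragment("chr1", 120, 170)],
    "r2": [Fragment("chr1", 150, 200), Fragment("chr1", 300, 350)],
    "r3": [Fragment("chr1", 150, 230)],
    "r4": [Fragment("chr1", 150, 200), Fragment("chr2", 1000, 1050)],
    "r5": [Fragment("chr3", 10, 60)],
}
bins, expression = bin_reads(reads, exons)
print(dict(bins))
# {'expected': 2, 'splice_variant': 1, 'chimeric': 1, 'unannotated': 1}
print(dict(expression))  # {'GENE_A': 2}
```

In this toy run, the two reads that match the annotation feed the expression count for the hypothetical GENE_A, while the read that runs past an exon boundary and the read split across two chromosomes land in separate bins for follow-up.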
Biologists and clinicians can use R-SAP to compare the transcriptomes of normal cells against those of individual cancers, the developers explained.
For example, a clinician can use the pipeline to compare levels of expression in tumor versus normal cells as well as locate different splice variants that show up in cancer cells and not in normal ones, McDonald said.
Currently McDonald, a cancer biologist, is using the software to study the significance of splice variants in projects at Georgia Tech.
Meanwhile, co-author Mittal, a doctoral candidate in bioinformatics, is using the pipeline to characterize RNA-seq data from the Cancer Genome Atlas.
Mittal told BioInform that the researchers are now developing a plugin that will allow users to explore the effects of splice variants on RNA or other regulatory elements in a cell as well as their impact on alternative splicing.
The developers plan to release the plugin sometime in the next two months, he said.