Within the next week, Partek plans to add new capability to its Genomics Suite to analyze RNA sequence data generated from second-generation sequencers.
The firm will join a growing number of bioinformatics software companies, including CLC Bio, Geospiza, SoftGenetics, and DNAStar, that have already added RNA-seq analysis tools to their offerings to address growing demand for this capability.
Indeed, for labs looking to complement — and, in some cases, replace — gene expression microarrays with RNA-seq, data analysis remains a major hurdle.
Oleg Evgrafov, an assistant professor in the Department of Psychiatry at the University of Southern California's Keck School of Medicine, is bullish on the promise of RNA-seq on the Illumina Genome Analyzer as compared to microarrays, but said that the software has not been able to keep up with advances in the instrumentation. That is a particular problem for small research groups that can't spare resources for software development or informatics support.
A number of open source analysis tools are available for RNA-seq, such as Caltech's Enhanced Read Analysis of Gene Expression, or ERANGE, but using these tools is tricky for researchers who are not comfortable with a Unix machine, Evgrafov told BioInform, especially if they want to move their output into an analysis workflow. "It's relatively easy if you can write Python scripts … but you need capacity," he said. "You need at least one or two people doing just that, and I'm a DNA person. I just want to press the button and get it done."
The problem, he said, is that there is still no "reasonably good tool" for basic RNA-seq analysis. "If I want to just get the numbers — just how is this gene expressed — I press three buttons [for different software packages] and I get three different answers," he said.
Evgrafov is trying out Partek's software in combination with ERANGE and Illumina's Genome Studio, but noted that each tool still has its limits. "We use all three, and I'm still not happy," he said. "All three together can't give me what I want."
He acknowledged, however, that this problem is inherent in a field that is evolving rapidly. "The technology is developing so fast that the [software developers] can't keep up," he said. As an example, he said that his lab is currently looking to use 72-base-pair Illumina RNA-seq reads to identify novel splice sites in the genome, a capability for which there is currently no adequate analysis algorithm. However, he noted, "people didn't even think about that [application] three months ago because we didn't have long enough sequences."
Despite the "big holes" in currently available tools, Evgrafov said he is pleased with Partek's capabilities for RNA-seq analysis. A key advantage of the software, he said, is its downstream statistical analysis applications, which "are very good."
In addition, he said that Partek's RNA-seq workflow improves upon Genome Studio's analysis capabilities in certain areas, such as novel gene identification. Currently, Genome Studio does not offer a way to analyze reads that map to the genome, but not to known genes. "You can see them on the browser, but there is no tool to extract them as possible genes," he said. "Partek has implemented something that is a first step – they look for clusters of reads that overlap each other and kind of cluster, and they show them to you in a different file, so you can find them and decide what you want to do with them, and I think that's very important."
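The approach Evgrafov describes, collecting reads that map to the genome but not to known genes and grouping those that overlap, can be illustrated with a minimal Python sketch. This is not Partek's implementation. The function name, the `(chrom, start, end)` tuple format, the half-open coordinates, and the `min_reads` threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end


def cluster_intergenic_reads(reads, genes, min_reads=2):
    """Group overlapping reads that do not overlap any known gene.

    reads, genes: iterables of (chrom, start, end) tuples (hypothetical format).
    Returns a list of (chrom, start, end, read_count) candidate regions.
    """
    genes_by_chrom = defaultdict(list)
    for chrom, start, end in genes:
        genes_by_chrom[chrom].append((start, end))

    # Keep only reads that miss every annotated gene on their chromosome.
    unannotated = defaultdict(list)
    for chrom, start, end in reads:
        if not any(overlaps(start, end, gs, ge) for gs, ge in genes_by_chrom[chrom]):
            unannotated[chrom].append((start, end))

    clusters = []
    for chrom in sorted(unannotated):
        current = None  # [start, end, read_count] of the cluster being built
        for start, end in sorted(unannotated[chrom]):
            if current is not None and start < current[1]:
                # Read overlaps the running cluster: extend it.
                current[1] = max(current[1], end)
                current[2] += 1
            else:
                # Gap found: emit the finished cluster if it has enough reads.
                if current is not None and current[2] >= min_reads:
                    clusters.append((chrom, *current))
                current = [start, end, 1]
        if current is not None and current[2] >= min_reads:
            clusters.append((chrom, *current))
    return clusters


if __name__ == "__main__":
    reads = [
        ("chr1", 100, 172), ("chr1", 150, 222), ("chr1", 200, 272),  # overlapping, unannotated
        ("chr1", 1000, 1072),                                         # isolated read
        ("chr1", 5000, 5072), ("chr1", 5050, 5122),                   # inside a known gene
    ]
    genes = [("chr1", 4900, 6000)]
    for region in cluster_intergenic_reads(reads, genes):
        print(region)
```

A production tool would use an indexed structure such as an interval tree rather than the linear gene scan shown here, and would likely take strand and read quality into account.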
Haiqing Li, a bioinformatics specialist at City of Hope's Informatics Core Lab who is also using Partek's RNA-seq analysis function, also cited the Genomics Suite's downstream analysis capabilities as a plus. "Once you process all the RNA-seq data, you can do downstream analysis very easily. You basically just click a button and generate a report without having to go to another part of the software," he said. "The end user just wants one application that can do everything, and Partek so far can provide this function."
Li also cited Partek's standardized analysis workflow as an advantage for end users. "You may not get the specific answer you need, but you get all the information you need. Partek gives you a big spreadsheet, which the end users are already familiar with. They can retrieve information from the spreadsheet very easily."
Li said that Partek's ability to analyze microarray data alongside RNA-seq data is another benefit, especially since "many users" at City of Hope are already using the software for microarray analysis and biostatistics.
Nevertheless, like Evgrafov, Li said that it's unlikely that any one software package will ever meet the needs of all end users, particularly for analyzing next-gen sequencing data. "Each [investigator] has [his/her] own analysis requirements," he said, noting that many experiments are some combination of RNA-seq, ChIP-seq, methylation analysis, and sequence analysis. "It will be hard to find one tool to meet all those requirements."
Transcripts, Not Genes
Mike Lelivelt, vice president of genomics at Partek, told BioInform that the company's approach to RNA-seq is that "it's not about genes, it's about transcripts."
The software summarizes data at the transcript level, instead of the gene level or exon level as most other tools do, he said, which provides researchers with much more detailed information about their expression studies. In particular, the firm has taken advantage of recent "enhancements in modeling transcripts from genes," to help researchers make predictions about alternate splice forms based on transcript data.
Partek's RNA-seq workflow includes a collection of publicly available tools, such as the Bowtie short-read aligner and TopHat splice site discovery tool developed by Steve Salzberg's group at the University of Maryland; a modified version of an expectation/maximization algorithm developed at Christopher Lee's lab at the University of California for reconstructing full-length transcript isoforms from sequence fragments; and the AceView alignment programs that the National Center for Biotechnology Information uses to construct the AceView database of genes and alternative splice variants.
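To give a sense of how an expectation/maximization approach can apportion reads among transcript isoforms, here is a generic textbook-style sketch in Python. It is not the Lee lab algorithm or Partek's modified version, neither of which is described in detail here. The function name and input format are hypothetical, and the sketch ignores transcript length and sequencing error for simplicity.

```python
def estimate_isoform_fractions(read_compat, n_isoforms, n_iter=100):
    """Estimate the fraction of reads arising from each isoform by EM.

    read_compat: list with one entry per read, each a collection of isoform
                 indices the read is compatible with (hypothetical format).
    Returns a list of n_isoforms fractions that sum to 1.
    """
    # Start from a uniform guess.
    theta = [1.0 / n_isoforms] * n_isoforms
    for _ in range(n_iter):
        # E-step: split each read among its compatible isoforms
        # in proportion to the current abundance estimates.
        counts = [0.0] * n_isoforms
        for compat in read_compat:
            total = sum(theta[i] for i in compat)
            if total == 0:
                continue  # read cannot be explained under the current estimate
            for i in compat:
                counts[i] += theta[i] / total
        # M-step: re-estimate abundances from the fractional read counts.
        assigned = sum(counts)
        if assigned == 0:
            break
        theta = [c / assigned for c in counts]
    return theta


if __name__ == "__main__":
    # Three reads unique to isoform 0, one unique to isoform 1,
    # and four reads compatible with both.
    reads = [(0,)] * 3 + [(1,)] + [(0, 1)] * 4
    print(estimate_isoform_fractions(reads, n_isoforms=2))
```

In this toy case the ambiguous reads end up divided according to the evidence from the uniquely assigned reads, which is the basic idea behind summarizing expression at the transcript level rather than the gene level.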
"We see innovation coming from the academic community," Lelivelt said, "and we integrate these tools and deliver them to the biologist in an easy-to-use fashion with support and full integration with their microarray data."
Indeed, Partek views integration with microarray expression data — as well as ChIP-chip/ChIP-seq, copy number, and microRNA data — as a key selling point for its software.
The company has built a strong business around microarray analysis and intends to expand upon that as it moves into the next-gen sequencing market. Lelivelt said that Partek has no intention to address short-read de novo assembly or resequencing analysis, as some of its competitors in the bioinformatics market do. Those types of applications, he said, are primarily of interest to large sequencing centers, which do not represent a very large addressable market.
Bench scientists, on the other hand, represent a large market, "have years and years of array work" behind them, and are most interested in functional genomics applications like transcription and regulation, "which is where we're sitting," Lelivelt said.
The company does not foresee RNA-seq displacing arrays completely, however. A key reason, Lelivelt said, is that the price point for arrays is still considerably lower than that for sequencing, and the "ease of informatics" is much higher.
While that is certain to change in the years ahead, "we do not believe that arrays will be dead," he said, noting that people said the same thing about TaqMan when arrays arrived on the scene, "and TaqMan hasn't died."
If anything, he said, he envisions researchers using multiple platforms, such as PCR, arrays, and sequencing, and integrating their results into a single analysis — not surprisingly, with a software package such as Partek Genomics Suite.