By Julia Karow
This article, originally published July 18, has been updated with additional information about another RNA-seq standards initiative.
Researchers from the Encyclopedia of DNA Elements, or ENCODE, project have put together a first set of standards and guidelines for RNA-seq experiments to provide some orientation in a rapidly changing field of technology.
Version one of "Standards, Guidelines and Best Practices for RNA-seq," finalized by the ENCODE Consortium in June, is available from the ENCODE Data Coordination Center at the University of California, Santa Cruz. Similar guidelines for ChIP-seq and RIP-seq are currently in development by other members of the project.
According to Tom Gingeras, who wrote the first draft of the guidelines, the document goes hand in hand with the data output of ENCODE, which aims to characterize all functional elements in the human genome, and modENCODE, a similar effort focused on model organisms. "Since the ENCODE and the modENCODE project are collecting large volumes of data, we thought we would try to provide as comprehensive a set of explanations [as possible] of how the data was generated and the standards the data was compared against," he said.
Gingeras, a professor of functional genomics at Cold Spring Harbor Laboratory, together with his colleague Barbara Wold at the California Institute of Technology, spearheaded the guidelines, which had input from both ENCODE and modENCODE consortium members. No instrument vendors were consulted for the document, and the goal is to update it at least annually.
The guidelines are not comprehensive, Gingeras cautioned, and are likely to change in the future as sequencing technologies evolve, which is why he and his colleagues are not calling for journals or grant agencies to enforce them. "This is the first draft of a very evolving document, one that is expected to change dramatically in these early days from version to version," he said. "It's an effort to put a stake in the ground and say, 'OK, this is where we are starting.'"
The seven-page document covers various steps of an RNA-seq experiment, including the information to be supplied about each sample; experimental design, such as the number of replicates and sequencing depth; the information to be reported about sample preparation, read mapping, read statistics, and quality scores; and how to report novel transcribed elements.
Some of the recommendations are widely applicable, such as how to describe the sample and processing, while others don't apply to every type of experiment. "This idea of a single protocol for RNA-seq is not correct," Gingeras said. "There are actually different protocols, depending on the kind of RNAs that you want to have captured in your libraries."
Also, the goals of an RNA-seq experiment can differ. A study may aim to define and quantify all RNA species in a sample, for example, or simply to detect changes in abundant RNA classes across many samples. The current guidelines "do not exhaustively cover the entire matrix of this experimental space, but instead emphasize the best practices designed to support 'reference quality' transcriptome measurements for major RNA sample types," according to the document. As users become more familiar with the guidelines, Gingeras said, he and his colleagues may add further experimental protocols.
The same goes for sequencing platforms. The current version of the guidelines focuses on Illumina's platform because it is the dominant technology at the moment, but "there are clearly other technologies which are rapidly being utilized with specific applications — PacBio is emerging for long reads, the SOLiD system is still being prominently used by a variety of labs, and then there are … the Ion Torrent and the MiSeq … and each of those will add different aspects and requirements," he said.
Gingeras said the next set of guidelines to come out of ENCODE will be for ChIP-seq, followed by those for RIP-seq.
Gingeras said a parallel effort to establish standards for RNA-seq is currently under way at the Functional Genomics Data Society, or FGED, formerly known as MGED. Three years ago, MGED published a draft proposal for sequencing standards called Minimum Information about a high-throughput SeQuencing Experiment, or MINSEQE (IS 4/15/2008).
Have topics you'd like to see covered in In Sequence? Contact the editor at jkarow [at] genomeweb [.] com.