An effort spearheaded by the US Food and Drug Administration to assess the technical reproducibility of next-generation sequencing platforms should begin generating data before the end of the year, with publication of results targeted for next spring, a project organizer told In Sequence.
In addition, the FDA's Sequencing Quality Control, or SEQC, project has expanded over the last year to encompass five additional "satellite" studies exploring a range of applications of next-gen sequencing technology, said Christopher Mason, assistant professor in the department of physiology and biophysics at Weill Cornell Medical College, an Illumina test site for the project.
SEQC is a continuation of the Microarray Quality Control, or MAQC, project, an initiative that the FDA began in 2005 to evaluate the reproducibility of microarray platforms. In late 2008, the agency decided to shift the focus of the project to sequencing technologies and issued a solicitation for participation (IS 1/6/2009). Earlier this year, the Association of Biomolecular Resource Facilities began collaborating with the FDA on the project and said it hoped to publish data by the end of 2011 (IS 2/22/2011).
Mason said this week that the eleven sites participating in SEQC's primary project (an evaluation of the technical reproducibility of RNA-seq on the Roche/454, Illumina, and Life Technologies SOLiD platforms) are on track to begin generating data within the next few weeks. The project is assessing the same RNA reference samples used in the initial MAQC project. (See sidebar for a complete listing of all project participants.)
While he acknowledged that the organizers had initially intended to have the data in hand by now, he said that the project required a "long planning phase" as well as time for preparing and mixing the samples and performing quality control.
As of last week, "all the reagents have been shipped from the vendors to the sequencing sites … and everyone's ready to go," he said. "Arguably, by the end of December, all the sites should have generated their data."
A separate group of around 20 analysis sites will then crunch the data in early 2012, "and hopefully by the middle of spring we could have some of the main manuscripts come out," Mason said.
The project is divided into three technology groups — one for each of the primary sequencing platforms — with four separate labs within each group. This arrangement will allow the researchers to evaluate how well each platform performs across different sites as well as how well each of the platforms compares to the others.
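As a rough illustration of the two comparisons this design allows, the sketch below splits pairwise correlations between runs into within-platform and between-platform groups. It is a minimal example under stated assumptions, not the project's analysis method: the `(platform, site)` keys, the function names, and the toy expression values are all hypothetical, and Pearson correlation is just one possible measure of agreement.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def reproducibility(runs):
    """Split pairwise correlations into within- and between-platform lists.

    `runs` maps a hypothetical (platform, site) key to a list of expression
    values, one per gene, in the same gene order for every run.
    """
    within, between = [], []
    for (key_a, vals_a), (key_b, vals_b) in combinations(runs.items(), 2):
        r = pearson(vals_a, vals_b)
        # Same platform name means a within-platform (cross-site) comparison.
        (within if key_a[0] == key_b[0] else between).append(r)
    return within, between


# Made-up toy values for illustration only (not SEQC data).
toy_runs = {
    ("Illumina", "site1"): [1.0, 2.0, 3.0, 4.0],
    ("Illumina", "site2"): [1.1, 2.1, 2.9, 4.2],
    ("SOLiD", "site1"): [0.8, 2.5, 2.7, 4.5],
}
within, between = reproducibility(toy_runs)
```

Comparing the spread of the `within` list against the `between` list gives a first look at whether site-to-site variation on one platform is smaller than variation across platforms.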
"There's one last call for each technology group to go over the standard operating procedures so that every step of every protocol is followed as closely as possible at every site," Mason said. "We hope to be generating data within the next couple of weeks."
In addition to the eleven "formal" SEQC sites, a number of labs have volunteered to run the same samples in order to contribute additional replicates for the experiment. Mason said that there are around seven additional Illumina sites, four additional SOLiD sites, and three additional 454 sites participating in the project.
And while the primary FDA-led study is focusing on the three most common commercial NGS platforms, ABRF is leading a parallel effort to evaluate the reproducibility of emerging platforms, such as the PacBio RS and Life Tech's Ion Torrent PGM, Mason said.
Over the last year, SEQC has expanded beyond its initial aim of evaluating the technical reproducibility of NGS platforms, and now includes five additional FDA-led studies looking at different aspects of sequencing technology, Mason said.
Two of these studies are looking at transcriptome annotation — one using human data and one using rat data. In these projects, "you want to sequence from all areas of the body and really develop a robust annotation of what are all the genes in the human genome," Mason said.
Another study is evaluating the ability of sequencing to predict the survival of cancer patients. The collaborators on this project, who include BGI and the University Children's Hospital of Cologne, Germany, are using samples from 500 neuroblastoma patients to develop and evaluate molecular signatures of patient outcomes.
The remaining two projects under the SEQC umbrella are a toxicogenomics study that is assessing the carcinogenicity of different chemicals in rats and a pharmacogenomics study that is using exome sequencing in an Amish population to predict treatment outcomes.
"In all these cases, the goal is to use, in a very robust way, of course, the next-generation sequencing methods for quantification of molecules in solution for better annotation, and then for better clinical prediction and understanding of toxico- or pharmacogenomics," Mason said.
The sequencing for these satellite projects has already begun, Mason said, though he could not comment on when the individual efforts plan to publish any data.
Mason noted that unlike the primary SEQC study, the goal of the satellite projects is not to compare platforms, but rather to "just do the sequencing and use the data." Most of the satellite projects are using the Illumina platform, he said.
Mission: Improved SOPs
The overall aim of the primary SEQC study "is to look within a platform and between platforms to see if we have good methods for molecular quantification," Mason said.
While this assessment will certainly be of value for research applications, it may be even more important now that sequencing vendors are increasingly eyeing the clinical market for their instruments. "It's imperative for any clinical application to have the molecular diagnostic and sequencing methodology be extraordinarily tight and reproducible across sites and within sites," he noted.
One important goal of the project is to establish "really good standard operating procedures," Mason said.
Although acknowledging that SOPs "sound boring and dry and some technicians consider them annoying," he noted that they're crucial for the field's development because "there are so many steps in the chemistry that can change the results of the sequencing."
In particular, he noted that "there's chemical noise, which is the preparation you do, the protocol you're using," and there is also "bioinformatics noise" that arises from the data-analysis tools.
"I would actually argue now that there might even be more bioinformatics noise than there is chemistry noise," Mason said. "If you use a different aligner, or you change the parameters for your alignment, or you use a different SNP caller, you'll get different SNPs with the exact same sample. You'll get different results."
So while the primary aim of the study is to "nail down the chemical noise and tease out all the parameters that contribute to it," another goal will be to "tease out more of the bioinformatics noise," he said.
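One simple way to put a number on the "bioinformatics noise" Mason describes is to measure how much two variant call sets from the same sample overlap. The sketch below uses the Jaccard index for that purpose. It is an illustration under stated assumptions rather than anything the SEQC project has described: the function name, the `(chromosome, position, alternate allele)` tuple format, and the example calls are all hypothetical.

```python
def jaccard(calls_a, calls_b):
    """Jaccard index of two call sets: shared calls divided by all calls."""
    a, b = set(calls_a), set(calls_b)
    union = a | b
    # Two empty call sets are treated as fully concordant.
    return len(a & b) / len(union) if union else 1.0


# Hypothetical calls from two SNP callers run on the same sample.
caller_1 = {("chr1", 1000, "A"), ("chr1", 2000, "T"), ("chr2", 500, "G")}
caller_2 = {("chr1", 1000, "A"), ("chr2", 500, "G"), ("chr3", 42, "C")}

concordance = jaccard(caller_1, caller_2)  # 2 shared calls out of 4 distinct
```

A value of 1.0 would mean the two callers agree on every SNP; lower values indicate that the choice of software alone is changing the result.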