The Ontario Genomics Institute is funding a pair of studies by independent research teams intended to optimize the performance of common sequencing applications. One project is aimed at comparing commercially available kits used to capture sequences for exome sequencing and the other is focused on improving the protocol for preparing libraries during chromatin immunoprecipitation-sequencing studies.
Researchers from The Centre for Applied Genomics, or TCAG, at Toronto's Hospital for Sick Children and from the Ottawa Hospital Research Institute's StemCore Laboratories, a Genome Canada Science and Technology Innovation Centre and OGI platform affiliate in Ottawa, each received technology seeding grants valued at C$10,000 ($9,800) to support their respective projects.
OGI announced the grants in late December.
The TCAG team plans to compare sequence capture kits offered by three firms, while investigators at StemCore Laboratories are working on ways to improve the library preparation protocol used during ChIP-seq experiments.
The exome capture comparison, already underway, covers the Illumina TruSeq Exome Enrichment Kit, the TargetSeq in-solution target enrichment kit from Life Tech/Applied Biosystems, and version 4 of Agilent's SureSelect target enrichment kit.
By pitting the Illumina, Life Tech, and Agilent kits against one another, researchers at TCAG hope to gain a better understanding of how each kit performs. That information may help not only in improving exome sequencing protocols but also in advising customers about which platform best suits their research needs.
The center offers a range of next-generation sequencing services to customers from around the world, including genome and exome sequencing as well as genotyping, cytogenetics, and more, explained TCAG researcher Sergio Pereira, who is heading the exome capture kit comparison project at the center's next-generation sequencing lab.
"As a service provider, we often evaluate different options that are out there on the market, or if any of the vendors are willing to work with us and improve on a given protocol, we are also open to that," Pereira told In Sequence.
"The idea of this comparison is that we have to establish some guidelines — which is the best method depending on what people want to do," he added. "Since the contents of these kits are different, if some people are interested in particular genes, we can go and see if that gene is part of the design or not. That will decide which kit they should be using depending on which targets they have in mind."
The team will not test capture kits from Roche's NimbleGen. Pereira said NimbleGen did not express interest in participating in the study and noted that TCAG does not have all of the equipment needed in house to use NimbleGen's exome capture kit at this time.
The researchers had initially planned to limit their comparison to the Illumina and Life Tech capture kits, but decided to include Agilent's SureSelect v4 system as well, using a demo kit provided by the company.
During the comparison, researchers will capture sequences from at least two and perhaps as many as four samples using each of the three kits before sequencing the resulting libraries.
They plan to use the Illumina HiSeq to sequence samples prepared using the Illumina TruSeq kit. The SOLiD 5500 platform will be used to sequence samples prepared using the Life Tech and Agilent kits.
Pilot studies that the group started last year suggest that the Illumina GAII and SOLiD 4 performed comparably for sequencing around 150,000 bases of DNA targeted using a custom exome capture kit, Pereira noted.
"Both the GAII and the SOLiD 4 performed equally well in terms of giving raw data and in terms of the amount of data that we got," he said.
So far the two sequencing platforms also appear to perform equally well in terms of analysis, Pereira said. Still, he noted that it is important to be cautious when analyzing SOLiD data since most existing analysis pipelines are optimized for Illumina sequence.
"You just have to make sure you're using the right parameters for the SOLiD color space data to get similar results as the Illumina set," he explained.
In some cases, researchers will be able to compare exomes directly to matched whole-genome sequences for the same samples that were previously generated by Sanger sequencing or on the Complete Genomics platform, allowing for even more comprehensive comparisons of exome capture kits.
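As a rough illustration of such a matched-genome check, the sketch below computes a per-base discordance rate between exome base calls and calls from a matched genome. The function name `discordance_rate` and the position-to-base dictionaries are hypothetical and are not TCAG's pipeline. Discordance only approximates sequencing error, since it also absorbs errors in the matched genome and genuine ambiguity at sites such as heterozygous positions.

```python
def discordance_rate(exome_calls, genome_calls):
    """Fraction of shared positions where the two call sets disagree.

    Both arguments are hypothetical dicts mapping (chrom, position)
    to a called base. Positions present in only one set are ignored.
    Returns None when the two sets share no positions.
    """
    shared = exome_calls.keys() & genome_calls.keys()
    if not shared:
        return None
    mismatches = sum(exome_calls[pos] != genome_calls[pos] for pos in shared)
    return mismatches / len(shared)


# Toy data: three shared positions, one of which disagrees.
exome = {("chr1", 100): "A", ("chr1", 101): "C",
         ("chr1", 102): "G", ("chr1", 200): "T"}
genome = {("chr1", 100): "A", ("chr1", 101): "T",
          ("chr1", 102): "G", ("chr1", 300): "C"}
rate = discordance_rate(exome, genome)
```

In practice such a comparison would be run on aligned reads or variant calls rather than on plain dictionaries; the toy inputs here only show the arithmetic.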
"We will be mapping the exome data against [matched] genomes and that will serve as a very good process to estimate the sequencing error rates of these instruments," Pereira said.
Based on the sequences obtained, he and his colleagues plan to compare a range of performance measures for the kits, including sensitivity, specificity, and uniformity of coverage. In particular, Pereira said the team will look at whether some of the newer kit designs address GC bias and related coverage issues associated with some early versions of sequence capture kits.
Because there are differences in the amount of sequence targeted by each kit, the team will focus much of its analyses on sequences targeted by all three capture kits.
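A minimal sketch of that restriction, assuming each kit's design is available as a sorted list of non-overlapping, half-open (start, end) intervals on a single chromosome: intersect the three target sets, then score uniformity of coverage over the shared bases. The names `intersect`, `shared_targets`, and `uniformity`, and the cutoff of 20 percent of mean depth, are illustrative assumptions rather than the metrics the TCAG team has committed to.

```python
from statistics import mean


def intersect(a, b):
    """Intersect two sorted lists of non-overlapping half-open intervals."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def shared_targets(*kits):
    """Regions targeted by every kit passed in."""
    shared = kits[0]
    for kit in kits[1:]:
        shared = intersect(shared, kit)
    return shared


def uniformity(depth, regions, fraction=0.2):
    """Share of bases in `regions` with depth >= `fraction` of mean depth.

    `depth` is any sequence indexable by position. The 0.2 default is an
    assumed cutoff chosen for illustration only.
    """
    depths = [depth[pos] for start, end in regions for pos in range(start, end)]
    if not depths:
        return 0.0
    threshold = fraction * mean(depths)
    return sum(d >= threshold for d in depths) / len(depths)


# Toy designs for three hypothetical kits.
kit_a = [(100, 200), (300, 400)]
kit_b = [(150, 250), (350, 450)]
kit_c = [(0, 180), (360, 500)]
shared = shared_targets(kit_a, kit_b, kit_c)

# Flat 30x depth with a poorly covered 10-base stretch inside a shared region.
depth = [30] * 500
depth[150:160] = [2] * 10
score = uniformity(depth, shared)
```

Scoring each kit only on the shared intervals keeps differences in design size from being mistaken for differences in capture performance.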
"Since these kits differ in design, we're only going to be comparing the exomes that actually overlap between those kits," Pereira said. "So basically what we will be looking at will be the coverage for the overlapping regions and the uniformity of the coverage."
While the new OGI grant will not cover all of the costs associated with the study, Pereira said it should help in purchasing reagents and any kits that have not already been provided. TCAG will foot the bill for the bioinformatics portion of the project, he noted.
Companies participating in the comparative project are offering a break on the cost of kits or related sequencing reagents, he added, either by providing free kits or offering deep discounts on both kit and reagent prices.
Pereira said the team expects to have all of the samples sequenced by mid-February and hopes to complete the comparison by around March.
"As soon as we actually have some information about this we will start providing guidance to [customers] based on these results," he said.
Tackling Clonal Duplicates
At Ottawa Hospital Research Institute, meanwhile, StemCore Laboratories Director Pearl Campbell and her colleagues are tackling another sequence preparation problem: the clonal duplicates that turn up in ChIP-seq experiments.
The goal of the project "is really to assess what is the source of clonal duplicates — are they real or are they not real," Campbell told IS.
"Because of the fragmentation of the chromatin before you perform the ChIP, you expect that to be random," she said. "If you get multiple reads that are identical, you could say that something may have gone wrong with the library construction: either there are a lot of PCR artifacts or something isn't optimal."
Duplicate reads are generally tossed out during analysis, though it's not always clear whether these reads — which comprise between 10 percent and 50 percent of the reads in typical ChIP-seq experiments — are authentic or merely experimental artifacts.
Because so many reads associated with clonal duplications are lost, the team is keen to determine their source and to come up with the optimal ChIP-seq protocol for obtaining as much data as possible, Campbell explained.
"That's a lot of data to just be throwing out without really validating," she said. "Should we be throwing it out? What are the implications? Can we do this better so that we're not losing all of these reads?"
To begin answering such questions, the researchers plan to do a series of ChIP-seq experiments using a three-read approach, comparing the reads they get by paired-end sequencing, single-end read sequencing, and paired-end sequencing in conjunction with digital tagging using a kit developed by the Texas biotech company Bioo Scientific.
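One way tagged reads could help separate artifacts from authentic duplicates is sketched below. It assumes, purely for illustration, that several distinct tags are applied to one sample before amplification, so that reads sharing coordinates and a tag are likely PCR copies, while reads sharing coordinates but carrying different tags came from independent fragments. The tuple layout, the tag labels, and the function `classify_duplicates` are hypothetical; this is not the Bioo Scientific kit's software or the StemCore team's analysis.

```python
from collections import defaultdict


def classify_duplicates(reads):
    """Count positional duplicates and how many are backed by distinct tags.

    `reads` is an iterable of (chrom, start, end, strand, tag) tuples,
    where start and end stand for the fragment ends of a read pair.
    """
    groups = defaultdict(list)
    for chrom, start, end, strand, tag in reads:
        groups[(chrom, start, end, strand)].append(tag)

    total = positional = distinct_tag = 0
    for tags in groups.values():
        total += len(tags)
        # Copies beyond the first that a position-only filter would discard.
        positional += len(tags) - 1
        # Of those, copies that carry a tag not yet seen at this position.
        distinct_tag += len(set(tags)) - 1
    return {
        "reads": total,
        "positional_duplicates": positional,
        "distinct_tag_duplicates": distinct_tag,
    }


# Toy input: three reads at one position (tags A, A, B) and one read elsewhere.
reads = [
    ("chr1", 100, 300, "+", "A"),
    ("chr1", 100, 300, "+", "A"),
    ("chr1", 100, 300, "+", "B"),
    ("chr1", 500, 700, "-", "A"),
]
summary = classify_duplicates(reads)
```

Under these assumptions, a large gap between the positional count and the distinct-tag count would point toward PCR artifacts, while similar counts would suggest that many discarded reads are authentic.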
"We'll be using [digital tagging] in a little bit of a non-standard way initially," Campbell said. "We're going to actually be taking these index tags and applying them to the same sample to get an idea of whether the clonal duplicates are present or not."
The study will be done using embryonic stem cells — a cell type that Campbell and her team have worked with before. By focusing on the ChIP-seq patterns associated with the pluripotency-related stem cell transcription factor called Oct4, for instance, the researchers hope to learn more about the ChIP-seq protocol itself.
"If we use what is already known about Oct4 and compare the data that we're getting out of the back end from our new method, then we'll be able to validate it," she explained. "It's a sanity check on the data."
Experiments at StemCore will all be performed on the Illumina GAIIx platform, Campbell said, though a team of collaborators in the Netherlands led by Wilfred van IJcken at the University of Erasmus will also do sequencing for the project using the Illumina HiSeq 2000.
Although it will not cover all the costs for the ChIP-seq study, the new grant from OGI should cover the cost of reagents and consumables at StemCore, Campbell said. The University of Erasmus team will foot the bill for HiSeq 2000 sequencing costs, she noted, and bioinformatics work for the project will be done as part of the International Regulome Consortium.
Researchers hope to have all of their experiments and analyses complete by around September of next year. And though their own experiments are being done on Illumina platforms, the ChIP-seq protocol they settle on once the study is done should be compatible with virtually any of the sequencing platforms available.
"Once the protocol is established, it could easily be implemented on other platforms, of course taking into account their requirements for library construction," she noted.
The team will likely make the findings publicly available for use by other teams prior to publication, Campbell said. "Once we are certain that the protocol is robust and up and running, we will post it to our website and make it available for any facility that would be interested in using it."