Next-generation sequencing is the big winner in the scaled-up Encyclopedia of DNA Elements, or ENCODE, project, as more than 60 percent of all grant recipients for the $80 million initiative will use the new tools, according to grant abstracts and interviews with grant recipients.
The expanded project, which the National Human Genome Research Institute launched last month, follows on the heels of a pilot project that investigated 1 percent of the human genome (see In Sequence 6/19/2007).
The new study aims to characterize all functional elements in the genome, such as DNA binding sites of proteins or DNA methylation sites.
At least 10 of the 16 new grants for the project will make use of high-throughput next-gen sequencing tools. ENCODE researchers plan to use or test different platforms, including Illumina’s Genome Analyzer, 454’s Genome Sequencer, and Applied Biosystems’ SOLiD system.
Brad Bernstein, an associate professor of pathology at Massachusetts General Hospital and the Broad Institute, won a four-year, $4.8 million grant, under which he plans to use Illumina’s Genome Analyzer to analyze chromatin regulatory elements in ChIP sequencing experiments.
Bernstein and colleagues plan to initially map 20 modifications and associated chromatin proteins in 20 different cell types, including cell lines and primary tissues, a project that will involve several hundred gigabases of sequencing.
Illumina’s platform “is quite effective for genome-wide analyses of the sorts we are doing for ENCODE,” Bernstein told In Sequence by e-mail last month. “However, we will also pilot other platforms coming online to see whether these might be more cost-effective.”
Earlier this year, the team published its ChIP sequencing method in Nature, one of the first publications involving Illumina’s sequencing platform (see In Sequence 7/3/2007).
“We are working to increase the efficiency of the protocol so it will work on even smaller cell populations,” Bernstein said.
Meantime, Greg Crawford’s group at the Duke University Institute for Genome Sciences and Policy has won a four-year, $6.5 million ENCODE grant to generate approximately 20 million sequence tags from 10 to 20 human cell types.
His team has developed two methods to identify all DNase I hypersensitive sites in the genome: one based on high-throughput sequencing and the other based on microarrays. These regions of open chromatin mark different types of regulatory elements, he said.
The sequencing-based method, which he calls DNase-seq, “works quite well on either the Illumina or 454 platform,” Crawford told In Sequence by e-mail last month. His team currently uses Illumina’s sequencer because of its high throughput.
“While some methods are better off using longer sequence reads obtained by the 454 platform, we have found that shorter Illumina read lengths [of] 20-35 bases are well suited for DNase-seq,” he said.
Crawford and his colleagues have also developed a method to identify open chromatin called formaldehyde assisted identification of regulatory elements, or FAIRE, which also involves sequencing.
Their ENCODE grant enables them to generate approximately 20 million sequence tags per DNase or FAIRE library from between 10 and 20 human cell types. “Additional cell types or conditions will be analyzed if sequencing costs are further reduced in the coming years,” Crawford said.
Scott Tenenbaum, an assistant professor at the University at Albany-SUNY, on the other hand, is an ENCODE veteran, having won an ENCODE technology development grant in 2004 to study binding sites of RNA-binding proteins using tiling arrays made by Affymetrix and NimbleGen.
Now, the NHGRI has awarded him a three-year, $2.2 million pilot-scale grant “to take a step back and compare my tiling array readout to some of the deep sequencing methods,” Tenenbaum told In Sequence last month. “They are looking to me ... to help them decide, ‘Are these new technologies ready to roll, or are you better off waiting a few years and then we will see?’”
Initially, Tenenbaum will compare his tiling array results with data from Illumina’s and 454’s sequencing platforms, working with the companies as service providers as well as with other academic groups that own these instruments. The hope is that sequencing-based experiments would not require as many replicates as tiling arrays, which Tenenbaum said are “a relatively noisy system.” If that were the case, the total cost of doing experiments with sequencing technologies would be comparable to tiling arrays, he said.
But Tenenbaum said he is not sure whether the sequencing platforms have reached prime time, and vowed to wait if necessary. “I am not completely sold yet; I have not really seen any data,” he said.
Eventually, he believes, sequencers will take over as a discovery platform. “Unless there is some big surprise about the type of data I will be generating, I will be surprised if the sequencing methods essentially don’t prove to be the better exploratory tool.”
Customized arrays may still have a place for high-resolution mapping, and Tenenbaum said he doesn’t “expect they will be completely replaced.”
Another part of the expanded ENCODE project involves technology development, and six investigators collectively received around $7.3 million in grants for this purpose last month.
One of these teams is led by Michael Dorschner, a researcher at the University of Washington in Seattle, whose three-year, $1.1 million grant will enable him to develop a sequencing-based footprinting method to detect DNA cleavage induced by dimethyl sulfate, DNase I, and UV light.
Traditionally, he has performed footprinting assays using ligation-mediated PCR, a method that can only assay one or two sites at a time. Sequencing will transform this into a counting assay that can be multiplexed, Dorschner told In Sequence last month. “Optimally, it would be nice to footprint all the active regions simultaneously for a cell type,” he said.
At the moment, he and his colleagues are using Illumina’s Genome Analyzer. “We have a couple of those in house right now that are working rather well,” he said, adding that the researchers are also planning to purchase an ABI SOLiD system.
Dorschner is not using 454’s system because it “won’t give us the number of reads that we are looking for.” The short tags of the other platforms are sufficient for his purposes, and “longer read lengths really won’t help us,” he added.
Another team is led by John Greally at Albert Einstein College of Medicine, who with Brad Bernstein and other researchers at the Broad Institute will develop high-throughput sequencing methods for mapping histone modifications and cytosine methylation.
One of their aims under their three-year, $1.5 million grant is to push the number of cells required for ChIP sequencing below a million cells, Greally told In Sequence last month.
Another aspect of the project, in collaboration with Alex Meissner at the Whitehead Institute (see In Sequence 11/20/2007), aims to study DNA methylation, which Greally said requires much greater depth of coverage than ChIP-seq experiments.
Sequencing the entire methylated DNA is therefore still “an enormously expensive project,” he said, with estimates running at $750,000 per cell type in consumables using Illumina’s sequencing technology. “So the strategy that you have to use in the short term, until we get even better sequencing technologies, is to reduce the representation of the genome down to something in which we are interested,” he said.
Greally’s group is also interested in studying unmethylated DNA, which “seems to mark functionally interesting sites,” he said, and has proposed a strategy for enriching unmethylated DNA using certain DNA-binding proteins.
Initially, he and his colleagues believed that they would need 454’s longer reads to map bisulfite-converted DNA accurately back to the genome, but that is no longer true. “What we have realized is that the problem is bioinformatically soluble,” he said. “The depth of coverage offered by [Illumina’s] Solexa is very attractive and is making it a preferred platform for what we want to do.”
Greally added that ABI’s SOLiD platform is “intriguing as well for some of the projects we are thinking of” that require paired-end reads, which ABI has been implementing from the outset, “so we are certainly interested in exploring that as well.”
Albert Einstein plans to acquire “at least one” next-generation sequencer as part of a new epigenomics infrastructure, Greally said. “It’s on our shortlist of things to do right now, and certainly the collaboration with the Broad is going to be great in terms of putting us on the right path for setting up the technology effectively.”
Finally, Yijun Ruan at the Genome Institute of Singapore will use his three-year, $990,000 ENCODE technology award to develop a method to study long-range chromatin interactions. High-throughput sequencing will play a crucial role in this method, called chromatin interaction analysis using paired end ditagging, or ChIA-PET, Ruan told In Sequence by e-mail last month. The grant follows an earlier ENCODE technology award he won in 2004 to develop ditag technologies for transcriptome annotation.
Because the noise in the chromatin analysis is high, “the conventional sequencing method is not practical,” Ruan said. “The tag-based next-generation sequencing platforms are perfect for this application” and “the only affordable approach, as a matter of fact.”
His group started out with 454’s sequencer and is now using Illumina’s. It is also getting an ABI SOLiD sequencer. “We will jump to whatever new sequencing instrument [provides] more throughput, high speed, and low cost,” Ruan said.
But generating the sequence data is probably the easiest part of the project. Constructing the libraries and analyzing the data will be more difficult, according to Ruan. “There is a set of entirely new challenges in bioinformatics to deal with this kind of data,” he explained. “We are trying to develop an entire pipeline from library construction to data delivery and visualization.”