A team of scientists from Stanford University and the California Institute of Technology has used Illumina’s sequencing platform to analyze binding sites of a transcription factor on a genome-wide scale.
Their work has been accepted for publication in Science, which would make it one of the first published peer-reviewed studies to use data from Illumina’s Genome Analyzer.
Barbara Wold, a professor of molecular biology at Caltech, presented initial results of the study at the Advances in Genome Biology and Technology conference in February. David Johnson from Rick Myers’ team at Stanford collaborated with Ali Mortazavi, a researcher in Wold’s lab, on the project. The sequencing was performed at Illumina’s facility in Hayward, Calif.
The scientists combined genome-wide chromatin immunoprecipitation with Illumina sequencing to identify the DNA binding sites of the NRSF/REST transcription factor, which is known to repress neuronal genes in non-neuronal cells as well as neuronal stem cells.
Matching approximately 3 million to 5 million 25-base-pair reads per sample to the genome, they were able to identify not only known NRSF binding sites but also a new family of binding motifs.
The data the instrument produced enabled them to locate the binding sites with “very nice” sensitivity, specificity, and precision, Wold told In Sequence in an e-mail message last week.
Intentionally starting with smaller DNA fragments, combined with other properties of the platform, also helped them map the sites more precisely. “That high resolution is important for annotating binding positions on the genome and also for feeding experimentally determined sites into motif-finding algorithms,” she said.
The scientists also tested 454’s platform for the project, collaborating with George Weinstock and Richard Gibbs at the Baylor Genome Center. They decided to use Illumina’s technology for this study mainly because it produces more reads per run. “High read number is good in this use because each sequence read is an independent datum that helps to identify sites in the genome that have been enriched in the experiment due to the immunoprecipitation,” Wold said.
Because 454’s platform produced fewer reads per run, “the depth of sampling and the ability to define sites with weaker ChIP enrichments was not as great as in the Solexa set,” Wold said.
However, she added that by using “sheer brute force” (meaning more runs) on the 454 platform, researchers “would presumably achieve comparable depth.” In addition, there might be other, more subtle differences in the results the two platforms produce, owing to their different read lengths and informatics tools, but these “remain to be fully defined,” she said.
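Wold’s point about read number can be illustrated with a toy calculation. The Python sketch below is not the method used in the study: it assumes reads have already been mapped to start coordinates on a single chromosome, and the function names, window size, thresholds, and coordinates are all hypothetical. It counts reads in fixed-size windows and reports windows where the ChIP sample has enough reads, and enough excess over a control sample, to be called enriched.

```python
from collections import Counter


def window_counts(read_starts, window=100):
    """Count mapped read start positions per fixed-size genomic window."""
    return Counter(pos // window for pos in read_starts)


def enriched_windows(chip_starts, control_starts, window=100,
                     min_reads=5, min_fold=4.0):
    """Return (start, end, chip_reads) for windows that look enriched.

    Thresholds are illustrative assumptions, not values from the study.
    """
    chip = window_counts(chip_starts, window)
    control = window_counts(control_starts, window)
    hits = []
    for idx, n in sorted(chip.items()):
        # Pseudocount of 1 avoids dividing by zero in empty control windows.
        background = control.get(idx, 0) + 1
        if n >= min_reads and n / background >= min_fold:
            hits.append((idx * window, (idx + 1) * window, n))
    return hits


# Made-up coordinates: seven ChIP reads pile up between 1000 and 1100.
chip_reads = [1010, 1020, 1035, 1050, 1060, 1075, 1090, 5000, 7200]
control_reads = [5010, 9000]

deep = enriched_windows(chip_reads, control_reads)
# Keeping every third read mimics a run that yields fewer reads.
shallow = enriched_windows(chip_reads[::3], control_reads)

print(deep)     # [(1000, 1100, 7)]
print(shallow)  # []
```

With the full set of reads the window at 1000 to 1100 passes both thresholds; with only a third of the reads the same site falls below the minimum count and is missed, which is the sense in which fewer reads per run limit the ability to define weaker enrichments.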