The 1000 Genomes Project, launched in January to create a new, more detailed map of the human genome, will rely exclusively on second-generation sequencing technology to sequence at least 1,000 human genomes.
But participants and microarray vendors have recently begun saying that the initiative could lead to a new generation of arrays for whole-genome association studies and targeted genomic research, perhaps quieting critics who claim that the project’s reliance on second-generation sequencers heralds the end of such array-based studies.
“What we are going to learn about genetic variation from this project is going to be startling, and so we will understand much, much more about genetic rearrangement, about insertions, deletions, and transformations in the genome,” Illumina CEO Jay Flatley told investors last week at the Goldman Sachs Annual Healthcare Conference.
“We expect this to fuel a whole new round of chip development, and as we learn more about the rare variation, then we will be able to put those components on our chips,” he said.
Flatley made his comments as his company joined Applied Biosystems and Roche subsidiary 454 Life Sciences in announcing that they would help the effort, considered a follow-on to the International Human HapMap Project, which ended in 2005.
Illumina, 454, and ABI, currently competitors in the second-generation sequencing market, will participate in the pilot phase of the project, which is projected to generate at least 6,000 gigabases of data over three years. Each of the vendors will sequence 75 billion DNA bases, the equivalent of 25 human genomes, over the coming year. The companies have also pledged to contribute additional sequence data over a three-year timeline.
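The figures above are consistent under the standard assumption of roughly 3 billion bases per haploid human genome. A quick back-of-envelope check:

```python
# Back-of-envelope check of the sequencing commitments described above.
# Assumes a haploid human genome of roughly 3 billion bases.
GENOME_SIZE = 3e9          # ~3 Gb per human genome (assumption)

per_vendor_bases = 75e9    # each vendor's first-year pledge
genomes_per_vendor = per_vendor_bases / GENOME_SIZE
print(genomes_per_vendor)  # 25.0 genome-equivalents, matching the article

project_total = 6000e9     # at least 6,000 gigabases over three years
print(project_total / GENOME_SIZE)  # 2000.0 genome-equivalents of raw data
```

Note that these are raw genome-equivalents of sequence, not finished genomes: redundant coverage of each sample is needed before variants can be called confidently.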
Illumina has relied on the data generated by the HapMap project, plus content from other sources, to design its current generation of arrays for genome-wide association studies. In January 2006, the company launched its HumanHap300 Beadchip, which contains roughly 300,000 tag-SNPs from the HapMap catalog, and it continues to release arrays that include content from the project, such as its Human 610-Quad Beadchip, launched this spring (see BAN 1/17/2006, BAN 1/8/2008).
According to Flatley, the 1000 Genomes Project is likely to produce a more detailed map of genetic variation that will complement the variation already documented by the HapMap project.
“The HapMap project only had the ability, because of the limited number of samples, to look at variation” at about the 5-percent frequency level in the genome, he said. “The idea of the 1000 Genomes Project is to get to [the] under-1-percent [level of variation], and in the genes to get down in the range of about 0.5 percent.”
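The link between sample size and detectable variant frequency can be illustrated with a simple binomial model (my simplification, not from the article): a variant at population frequency f is carried by each of the 2n chromosomes in n diploid samples independently, so the chance of seeing it at least once is 1 − (1 − f)^(2n). The HapMap sample count below (~270) is the commonly cited figure for its initial panel and is used here only for illustration.

```python
# Illustrative binomial model: probability that a variant of population
# frequency f appears at least once among n diploid samples (2n chromosomes).
def prob_variant_seen(f, n_samples):
    """Chance of observing a variant of frequency f in n diploid samples."""
    return 1 - (1 - f) ** (2 * n_samples)

# Compare a HapMap-scale panel (~270 samples, an assumed round figure)
# with a 1,000-sample panel across the frequency levels Flatley cites.
for f in (0.05, 0.01, 0.005):
    print(f, prob_variant_seen(f, 270), prob_variant_seen(f, 1000))
```

Even this toy model overstates what a small panel can do, since a single observation is not enough to catalog a variant reliably; discovering and accurately typing rare variants in practice requires seeing them multiple times, which is where the larger panel pays off.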
Flatley added that Illumina believes the availability of new chips will drive a new boom in genome-wide association studies. “The people who thought this was a two- to three-year market are being proven wrong and we are seeing a resurgence of genome-wide association studies,” he said.
“As we discover more and more information about the genome, we will be able to improve the content on chips. We think there is going to be a whole new wave of genome-wide association studies,” said Flatley.
Broad Institute researcher Steve McCarroll, who is a member of the 1000 Genomes Project’s analysis group, told BioArray News this week that the project’s exclusive use of second-gen sequencers has led some to believe that the end is nigh for array-based genome-wide association studies, a notion he said is wrong for a number of reasons.
“There is a common misconception that 1000 Genomes marks a transition from array-based to sequencing-based approaches in genome-wide association studies,” McCarroll said. “In fact, the 1000 Genomes data is likely to make array-based approaches far more powerful than they are today; this is important, because array-based approaches are still hundreds of times less expensive per patient than whole-genome sequencing is,” he said.
McCarroll pointed out that the current generation of SNP arrays typically captures only common polymorphisms and haplotypes in a population. He said that clinical communities have in the meantime mobilized to collect DNA from “enormous patient cohorts,” which should make it possible to test low-frequency as well as common variants for disease association.
“We'll need a new generation of SNP arrays to type those low-frequency variants,” McCarroll noted.
He added that the combination of microarrays, statistical imputation techniques, and data from the 1000 Genomes Project will be “enormously powerful.” According to McCarroll, “one will be able to perform a microarray experiment on an individual, typing a million or more common SNPs, and then to impute most of the rest of that individual's genome sequence as a mosaic of the haplotypes that the 1000 Genomes Project has fully sequenced.
“Such approaches are likely to be able to capture all but the lowest-frequency variants that are segregating in a population,” he said.
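The imputation idea McCarroll describes can be sketched in miniature. The example below is a deliberate simplification: a study individual is typed only at a few tag SNPs, and untyped sites are copied from the best-matching fully sequenced reference haplotype. Real imputation tools instead model each chromosome as a mosaic of reference haplotypes, typically with a hidden Markov model; the panel, positions, and alleles here are invented for illustration.

```python
# Toy sketch of genotype imputation from a sequenced reference panel.
# Reference panel: fully sequenced haplotypes over 8 biallelic sites,
# coded 0/1 for the two alleles. (Invented data for illustration.)
reference_haplotypes = [
    [0, 0, 1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1, 0, 0, 1],
]

# The array typed only the tag SNPs at positions 0, 3, and 6.
typed_positions = [0, 3, 6]
typed_alleles = [1, 1, 1]  # alleles observed on the array

def impute(ref_panel, positions, alleles):
    """Fill untyped sites from the reference haplotype that best matches
    the observed tag-SNP alleles (fewest mismatches)."""
    def mismatches(hap):
        return sum(hap[p] != a for p, a in zip(positions, alleles))
    best = min(ref_panel, key=mismatches)
    # Keep the directly observed alleles; copy everything else.
    observed = dict(zip(positions, alleles))
    return [observed.get(i, allele) for i, allele in enumerate(best)]

full = impute(reference_haplotypes, typed_positions, typed_alleles)
print(full)  # [1, 1, 0, 1, 0, 0, 1, 1]: the second reference haplotype
```

The nearest-haplotype rule stands in for the probabilistic mosaic model; the point it illustrates is McCarroll’s, namely that a million typed SNPs plus a deeply sequenced reference panel let one infer most of the rest of an individual’s sequence.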
Companies and scientists involved in genome-wide association studies are not the only ones enthusiastic about the 1000 Genomes Project. Agilent Technologies, which sells arrays for applications such as expression, comparative genomic hybridization, and DNA methylation profiling, believes it will also reap some benefits from the project.
“We believe as more sequence information enters the public domain, this will result in the need for more focused studies,” said Yvonne Linney, vice president and general manager of Agilent’s genomics business. “Many of these studies will be done using microarray approaches, primarily due to the need to focus on subsets of the genome, specific regions, or known mutations or biomarkers that are significant or thought to be causative in a given disease state,” she said.
Linney told BioArray News this week that the large sample sets common to such studies will likely drive customers to adopt arrays as a research platform due to their comparatively lower cost relative to next-generation sequencing. Like Flatley, she said that the “substantial amount of data” generated from the 1000 Genomes Project will “result in more information that can be used to design the next generation of microarrays.”
Arrays will not be the only segment of Agilent’s business to benefit from the project. Linney noted that last month Agilent licensed a method developed at the Broad for genome partitioning that uses Agilent’s Oligo Library Synthesis technology. According to Agilent, the method enables users to design and acquire ready-to-use, custom mixtures of biotinylated, long RNA probes in a single tube.
“Without accompanying improvements in the ability to select relevant portions of the genome, [next-generation sequencing] technology cannot achieve its full potential in studying the relationships between genes and diseases,” said Linney. “We see our genome-partitioning portfolio, which is currently in development, holding great potential for eliminating this bottleneck.”
Affy’s Internal Project
It is unclear what other large array companies think about the 1000 Genomes Project. Affymetrix declined to comment for this article, while Roche NimbleGen was unable to answer questions in time for this publication.
In the case of Affymetrix, though, the firm has previously said that it is working on an internally generated database of human genomic variants that it will use to build its next generation of chips for genome-wide association studies.
In January, CEO Stephen Fodor said the company’s current generation of arrays is based on the HapMap database. Now, he said, Affy has taken the “entire known human variation database — about 12.5 million variations” — and created a set of chips that will interrogate an additional 1,100 samples in order to generate a larger database (see BAN 1/15/2008).
Fodor said at the time that the new database will be ready during the first half of this year and will be used as a “resource to design our next-generation products” that will be “able to look at up to 10 million assays per chip in the near future.” Affy’s new extended variation map will also be provided to customers with these products when it becomes available, Fodor said.
Affy has not provided any recent update on the development of the database. During a first-quarter earnings call with investors, Fodor said that the company was likely to complete the database on schedule (see BAN 4/29/2008).