A new study from the Whitehead Institute indicates that building a whole-genome human haplotype map may be a less daunting endeavor than originally suspected.
In a paper posted on Sciencexpress last Thursday, scientists from the Whitehead Institute/MIT Center for Genome Research and Massachusetts General Hospital discussed their characterization of haplotype structure across a broad span of the human genome.
Haplotype research is based on the assumption that most genetic variation occurs within blocks of SNPs that are inherited together. Each haplotype block could then be tested with only a small number of SNPs, which would eliminate the time and cost required to resequence the genome and sort through thousands to millions of SNPs. However, previous studies in the field had shown neither that such blocks occur across the entire genome nor that they could be identified with a small number of common markers.
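To illustrate why a block can be tested with only a few SNPs, here is a minimal Python sketch. The haplotype strings, the `find_tag_snps` name, and the brute-force search are all hypothetical and chosen for illustration; this is not the method used in the paper. The idea is simply that if only a handful of haplotypes are common in a block, a small subset of SNP positions is enough to tell them apart.

```python
from itertools import combinations


def find_tag_snps(haplotypes):
    """Return the smallest set of SNP positions that distinguishes
    every distinct haplotype in the block (brute-force search)."""
    n_snps = len(haplotypes[0])
    n_distinct = len(set(haplotypes))
    for size in range(1, n_snps + 1):
        for positions in combinations(range(n_snps), size):
            # Reduce each haplotype to its alleles at the candidate positions.
            signatures = {tuple(h[i] for i in positions) for h in haplotypes}
            if len(signatures) == n_distinct:
                return list(positions)
    return list(range(n_snps))


# Hypothetical block: 8 SNPs, but only 4 haplotypes observed.
block = [
    "ACGTACGT",
    "ACGTTGGT",
    "GTATACGT",
    "GTACTGAC",
]

print(find_tag_snps(block))  # [0, 4]
```

In this toy block, genotyping two of the eight SNPs is enough to identify which of the four haplotypes a chromosome carries. A brute-force search like this would not scale to real data, where a heuristic would be needed.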
According to first author Stacey Gabriel of the Whitehead Institute and colleagues, their work now provides strong evidence that “most of the human genome is contained in blocks of substantial size.” At least half of the genome is organized into haplotype blocks of 22 kb to 44 kb or larger, they estimated. In addition, the authors found that common SNP markers could be used to identify haplotype blocks. They note that “fully powered haplotype association studies could ultimately require as many as 300,000-1,000,000 well-chosen [haplotype tag] SNPs.”
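For a sense of scale, the tag SNP counts quoted above can be turned into an average marker spacing. The short Python sketch below assumes a genome length of roughly 3.2 billion base pairs, a figure that does not come from the paper, and assumes the tags are spread evenly, which real tag SNPs would not be. The `mean_spacing_kb` helper is hypothetical.

```python
# Assumption: approximate human genome length in base pairs (not from the paper).
GENOME_BP = 3_200_000_000


def mean_spacing_kb(n_tags, genome_bp=GENOME_BP):
    """Average distance between tag SNPs, in kb, if spread evenly."""
    return genome_bp / n_tags / 1000


for n_tags in (300_000, 1_000_000):
    print(f"{n_tags:>9,} tag SNPs: one every {mean_spacing_kb(n_tags):.1f} kb")
```

Under these assumptions the spacing works out to roughly one tag SNP every 3 kb to 11 kb, which would put at least a couple of tags inside a block of 22 kb or more.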
Most importantly, “the simplicity of haplotype patterns and their great similarity across populations suggests that a haplotype map will be both achievable and of great utility in disease-gene mapping,” according to Mark Daly, a computational biologist at the Whitehead and a co-author on the paper. Daly developed statistics-based methods “that allowed us to generally and robustly interpret the haplotype data into these blocks,” he said, adding that some of the methods his group developed could be used to construct a genome-wide human haplotype map.
Lisa Brooks, program director for the NHGRI’s Genetic Variation program, agreed that the paper’s findings are important for the HapMap project, which became a reality in March when the NIH issued a request for applications (http://grants1.nih.gov/grants/guide/rfa-files/RFA-HG-02-005.html). The Whitehead research proves that “the project is worth doing,” said Brooks. That is welcome news for a project to which the NIH has already committed $16 million for 2002.
But while the paper proves the project is possible and even “puts some meat on the bones” of what the HapMap project will entail, according to Daly, many of the computational and informatics aspects of the HapMap project have yet to be defined.
Brooks, who oversees the informatics aspects of the NHGRI’s HapMap project, said that work by Daly and others has provided important tools for defining haplotype blocks, but much work still lies ahead. “There’s a tremendous need for methods of analyzing large-scale variation data,” she said. In addition, while haplotype blocks reduce the amount of data required to build the HapMap, additional techniques for “making the data more manageable” will also be necessary. Informatics tools are included in the funding for the project, but Brooks said the NHGRI is providing additional grants for informatics projects that address association studies, the analysis of large numbers of SNPs, and other related efforts.
The HapMap is expected to be complete within three years. Funding for the project will begin in late September, Brooks said.
Daly said he’s in the process of developing the methods described in the Science paper “into more robust code that others would be able to use.” In the meantime, he said, he’s willing to share his methods with anyone interested.