A research team led by investigators at the Universities of Oxford and Chicago has shown that it is feasible to do fine-scale genetic mapping using high-throughput population sequence data.
The group used this population-based approach to assess recombination patterns in the chimpanzee genome — work that they reported in the early, online edition of Science last week. Those involved in the study say similar methodology should allow for detailed recombination rate and hotspot mapping in any species with a sufficiently high-quality reference genome.
"[W]e set out to develop approaches based on sequence data, which, if successful, potentially open up the possibility of producing genetic maps for many species," the study's authors explained.
So far, recombination mapping has been limited to a small group of species — primarily humans and well-studied model organisms such as mice and budding yeast.
In part, that is because researchers have relied on tracing the transmission of specific genetic markers within a known pedigree, or doing detailed studies of meiotic cells to track down and characterize sites of recombination in the genome.
Molecular genomic approaches such as chromatin immunoprecipitation sequencing have made it possible to look directly at the double-strand break events and recombination machinery assembly patterns during meiosis in yeast and, in some cases, using meiotic tissue from mice, explained Gil McVean, a University of Oxford and Wellcome Trust Centre for Human Genetics researcher and co-corresponding author on the chimpanzee recombination mapping study.
But such experiments are difficult or impossible to do for some species, McVean told In Sequence. "If you want to learn about recombination in most species, you can't use high-throughput genomic molecular methods and you can't do the pedigrees. Basically, the only thing you can hope to do is to study genomic variation."
In humans, researchers have had a good deal of success using variation within natural populations to learn about the recombination patterns in the human genome, he explained.
In 2004, for example, McVean and collaborators from the University of Oxford and Wellcome Trust Sanger Institute used genetic variation patterns in European and African populations to show that recombination hotspots were present across the human genome.
A few years after that, members of the international HapMap Consortium, including McVean, used genotyping data generated for phase II of that study to do more extensive analyses of haplotype, linkage disequilibrium, and recombination patterns in humans.
Until the advent of large-scale sequencing studies such as the 1000 Genomes Project, though, researchers had relied largely on array-based methods to look at genetic and genomic patterns within the human genome. For species that are less well-characterized genetically, researchers explained, routinely assessing variation within populations is often difficult.
"Methods for estimating recombination rates from SNP data have been validated at both broad- and fine-scale scales," the authors of the new Science study wrote, "but there remains a gap for species without SNP arrays," which includes most species.
For a long time, population sequencing for the purpose of finding genetic variants for genetic mapping efforts was not considered especially cost effective given the high price tag for genome sequencing, McVean explained.
There was also some question about how feasible it would be to actually distinguish between genuine genetic variants and sequencing errors in such studies.
Even members of the 1000 Genomes Project, who are able to compare their findings with the relatively well-characterized human reference genome, are "grappling on a daily basis with the complications of calling genetic variants from high-throughput sequencing technologies," said McVean, who is a co-chair of the 1000 Genomes Project analysis group.
For the chimpanzee and other species with reference genomes that are lower quality than the human genome, sorting authentic variants from sequence blunders is even more daunting.
"When you go to chimpanzee, which has a much poorer reference sequence — much lower coverage used to assemble it and it wasn't assembled using BAC sequences — there was a real worry that we'd just end up with a load of noise and no signal at all," McVean explained.
"In fact, that's what happened when we first tried it," he added. "The first results we got when we found our genetic variants and estimated recombination were just a mess."
The researchers used the Illumina GAII to do paired-end, whole-genome sequencing on nine female chimpanzees and one male chimpanzee from the western chimp population, Pan troglodytes verus.
They then compared this genome sequence data to the chimpanzee reference genome, which was completed around seven years ago, to infer genetic variants and haplotype patterns from the population sequence data, using methods similar to those employed for the 1000 Genomes Project.
To clean up their data, the team developed a filtering scheme for finding and removing false-positive SNPs that might otherwise muddle the resulting recombination map.
"Initial maps estimated from variation data using existing methods were dominated by large and artefactual increases in genetic distance caused by false positive SNP calls, often in large repeats that are systematically under-represented in the chimpanzee reference genome," the researchers explained in their paper. "Most of these SNPs do not fail standard filters, hence we developed regional filtering strategies."
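The article does not spell out how the regional filters work, so the following Python sketch is only an illustration of the general idea: dropping SNP calls that pile up in suspicious regions rather than judging each call on its own. The function name `regional_filter`, the 10-kb window, and the five-fold density threshold are all assumptions made for this example, not details from the study.

```python
from collections import Counter
from statistics import median


def regional_filter(snp_positions, window=10_000, max_fold=5.0):
    """Drop SNPs that fall in unusually SNP-dense windows.

    Hypothetical stand-in for a regional filter: a window is flagged when
    its SNP count exceeds `max_fold` times the median count across windows
    that contain at least one SNP. Both parameters are illustrative.
    """
    # Count SNP calls per fixed-size window along one chromosome.
    counts = Counter(pos // window for pos in snp_positions)
    if not counts:
        return []
    typical = median(counts.values())
    # Flag windows whose density is far above the typical window.
    flagged = {w for w, n in counts.items() if n > max_fold * typical}
    # Keep only SNPs outside flagged windows, preserving input order.
    return [pos for pos in snp_positions if pos // window not in flagged]


# Toy data: five ordinary windows with two SNPs each, plus one window
# packed with 30 calls, as might happen in a collapsed repeat.
toy_positions = [w * 10_000 + off for w in range(5) for off in (100, 5_000)]
toy_positions += [50_000 + i * 10 for i in range(30)]
kept = regional_filter(toy_positions)
```

In this toy case the densely packed window is removed as a block and the ordinary calls are kept. A real filter would likely draw on more signals than density alone, such as read depth or haplotype structure.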
Since the type of error corrected by the regional filtering method developed for the chimpanzee study seems to be quite generic between species, a similar strategy should also be applicable for population-based genetic mapping in other species, McVean said. He noted that researchers are in the process of coming up with slightly more sophisticated versions of the statistical methods.
"The idea is that you search for the signals of poorly behaved variants," he said. "Simply looking at patterns of haplotype structure and recombination can be very informative and that helps you weed out good from bad variants."
Some of these errors are a consequence of the particular sequencing platforms used for the study, since each has a slightly different error profile.
McVean said the population sequence-based genetic mapping method is expected to be compatible with virtually any of the existing sequencing technologies available, but noted that Illumina "has a nice, simple, predictable error structure" that's well suited to the type of analysis used in the study.
To further validate the population-based genetic mapping method used for the chimpanzee study, the researchers used the same methodology to assess human population sequence data from the 1000 Genomes Project.
Because recombination has been better studied in humans, they could check how recombination predictions from population sequence data for 10 individuals of European ancestry and 10 individuals of African ancestry corresponded to what was already known about recombination in the human genome. The results were quite compatible overall.
Once they had identified a set of SNPs with a genotyping accuracy of around 97 percent, the researchers put together a chimpanzee genetic map based on haplotype and SNP data. They used linkage disequilibrium information to estimate recombination rates, taking approaches similar to those used to determine human recombination from microarray-based SNP data.
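For readers unfamiliar with linkage disequilibrium, the Python sketch below computes the standard pairwise r² statistic between two biallelic SNPs from phased haplotypes. It is not the estimation method used in the study, which the article does not detail; it only shows the kind of signal such methods build on. The function name `r_squared` and the 0/1 allele coding are assumptions for this example.

```python
def r_squared(site_a, site_b):
    """Pairwise LD (r^2) between two biallelic SNPs.

    `site_a` and `site_b` list the allele (0 or 1) carried at each SNP by
    the same phased haplotypes, in the same order. Returns NaN if either
    site is monomorphic, since r^2 is undefined in that case.
    """
    n = len(site_a)
    if n == 0 or n != len(site_b):
        raise ValueError("sites must be non-empty and of equal length")
    p_a = sum(site_a) / n                                # frequency of allele 1 at A
    p_b = sum(site_b) / n                                # frequency of allele 1 at B
    p_ab = sum(a & b for a, b in zip(site_a, site_b)) / n  # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                 # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")
    return d * d / denom


# Alleles always travel together: complete LD.
linked = r_squared([0, 0, 1, 1], [0, 0, 1, 1])
# All four haplotype combinations equally common: no LD.
unlinked = r_squared([0, 0, 1, 1], [0, 1, 0, 1])
```

Broadly, pairs of nearby SNPs with high r² suggest little historical recombination between them, while a sharp drop in LD over a short distance is the sort of pattern associated with a hotspot.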
The overall patterns found in the chimpanzee genetic map were further verified by generating a "robust" recombination map based only on the highest quality SNPs identified from the chimpanzee sequence data.
With the chimpanzee genetic map in hand, the team was able to do a series of comparisons between humans and chimpanzees, looking at everything from recombination rates and hotspots to sequence features associated with recombination in each species.
"What this study has told us is that the comparative analysis of recombination in related species is very powerful," McVean said. "It can pick up modes of evolution. It can actually tell you surprising things about function and the actual molecular machinery that drives recombination."
With that in mind, the team hopes to apply the same population-based method to do fine-scale genetic mapping and recombination profiling in other species as well.
For the chimp study, researchers sequenced each of the 10 chimpanzee genomes to an average of between nine- and 10-fold coverage, which McVean called a "very good level of coverage if you want to maximize the number of samples." The estimated sequencing cost was on the order of $5,000 per individual, though he said the price would likely be considerably less now.
"What we're interested in is not so much the individual genomes that each of these chimpanzees carry — and we're certainly not interested in the mutations that are found in only one of these chimpanzees," he explained. "We care about the structure of common variation, the common haplotypes, because [that's what] tells us about the historical recombination and ultimately allows us to build this fine-scale genetic map."
For its current effort, genetic mapping based on genome sequence data from mice, the group is aiming for more sequencing depth and a larger sample size: 20 mice sequenced to 20-fold coverage. It also plans to apply the population sequencing approach to study recombination in the zebra finch.
Generally speaking, McVean said the same methodology used in the chimpanzee genome mapping study should be feasible for any species with a sufficiently high quality reference genome.
Still, he cautioned that it may be necessary to take a second look at the reference genomes that are available for some species to be sure that they are of high enough quality to do the sequence comparisons needed to discover authentic genetic variants in a population.
"I do strongly believe that we need to revisit the question of how you make reference genomes," McVean said. "While the reference genomes we have might be a very good starting point for looking at things like conservation — in terms of looking at genetic variation and the structure of genetic variation, they fall a long way short of what's ideal."