This article was originally published Dec. 30.
As we shuttle between airports or set off on New Year's cruises to the tropics, it's easy to take long-distance travel for granted, to dismiss the long, arduous journeys our ancient predecessors made across and between continents.
But those migrations did more than disperse humans across the planet. They also helped shape subtle between-population differences and diversity in the human genome, as new locales presented distinct collections of environmental conditions, pathogens, and sometimes even other populations to mingle with.
Over time, such exposures contributed to a dynamic collage of genetic variation in our genomes, reflecting circumstances our ancestors had to deal with in different parts of the globe — enhanced immunity to a pesky pathogen here, changes in digestive enzymes there.
For many years, scientists set on understanding such adaptations and the migrations behind them had few easy-to-wield genetic tools at their disposal. That has changed dramatically in the span of a decade and a half with the introduction of affordable array and sequencing technologies.
Today, researchers can tap into reams of genetic variant and/or sequence data generated for populations around the globe through large consortium projects. And as sequencing speeds increase and prices continue to dip, many can start considering their own sequencing-based studies of everything from human demography to the functional effects of variants that differ in frequency from one population to the next, including those with medical relevance.
"The field, for a long time, was largely theoretical," David Altshuler, a genetics researcher at Harvard Medical School and director of the Broad Institute's Program in Medical and Population Genetics, told In Sequence. "It wasn't possible to start by measuring variation in people's genomes."
When such experiments did take place, they were initially limited to distinguishing between very specific sites in the genome — alternative enzyme isoforms, perhaps, or small sets of genes, he noted.
Eventually, though, it became possible to look at SNP and even sequence patterns on a genome-wide level. The advent of microarrays made it possible to easily measure variation at tens or even hundreds of thousands of sites in any given human genome, allowing investigators to see genome-wide SNP, copy number variation, and recombination patterns in many individuals from a population.
"With genome-wide data, we can construct models for what sorts of events we think might have happened in the past," Max Planck Institute for Evolutionary Anthropology researcher Mark Stoneking told IS. "That would include population divergence times, changes in population size, migrations, or admixture events."
Indeed, the availability of genome-wide approaches has led to a slew of genomic diversity studies that have largely supported but also refined the broad themes predicted by prior models, Altshuler noted.
On the human migration side, genotyping data on present-day populations have provided support for the out-of-Africa migration hypothesis and helped researchers more fully delineate other human dispersal routes. For example, admixture and linkage disequilibrium patterns from at least one study hint at the possibility of a back-to-Africa migration by members of an ancestral Italian population.
Such approaches have also been applied to retracing relationships between populations in one or more geographical locales.
With array-based SNP profiles in hand, some teams have focused on untangling ancestry and gene flow into existing European populations, as well as the resulting genetic similarities and differences between neighboring groups.
Others have peeked at genotyping profiles for hundreds or thousands of individuals from other parts of the world, including India and Africa, to look at how well genetic data jibe with known cultural diversity, social structures, and language groups in a region.
Researchers have also gotten help in understanding those long-ago human migrations from some unexpected sources, including the Neanderthal genome.
Long maligned in pop culture depictions as primitive and oafish for its protruding brow and cave-living, the archaic hominin apparently held more appeal for our modern human ancestors: a Neanderthal genome sequencing study published in Science in 2010 provided the first firm evidence of interbreeding between modern humans and Neanderthals.
Beyond offering a peek into ancient hominins' romantic past, that mixing between Neanderthals and humans left sequences in the genomes of many modern-day humans, which researchers have used to support and refine the out-of-Africa migration hypothesis for humans.
Because the proportion of Neanderthal ancestry is similar in virtually all human populations living outside of Africa, researchers suspect that modern humans may have encountered and intermingled with Neanderthals early in the out-of-Africa migration, perhaps in North Africa or the Middle East.
"You see more or less the same signal of Neandertal admixture in pretty much all non-Africans," Stoneking said. "So that suggests that there was one major migration of humans out of Africa that then admixed [with Neandertals]."
Even more human migration information came from the Denisovan genome, first reported by investigators at the Max Planck Institute for Evolutionary Anthropology, Harvard Medical School, and elsewhere in 2010 and published in its improved form last year.
Those studies suggested that another archaic hominin besides the Neanderthal shared intimate moments with our modern human ancestors. Rather than appearing in equal proportions in the genomes of individuals in many locations, though, Denisovan sequences have primarily been detected in populations in and around Oceania. They turn up in Papua New Guinea and the Philippines, and in Aboriginal Australian populations, in a manner consistent with a proposed southern route and multiple stages of human dispersal to that region.
"With the genome-wide data, as well as the signals that we get from the archaic admixture, the answer that we're getting is coming pretty firmly down on the side of an early southern route of migration," explained Stoneking, who was senior author on an American Journal of Human Genetics study using Denisovan DNA and present-day population genotypes to retrace those events.
Not surprisingly, preserved human remains are providing yet another source of information on historical human habits. Examples range from the 5,300-year-old Tyrolean Iceman, Ötzi, whose SNP and sequence data provided clues to the spread of agriculture in Europe, to an Aboriginal Australian genome that a University of Copenhagen-led team used to garner their own evidence for a southern migration into Asia and Australia.
The same rapid, high-throughput sequencing technologies used to tackle those individual genomes also proved useful for tallying up diversity and variant profiles in human genomes at large through efforts such as the 1000 Genomes Project.
On the heels of array-based efforts, such as the SNP Consortium and International HapMap Project, the 1000 Genomes effort began with the goal, very lofty at the time, of collecting and sequencing samples from 1,000 individuals from human populations across a broad geographical range and with varied ancestral backgrounds.
By its end, the 1000 Genomes Project will have profiled roughly 2,500 samples through low-pass whole-genome sequencing, high-coverage genome sequencing, and/or exome sequencing. The sample set represents five groups of 100 individuals apiece from each of five parts of the world: Europe, East Asia, South Asia, Africa, and the Americas.
Along with the value of the sample collection itself for future research, Altshuler noted that sequence data generated so far has made it possible to catalog some 99 percent or more of the variable positions present in any newly sequenced genome.
The proportion of characterized variants in coding portions of the genome is even higher, given the widespread use of whole-exome sequencing by other consortia and independent research teams.
All that data comes in handy not only for getting a sense of the genetic diversity and variation within or between populations, but also for interpreting other kinds of data, such as variant patterns in association studies that are focused on a specific trait or disease.
As recently as a decade ago, researchers would have been "ecstatic" at the prospect of getting data on 10,000 variants at the population level, Stoneking noted. These days, though, that amount of information is minuscule compared to the number of polymorphisms typically profiled in large sample sets.
"We're in this era of massive genome-wide data that we can reasonably, quickly, and inexpensively generate from populations of interest — and rapidly getting into the era of complete genome sequence data from populations of individuals of interest."
"That, in turn, has made it possible to do many new sorts of analyses that simply weren't possible before," he added.
Indeed, several research teams have made forays into sequencing multiple representatives from specific human populations. Among them is a University of Pennsylvania-led team that reported in Cell in 2012 on its use of whole-genome sequence data from 15 individuals, drawn from Western Pygmy populations in Cameroon and click-speaking populations from Tanzania, to look for population-specific adaptations.
"The technology has truly revolutionized the field, in terms of our ability to do things that I never thought would have been possible five years ago," the University of Pennsylvania's Sarah Tishkoff, senior author on the study, told IS.
"I didn't think we'd be able to sequence the entire genomes of 15 diverse Africans at high coverage — that literally was impossible five years ago or so," said Tishkoff, who has led several large studies on genetic diversity in African populations.
Still, Tishkoff cautioned that while genome sequences can be valuable for inferring population histories, relationships, and adaptations, whole-genome sequence data can also be quite difficult to interpret and to mine for important functional variants. That is particularly true in cases where genomes are not sequenced to sufficient coverage or when there are relatively few samples available, she added.
"The problem with some of the newer methods for looking between populations is that you need phased data," she said. "And getting phased whole-genome sequence data when you don't have giant sample sizes is problematic, in my opinion."
Nevertheless, sequencing may be poised to make other types of contributions to the field as well, as researchers consider the layers of information required to interpret genome sequence and gene expression differences within and between populations, along with their regulation and relevance.
For example, it may be all well and good to know that the genomes of individuals from African pygmy populations tend to show signs of selection in regions involving height-, immune-, and hormone-related genes, patterns that Tishkoff and her colleagues reported in PLOS Genetics in 2012.
But untangling the biological basis of such potential adaptations, finding their functional effects, and understanding how they influence an individual's traits, risk of disease, and/or drug response remain tricky prospects.
On that front, Tishkoff emphasized the importance of applying systems biology approaches to understand population-specific adaptations — from regulatory variants and their effects on gene expression and protein patterns to epigenetic profiles that vary in concert with changing environmental exposures.
"Now, we can integrate metabolomics data, genomic data, transcriptomic data, epigenomic data, environmental data, and proteomic data … using a systems approach to try to identify pathways or naturally occurring genetic variants that — as they say in systems talk — perturb these pathways," she said.
She and her team have already started doing some epigenetic, metabolomic, and proteomic studies using samples from some of the same diverse African populations they've previously profiled using other approaches. They are particularly interested in integrating multiple data types for individuals from populations with distinct diets and/or environmental influences.
Tishkoff noted that she is also interested in working with members of ENCODE and the National Institutes of Health's Roadmap Epigenomics project in the hopes of exploring regulatory and other forms of variation in populations from Africa that have not been included in similar studies so far.
For his part, Stoneking noted that microbial studies may offer yet another source of information on human demography and adaptations. His own team is involved in studies of populations from southern Africa and from the region in and around Oceania and Southeast Asia.
"There are all sorts of other aspects of variation in modern humans beyond just DNA that one can look at and try to make use of, in terms of both investigating population history as well as trying to figure out important adaptive changes that enabled us to be more successful than any other species on this planet," Stoneking said.
In expanding sequencing into systems biology and functional studies, though, it will likely be important to consider ways of studying the appropriate tissue type, which often cannot be obtained directly from volunteers.
Tishkoff noted that she and her team have identified a candidate gene suspected of contributing to short stature in African pygmy populations, for example, and would ideally like to do further functional studies in pituitary gland cell lines from an appropriate population, should that become possible.
While the prospects of acquiring such samples may be somewhat dim, Tishkoff expressed enthusiasm about the possibility of doing studies of various tissue types generated from induced pluripotent stem cells.
"What is going to revolutionize the field, I think, is being able to generate [induced pluripotent stem] cells because, in theory, we can create different tissue types and then do functional studies looking at gene regulation," she said.