NEW YORK – Using statistical and computational modeling, an international team has harnessed data from several thousand genome sequences spanning hundreds of populations to put together a unified family tree encompassing modern-day and ancient humans.
As they reported in Science on Thursday, the researchers used non-parametric tree-recording methods to untangle, fill in, and interpret ancestry tract data with whole-genome sequences from contemporary human populations, along with high-quality ancient genomes that helped to put the relationships into a chronological context.
"[W]e introduce statistical and computational methods to infer such a unified genealogy of modern and ancient samples, validate the methods through a mixture of computer simulation and analysis of empirical data, and apply the methods to reveal features of human diversity and evolution," corresponding and co-senior author Gil McVean, a researcher at the University of Oxford's Big Data Institute, and colleagues wrote.
The team brought together more than 3,600 modern-day whole-genome sequences from the 1000 Genomes Project, the Human Genome Diversity Project, and the Simons Genome Diversity Project with eight high-quality ancient human or archaic hominin genome sequences and nearly 3,600 additional published sequences from ancient samples. From these data, the researchers identified almost 27 million ancestral haplotype fragments and clues to some 231 million ancestral lineages for the 215 human populations represented in the analysis.
"[W]e use the foundational notion that the ancestral relationships of all humans who have ever lived can be described by a single genealogy or tree sequence, so named because it encodes the sequence of trees that link individuals to one another at every point in the genome," they explained. "This tree sequence of humanity is immensely complex, but estimates of the structure are a powerful means of integrating diverse datasets and gaining greater insights into human genetic diversity."
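To make the tree sequence idea concrete, here is a minimal Python sketch of how a sequence of local trees along a genome might be represented and queried. It is an illustration only, not the authors' method or software: the `LocalTree` class, the helper functions, the sample names, and the toy coordinates are all hypothetical. It also simplifies by storing each tree separately, whereas practical tree sequence formats share nodes and edges between neighboring trees.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LocalTree:
    """One genealogical tree covering a stretch of genome (hypothetical toy type)."""
    left: int    # start of the genomic interval, inclusive
    right: int   # end of the genomic interval, exclusive
    parent: Dict[str, Optional[str]]  # child -> parent; the root maps to None


def tree_at(trees: List[LocalTree], position: int) -> LocalTree:
    """Return the local tree whose interval contains the given position."""
    for tree in trees:
        if tree.left <= position < tree.right:
            return tree
    raise ValueError(f"position {position} is not covered by any tree")


def ancestors(tree: LocalTree, node: str) -> List[str]:
    """List the node itself and every ancestor up to the root."""
    path = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = tree.parent[current]
    return path


def mrca(tree: LocalTree, a: str, b: str) -> str:
    """Most recent common ancestor of two nodes within one local tree."""
    seen = set(ancestors(tree, a))
    for node in ancestors(tree, b):
        if node in seen:
            return node
    raise ValueError("nodes do not share an ancestor in this tree")


# Toy data: three samples (A, B, C) whose relationships differ
# between two neighboring stretches of the genome.
TOY_TREES = [
    LocalTree(0, 500, {"A": "n1", "B": "n1", "C": "n2", "n1": "n2", "n2": None}),
    LocalTree(500, 1000, {"A": "n2", "B": "n3", "C": "n3", "n3": "n2", "n2": None}),
]

if __name__ == "__main__":
    for pos in (100, 700):
        print(pos, mrca(tree_at(TOY_TREES, pos), "A", "B"))
```

In this toy example, samples A and B share the recent ancestor n1 in the first interval but are joined only at the root n2 in the second, which is the sense in which the relationships between individuals can change from one point in the genome to the next.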
In addition to new views of relationships across human populations and between humans and archaic hominins, the team got a glimpse of past population sizes and geographic patterns, archaic admixture events, frequent human mutations, and common sequencing or genotyping errors found in human genome collections.
While the current approach relies on the availability of phased genome sequences, the authors noted, future advances are expected to make it possible to apply such genealogical analyses to still larger genome datasets.
"The unified genealogy presented in this work represents a foundation for building a comprehensive understanding of human genomic diversity, including modern and ancient samples, which enables applications ranging from improving genome interpretation to deciphering our earliest roots," they wrote. "Although much work is required to build the genealogy of everyone, the methods presented here provide a solution to this fundamental task."
In an accompanying perspective article in Science, University College London genetics researchers Jasmin Rees and Aida Andrés, who were not involved in the study, noted that the work "will undoubtedly prove useful to those studying human evolution."
"The power and resolution of tree-recording methods promise to help clarify the evolutionary history of humans and other species," they wrote. "It is likely that the most powerful ways to infer evolutionary history going forward will have their foundations firmly set in these methods."