NEW YORK – A collection of studies centered on 240 mammalian genomes is offering a look at everything from human disease-related regulatory features to the risk of extinction in different animal species, contributions of coding and noncoding sequence to evolution, and phenotypes found in the famed sled dog Balto.
"We're using new genomic technology to discover how the genome works," Elinor Karlsson, a researcher affiliated with the University of Massachusetts Chan Medical School and the Broad Institute, said during a press briefing on the project this week. "We're taking advantage of the fact that there's massive biodiversity on this planet to actually understand ourselves and make new discoveries that are relevant to treating human diseases — and at the same time, as we're doing this work, these species are at risk of going extinct, and some of them are going extinct."
Across 11 new studies, published in Science and related journals on Thursday, members of the Zoonomia Consortium presented insights gained through genome sequencing, primarily done using short reads, along with sequence alignment and other computational analyses on 240 placental mammal species, including dozens of endangered species and other mammalian representatives peppered across branches of the mammalian family tree.
"We're trying to figure out, in every single species — for each position in their DNA, in their genome — which position doesn't match back in the ancestor of all mammals, and how has it changed since then," Karlsson explained.
In the process, the team highlighted genomic regions with unusual conservation, neutral mutation patterns, or accelerated mutation within specific species, providing an avenue for interpreting variants involved in human conditions such as cancer as well as adaptation-related alterations.
The resulting 240-species alignment "represents only about 4 percent of all mammalian species, so we're still only looking at a tiny portion of mammals," Karlsson explained, "but it is the largest project we've ever done like this."
For one of these studies, researchers at the University of Massachusetts, the Broad Institute, and other centers in the US and Sweden tapped into the Zoonomia reference-free alignment and a phyloP evolutionary conservation score to assess evolutionary constraint at almost 1 million candidate cis-regulatory element (cCRE) sites and some 15.6 million transcription factor binding sites (TFBSs).
That team's analyses highlighted nearly 439,500 evolutionarily constrained cCREs and more than 2 million constrained TFBSs, making it possible to distinguish between elements involved in general mammalian processes and those contributing to more primate-specific ones.
"[W]e charted the evolutionary landscapes of cCREs and TFBSs among Zoonomia's 241 placental mammalian genomes and identified a subset of elements under purifying selection in the mammalian lineage," the authors reported, noting that these elements "are highly enriched in the human genetic variants associated with a panel of diverse, complex traits" and "should help efforts to define the functional impact of human variations."
Similarly, additional Zoonomia papers relied on deep learning methods and chromatin capture experiments to find regulatory changes in parts of the genome with accelerated change in humans or chimpanzees, and profiled human-specific deletions in parts of the genome containing regulatory elements that are conserved in other mammals.
For their part, researchers at Uppsala and other international centers used a Zoonomia genome alignment to find constrained sites that have been subject to purifying selection over evolutionary time, including sites encompassing nearly 11 percent of the human genome, while flagging more than 4,500 ultra-conserved regulatory elements across placental mammals.
"Using evolutionary constraint from our 240 mammals in the Zoonomia project, we can actually pinpoint exactly which positions have a [conserved] function and which don't," Zoonomia researcher Kerstin Lindblad-Toh, a researcher affiliated with the Broad Institute and Uppsala University, explained in the press briefing. "What are these constrained elements? They're probably regulatory elements, in addition to protein-coding genes."
In another study, an international team led by investigators at the Max Planck Institute of Molecular Cell Biology and Genetics and other centers in Germany described a machine learning gene annotation method called TOGA ("tool to infer orthologs from genome alignments") and compared some 488 placental mammal genomes with sequence assemblies for more than 500 birds.
Separately, researchers at Carnegie Mellon University and other centers in the US and Sweden outlined a machine learning method called "Tissue-Aware Conservation Inference Toolkit" (TACIT) for linking complex mammalian phenotypes to suspected enhancer sequences, an approach they used to home in on neurological phenotypes stemming from specific enhancers.
A team from Texas Tech University, the Broad Institute, and elsewhere used the Zoonomia dataset to retrace transposable element representation, variation, and evolution. The results suggested that mobile elements make up as little as 28 percent of the genome in the Brazilian guinea pig, for example, but nearly 66 percent of the genome in the hazel dormouse, with other mammals falling in between.
Meanwhile, Karlsson and the University of California Santa Cruz's Beth Shapiro led a team that used Zoonomia genomes in combination with 682 dog and wolf genomes for a comparative study analyzing ancestry and adaptation patterns in the sled dog Balto, famous for leading the last leg of a run to bring diphtheria antitoxin to Nome, Alaska, in 1925.
The sequence data suggested that Balto had Siberian husky, Alaskan sled dog, Greenland dog, Tibetan mastiff, and village dog-related ancestry, but was smaller and likely better able to digest starch than sled dogs today, and had a distinct tan-fringed black coat. More broadly, the authors of that study saw signs that Balto came from a sled dog population that was more genetically diverse, and better adapted to Alaskan conditions, than modern-day dog breeds.
Still other studies used incomplete lineage sorting data to delve into mammalian phylogeny and historical transitions between species; evaluated species' extinction risk based on historical population sizes and present-day conservation status; and looked at cross-mammal phyloP scores as a means of finding functional variants with potential roles in human traits or conditions.
The phylogenetic analysis suggested, for example, that placental mammals started diversifying around the time the continents broke apart, followed by a burst of mammalian diversification after the dinosaurs went extinct.
"It puts a new, kind of extended, timeline on that mammalian evolutionary history," Karlsson said, adding that the accelerated mutation analysis pointed to fast-evolving mammalian genes and pathways, including genes involved in hibernation and aging as well as adaptations in relatively unstudied species.
In a related perspectives article, University of Melbourne researcher Irene Gallego Romero focused on the subset of Zoonomia studies that looked at the mammalian genomes with an eye to understanding human health, disease, trait variation, and evolution.
"[I]n the same way that the inclusion of genetically diverse individuals has enhanced the power of human genome-wide association studies to identify genomic regions that are causally associated with both noninfectious diseases and healthy human variation … the Zoonomia studies demonstrate how explicitly thinking of humans as a mammal among mammals can substantially enrich our understanding of the emergence of evolutionary novelty and human uniqueness," Gallego Romero argued.
For another perspectives paper accompanying the studies, Arizona State University's Nathan Upham and Michael Landis, a researcher at Washington University, touched on findings from several studies focused on placental mammals, while highlighting remaining sampling biases, genome quality variability, and the need to continue filling in genome sequence gaps in some animal orders.
"Future work should strive to evenly sample species relative to geographic realm, latitude, and elevation; island versus continental occurrence; body size, longevity, and other life history traits; conservation status; and phylogenetic distinctiveness," Upham and Landis suggested.
Even so, they noted that Zoonomia and earlier related projects "have opened myriad new portals for exploring genome architecture, population structure, and global diversification in mammals, with findings that promise to astound in coming decades."