NEW YORK – A team of researchers from around the US has developed an updated reference genome assembly of the rhesus macaque (Macaca mulatta) — the most widely used nonhuman primate (NHP) model in biomedical research — finding new lineage-specific genes and expanded gene families that could be informative in studies of evolution and human disease.
In a study published on Thursday in Science, the researchers said their updated genome assembly for the macaque increased the sequence contiguity 120-fold. They also annotated it using 6.5 million full-length transcripts to improve their understanding of its gene content, isoform diversity, and repeat organization.
The researchers also performed whole-genome sequencing on 853 rhesus macaques and identified nearly 86 million single-nucleotide variants and 10.5 million indel variants, including potentially damaging variants in genes associated with human autism and developmental delay. These additional data could provide a framework for developing noninvasive NHP models of human disease.
"The rhesus macaque is a key species for both primate comparative studies and biomedical research. This work is an important step toward a complete genome which will improve our understanding of evolution and the species' utility for research," co-corresponding author and University of Washington genome sciences professor Evan Eichler said in an email. "In particular, we can now finally tackle some of the more complex regions of the genome and begin to understand how new genes evolve including the processes that have shaped them."
Indeed, the researchers wrote in their paper, a detailed understanding of the evolution of NHP genomes can lead to a greater understanding of human traits and putative disease genes. Evolutionary analyses of several NHP genomes and comparisons of different species have previously revealed lineage-specific changes in retro-elements, duplicated genes, and functionally relevant mutations in human disease-associated genes.
The importance of the macaque for evolutionary research, according to co-corresponding author Jeffrey Rogers, is its exact place in the evolutionary tree compared to humans and great apes like chimpanzees and gorillas. Geneticists frequently ponder what makes humans unique, said Rogers, an associate professor of molecular and human genetics at Baylor College of Medicine. And while comparing the genomes of humans and chimps or gorillas turns up differences, it can be hard to tell whether those differences occurred as a result of changes in the human genome or changes in the ape's genome.
"In order to decide whether the novelties are novel on the human side or the chimp side, you need an outgroup. And the rhesus monkeys are a very good outgroup because they're in the right place in the evolutionary tree," he said. "They're closely enough related to humans and chimps and gorillas, [and] we also know so much about their fundamental biology because they are used as a biomedical research model."
Eichler also noted that as a more complete reference sequence, the new macaque genome assembly can serve as a resource for researchers to not only investigate mutational rates and processes, but also to build more accurate genetic models of human disease. "This is especially relevant for genetic models that fail or are only partially recapitulated in more distantly related organisms such as mouse and rat," he said.
Further, added co-corresponding author and University of Missouri genomics professor Wesley Warren, the new data enables other researchers "to compare gene-specific variants linked to human disease but in some cases without apparent health consequences in macaque," and will be of use in studying several genomic processes in primates, such as diversity and functional constraint.
"We are only in the beginning stages of exploring this rich data source, with but a few examples of neurodevelopmental genes harboring sequence variation that are intolerant to mutation in humans," he said.
The researchers began by generating long-read sequence data from a female rhesus macaque from India and assembling her genome. This reference consists of 20 autosomes and the X chromosome. For completeness, they added a previous bacterial artificial chromosome-based representation of the Y chromosome. After conducting various analyses, they found that the new reference genome closed more than 99.7 percent of the gaps present in a previous Indian-origin rhesus macaque genome assembly. Further, the proportion of misoriented genes fell from 4.83 percent in the previous genome to only 0.13 percent in the new one.
The researchers then went on to generate whole-genome sequencing (WGS) data for 850 rhesus macaques from captive US research colonies and three wild-caught Chinese macaques. Most of the animals (810) were designated as being of Indian origin, and the remaining individuals were of Chinese or suspected admixed origin. By mapping reads to the new reference genome, the researchers identified 85.7 million SNVs, including 21.3 million singletons, as well as 10.5 million indels.
For comparison, they noted, a recent study of 929 human genomes from 54 diverse global populations identified 67.3 million SNVs. The researchers concluded that the research rhesus macaques were more than twice as diverse per individual as humans, with the average macaque carrying 9.7 million SNVs.
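The cohort-level figures quoted above can be compared with a few lines of arithmetic. The Python sketch below is illustrative only: the variable names are assumptions, and it uses just the totals reported in this article. It does not reproduce the per-individual comparison, which depends on per-genome human figures that are not given here.

```python
# Totals quoted in the article, in millions of variants.
MACAQUE_SNVS_M = 85.7        # SNVs found across 853 macaques
MACAQUE_SINGLETONS_M = 21.3  # macaque SNVs reported as singletons
HUMAN_SNVS_M = 67.3          # SNVs found across 929 human genomes

# How many times more SNVs the macaque cohort yielded than the human cohort.
snv_ratio = MACAQUE_SNVS_M / HUMAN_SNVS_M

# Share of macaque SNVs that were singletons.
singleton_fraction = MACAQUE_SINGLETONS_M / MACAQUE_SNVS_M

print(f"Macaque/human SNV total ratio: {snv_ratio:.2f}")
print(f"Singleton share of macaque SNVs: {singleton_fraction:.1%}")
```

On these totals, the macaque cohort yielded roughly 1.27 times as many SNVs as the human study despite including fewer individuals, and about a quarter of the macaque SNVs were singletons.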
To illustrate the biological potential of macaque genetic diversity, the researchers then went on to identify naturally occurring macaque mutations in orthologs of human genes implicated in autism and developmental delay. In humans, de novo deleterious mutations in these genes are thought to be dominant and have a large effect, but mouse models often do not recapitulate the complexity of neurobehavioral features of human disease, they said. Unexpectedly, they identified nine genes that carry candidate deleterious mutations in macaques even though the genes are intolerant to mutation in humans and de novo mutations in them are associated with neurodevelopmental disorders.
Indeed, Rogers said, rhesus macaques are important for studies of conditions ranging from infectious disease (including COVID-19) to neuroscience, cancer, and reproductive biology, so a high-quality reference genome can aid researchers who are looking to understand the causes of various illnesses or aiming to develop treatments.
Importantly, the macaque genetic variation data can help researchers to discover and use spontaneous, or naturally occurring, models of human genetic diseases. Much of human genetic disease is caused by mutations that segregate in the population at low frequency, Rogers said, especially recessive mutations that are heterozygous in parents but can be deleterious if two copies are passed on to children.
Rhesus macaque populations also carry a significant amount of segregating genetic variation that is damaging to genes, but most of that variation is recessive, Rogers explained. "What we're finding when we randomly sample rhesus monkeys is that there are lots of mutations segregating in the research populations; they are damaging mutations in genes that are known to cause human genetic diseases, but they're low-enough frequency that we don't see the disease very often in the macaques," he said. "By identifying individuals that carry the mutations, you can follow their offspring, or in some cases you can actually create new breeding groups that would generate homozygous individuals that would have the phenotype, and then create a naturally occurring model of human genetic disease."
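The reasoning Rogers describes follows from textbook Mendelian genetics, and a short Python sketch can make it concrete. Everything in it is an illustrative assumption rather than anything from the study: the function names are hypothetical, the allele frequency of 1 in 100 is invented for the example, and the model assumes a single locus with a fully recessive allele and random mating (Hardy-Weinberg proportions).

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Genotype probabilities for a single-locus cross.

    Each parent is a two-letter genotype such as "Aa", where "a" is the
    recessive damaging allele (hypothetical labels).
    """
    probabilities = {}
    # Each parent passes on one of its two alleles with equal chance,
    # giving four equally likely allele pairs.
    for allele1, allele2 in product(parent1, parent2):
        genotype = "".join(sorted(allele1 + allele2))
        probabilities[genotype] = probabilities.get(genotype, 0) + Fraction(1, 4)
    return probabilities


def expected_homozygote_frequency(allele_frequency):
    """Expected share of homozygous animals under random mating (q squared)."""
    return allele_frequency ** 2


# Assumed allele frequency, chosen only for illustration.
q = Fraction(1, 100)
print("Affected under random mating:", expected_homozygote_frequency(q))
print("Carrier x carrier offspring:", offspring_genotypes("Aa", "Aa"))
```

Under these assumptions, only 1 in 10,000 animals would be homozygous by chance, whereas pairing two identified carriers gives each offspring a 1 in 4 chance of being homozygous, which is why a targeted breeding group can surface a phenotype that is rarely seen in the colony at large.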
These spontaneous models of disease don't require any engineering techniques to create, and they faithfully recapitulate human genetic diseases. As an example, Rogers noted that he and some other colleagues are using this technique to study a retinal degeneration phenotype that causes congenital blindness, and have identified a spontaneous macaque model of a cone photoreceptor-specific mutation that causes progressive blindness in people.
"We're now studying and actually testing gene therapy and stem cell therapies in macaques without having to create the disease model ourselves," he said. "We're just taking advantage of a naturally occurring model."