NEW YORK (GenomeWeb) – An Illumina-led team has demonstrated the feasibility of using non-human primate genome sequence data to help classify human variants, narrowing in on those with clinical actionability.
The researchers relied on a deep neural network approach trained with common variants gleaned from population sequence data from six non-human primate species: chimps, bonobos, gorillas, orangutans, rhesus macaques, and marmosets. Using this approach, it was possible to identify authentic disease-risk variants with around 88 percent accuracy, they reported. The team's results, published online today in Nature Genetics, highlighted 14 genes with genome-wide significant ties to intellectual disability. Based on these findings, the investigators suggested that more comprehensive profiling of common variants in non-human primates could continue improving the classification of variants that are currently considered variants of uncertain significance.
"Our results suggest that systematic primate population sequencing is an effective strategy to classify the millions of human variants of uncertain significance that currently limit clinical genome interpretation," senior author Kyle Kai-How Farh, a researcher at the Illumina Artificial Intelligence Laboratory, and his colleagues wrote, noting that the "accuracy of our deep learning networks on both withheld common primate variants and clinical variants increases with the number of benign variants used to train the network."
Just as common variants found in human genome and exome sequence collections like the Genome Aggregation Database or Exome Aggregation Consortium have been used to sift out variants that are not damaging enough to be removed by natural selection, the team reasoned that it should be possible to broaden the set of variants used to rule out pathogenicity by taking advantage of genetic relationships between humans and non-human primates.
"If polymorphisms that are identical-by-state similarly affect fitness in the two species, the presence of a variant at high allele frequencies in chimpanzee populations should indicate benign consequences in human, expanding the catalog of known variants whose benign consequence has been established by purifying selection," the authors explained.
After identifying common variants in two dozen unrelated chimp genomes, the authors compared allele frequencies for corresponding variants in the human genome. They noted that the ratio of missense to synonymous variants in the genome "is consistent with the absence of negative selection against common chimpanzee variants in the human population and concordant selection coefficients on missense variants in the two species."
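The missense-to-synonymous comparison described above can be illustrated with a minimal sketch. It is not the authors' code: the record layout, the `common` flag, and the toy variants are all hypothetical. The general idea is that damaging missense changes tend to be removed by selection before they become common, so a missense:synonymous ratio that drops among common variants hints at negative selection, while a ratio that stays roughly flat hints at its absence.

```python
from collections import Counter

def missense_synonymous_ratio(variants):
    """Return the missense:synonymous count ratio for a list of variant records."""
    counts = Counter(v["consequence"] for v in variants)
    if counts["synonymous"] == 0:
        raise ValueError("no synonymous variants in this set")
    return counts["missense"] / counts["synonymous"]

# Hypothetical toy records, invented purely for illustration (not real data).
toy_variants = [
    {"consequence": "missense", "common": True},
    {"consequence": "synonymous", "common": True},
    {"consequence": "synonymous", "common": True},
    {"consequence": "missense", "common": False},
    {"consequence": "missense", "common": False},
    {"consequence": "synonymous", "common": False},
]

# Split the toy set by frequency class and compare the two ratios.
common_ratio = missense_synonymous_ratio([v for v in toy_variants if v["common"]])
rare_ratio = missense_synonymous_ratio([v for v in toy_variants if not v["common"]])

print(f"common: {common_ratio:.2f}, rare: {rare_ratio:.2f}")
```

In this toy set the ratio is lower among common variants than rare ones, the pattern expected when selection removes missense changes. The study reported the opposite situation for human variants matching common chimp variants: no sign of such depletion.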
The team expanded its analysis to include human variants that were identical-by-state in the dbSNP database, the chimp genome, or in the genomes of the five other non-human primates included in the study. Again, the results suggested that benign variants were over-represented in the shared, identical-by-state primate variants.
"After excluding variants of uncertain significance and those with conflicting annotations, ClinVar variants that are present in at least one non-human primate species are annotated as benign or likely benign 90 percent of the time on average, compared to 35 percent for ClinVar missense variants in general," the authors wrote.
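The comparison in that quote amounts to a simple fraction computed after filtering, which the sketch below works through. It is an illustration under stated assumptions rather than the study's pipeline: the label strings, the `in_primate` field, and the toy records are hypothetical and do not reflect ClinVar's actual file format or contents.

```python
BENIGN = {"benign", "likely_benign"}
PATHOGENIC = {"pathogenic", "likely_pathogenic"}

def benign_fraction(records):
    """Fraction of confidently annotated records that are benign or likely benign.

    Records with any other label (for example uncertain or conflicting)
    are excluded before the fraction is computed.
    """
    confident = [r for r in records if r["label"] in BENIGN | PATHOGENIC]
    if not confident:
        raise ValueError("no confidently annotated records")
    return sum(r["label"] in BENIGN for r in confident) / len(confident)

# Hypothetical toy records, invented purely for illustration (not real data).
toy_records = [
    {"label": "benign", "in_primate": True},
    {"label": "likely_benign", "in_primate": True},
    {"label": "uncertain", "in_primate": True},
    {"label": "pathogenic", "in_primate": False},
    {"label": "likely_pathogenic", "in_primate": False},
    {"label": "benign", "in_primate": False},
    {"label": "conflicting", "in_primate": False},
]

# Compare the whole toy set with the subset also seen in a primate species.
shared_with_primate = [r for r in toy_records if r["in_primate"]]
overall_fraction = benign_fraction(toy_records)
shared_fraction = benign_fraction(shared_with_primate)

print(f"overall: {overall_fraction:.0%}, shared with primate: {shared_fraction:.0%}")
```

The toy numbers carry no meaning beyond showing the calculation. The figures the authors reported were 90 percent for variants seen in at least one non-human primate species, versus 35 percent for ClinVar missense variants in general.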
From there, the researchers used a collection of amino acid sequences predicted from common variants in the human and non-human primate genomes to train a deep residual network variant classifier known as PrimateAI, which they used to generate pathogenicity scores for new and known variants.
Along with variants in the known epilepsy- and intellectual disability-related gene SCN2A, for example, the team tracked down suspicious variants in 14 candidate genes using a set of de novo missense variants from nearly 4,300 individuals with neurodevelopmental disorders.
Indeed, a small number of primate genomes "contribute a disproportionate amount of information about common benign variation," the authors noted. Even so, they cautioned that poaching and habitat loss are pushing many of the known non-human primate species closer to extinction, which could lead to "an irreplaceable loss in genetic diversity."
The PrimateAI artificial intelligence software is open source and is being released on Illumina's BaseSpace Sequence Hub and through GitHub, Illumina announced today.