
California Team Applies Alignment-Free Approach to Mammalian Phylogeny

NEW YORK (GenomeWeb News) – The entire mammalian genome, not just its coding sequences, carries phylogenetically informative signal, according to a study appearing online last night in the Proceedings of the National Academy of Sciences.

Researchers from the University of California at Berkeley and the Lawrence Berkeley National Lab used an alignment-free approach called the feature frequency profile (FFP) to compare the non-genic, intronic, exonic, and whole-genome regions of 10 mammalian genomes. Their analysis suggests that traces of past evolutionary events are scattered throughout mammalian genomes, in non-coding as well as coding regions.

"All of the genome contains evolutionary information that you can pull out from it," lead author Gregory Sims, a computational biologist at UC Berkeley, told GenomeWeb Daily News.

In the past, genomic comparisons and phylogenomics often focused heavily on the genomes' protein coding content or, sometimes, specific types of genes. But looking only at certain genes can give conflicting information about species relationships, Sims explained.

And with the advent of projects such as ENCODE, which aim to decipher the function of the rest of the genome, there is a new appreciation for the non-coding parts of the genome, the team noted. Sims and his co-workers believe these regions house information relevant to understanding evolution and species relationships.

"[T]here is strong evidence that a traceable evolutionary history lies embedded in some selected highly conserved non-genic regions as well as genic regions," the researchers wrote.

Sims developed the FFP method with UC Berkeley researchers Se-Ran Jun, Guohong Albert Wu, and Sung-Hou Kim and described it in a PNAS paper earlier this year. Because the approach does not rely on sequence alignment, it can compare even full genomes from distantly related species that share no common set of genes or alignable sequences.
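The general idea behind an alignment-free comparison of this kind can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a genome's "features" are its overlapping k-mers (words of length k), that each genome is summarized by the relative frequency of every k-mer it contains, and that two profiles are compared with the Jensen-Shannon divergence. The function names, the choice of k, and the toy sequences are hypothetical, and details such as reverse-complement handling or how the published method selects its feature length are not modeled here.

```python
from collections import Counter
import math

NUCLEOTIDES = set("ACGT")


def feature_frequency_profile(seq: str, k: int) -> dict:
    """Hypothetical helper: count overlapping k-mers and normalize them to frequencies."""
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        # Skip windows containing N or other ambiguity codes.
        if set(seq[i:i + k]) <= NUCLEOTIDES
    )
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def jensen_shannon_divergence(p: dict, q: dict) -> float:
    """Jensen-Shannon divergence (base 2, so it ranges from 0 to 1) between two profiles."""
    jsd = 0.0
    for kmer in set(p) | set(q):
        pk = p.get(kmer, 0.0)
        qk = q.get(kmer, 0.0)
        mk = 0.5 * (pk + qk)  # frequency in the averaged profile
        if pk > 0:
            jsd += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            jsd += 0.5 * qk * math.log2(qk / mk)
    return jsd


def pairwise_distances(genomes: dict, k: int) -> dict:
    """Distance for every pair of named sequences, usable as input to tree building."""
    profiles = {name: feature_frequency_profile(s, k) for name, s in genomes.items()}
    names = sorted(profiles)
    return {
        (a, b): jensen_shannon_divergence(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


if __name__ == "__main__":
    # Toy sequences for illustration only; real inputs would be whole genomes
    # or their non-genic, intronic, or exonic portions.
    toy = {
        "a": "ACGTACGTACGT",
        "b": "ACGTACGTACGA",
        "c": "TTTTGGGGCCCC",
    }
    for pair, d in pairwise_distances(toy, k=3).items():
        print(pair, round(d, 3))
```

No alignment is needed at any step: each sequence is reduced independently to a word-frequency table, so sequences of very different lengths or gene content can still be compared. A matrix of such pairwise distances could then be passed to a standard distance-based tree-building method, such as neighbor joining, to infer a phylogeny.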

For the current study, the team compared whole genomes as well as the non-genic, intronic, and exonic portions for 10 mammals — human, chimpanzee, rhesus monkey, mouse, rat, dog, horse, cow, opossum, and platypus — which have each been sequenced to at least 10 times coverage.

They found consistent patterns and evolutionary relationships in each type of genomic region. These relationships also matched those expected from past studies, the team noted, revealing "bush-like" phylogenetic trees that reflect the rapid radiation of mammalian species.

"[T]he phylogenies obtained with the FFP method, whether we use the whole, intronic, exonic, or non-genic genomes, are all topologically equivalent to the current consensus view of evolutionary relationships between mammalian clades," they wrote. "Irrespective of the type of genomic region, evolutionary footprints are present in all parts of the genome."

This suggests that even the non-coding parts of the genome are under evolutionary constraint, the researchers noted, though the nature of that constraint may differ from the constraint acting on protein-coding regions.

Sims said the researchers intend to apply a similar approach to whole human genomes, looking in particular at patterns in the least conserved and least understood non-genic regions of the human genome.
