Researchers based in the US and China present a k-mer-based computational approach for identifying microbial species and strains from metagenomic sequence data. The strategy, dubbed GSMer, uses markers gleaned from already-sequenced microbial genomes to perform this identification, the study's authors say. For their proof-of-principle analysis, they used data from 5,390 sequenced microbes to define more than 11.7 million species-specific k-mers and nearly 9 million k-mers that were specific to a given strain. Those identifiers, in turn, proved useful for classifying strains and species in both mock and authentic metagenome sequence sets, including gut microbes associated with type 2 diabetes or with body weight.
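As a rough illustration of the general idea, and not the GSMer implementation itself, the toy Python sketch below collects k-mers that occur in the genomes of only one species and then uses them to label a read. The function names, the tiny example sequences, and the k value of 4 are all assumptions made for illustration; the sketch also ignores reverse complements and any mismatch filtering a real pipeline would need.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def species_specific_kmers(genomes, k):
    """Map each species to the k-mers found in no other species.

    genomes: dict of species name -> list of genome sequences (hypothetical input).
    """
    owners = defaultdict(set)  # k-mer -> set of species containing it
    for species, seqs in genomes.items():
        for seq in seqs:
            for kmer in kmers(seq, k):
                owners[kmer].add(species)

    specific = defaultdict(set)
    for kmer, species_set in owners.items():
        if len(species_set) == 1:  # seen in exactly one species
            specific[next(iter(species_set))].add(kmer)
    return dict(specific)


def classify_read(read, specific, k):
    """Assign a read to the species whose marker k-mers it hits most, or None."""
    read_kmers = kmers(read, k)
    hits = {sp: len(read_kmers & markers) for sp, markers in specific.items()}
    hits = {sp: n for sp, n in hits.items() if n > 0}
    return max(hits, key=hits.get) if hits else None


# Toy data: two made-up "species" with one short sequence each.
genomes = {"A": ["ACGTACGT"], "B": ["TTGACGTA"]}
markers = species_specific_kmers(genomes, k=4)
print(classify_read("GTACG", markers, k=4))
```

Reads that contain only k-mers shared between species are left unassigned in this sketch, which mirrors the basic reason for restricting the marker set to species-specific (or strain-specific) k-mers.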
Using search methods that take sequence substitutions into account, a duo from Japan and France developed a scheme for finding similar sequences within sets of mammal genomes or insect genomes. With this method, which relies on a "complementary transition seed" search approach, the researchers demonstrate that they can detect around 20,000 human-mouse sequence alignments that had not been described previously. "We hope these results will help to elucidate the evolutionary story of DNA sequences," they conclude, "and also spur other researchers to further improve DNA similarity search, which is still not fully solved."
A Spanish and German team took a phylogenomics-based look at markers for reliably defining phylogenetic relationships. "In contrast to previous approaches, our methodology does not only rely on the ability of individual genes to reconstruct a known phylogeny," the study's authors explain, "but it also explores the combined power of sets of concatenated genes to accurately infer phylogenetic relationships of species not previously analyzed." In proof-of-principle experiments, for example, they homed in on half a dozen genes for investigating relationships between cyanobacterial species, as well as a minimal set of four genes for examining the phylogeny of ascomycetous fungi.