German researchers describe a de novo repeat assembly method that compiles information on repeats in whole-genome sequencing data even in the absence of a reference genome. The approach, dubbed RepARK, defines repeats from the abundant k-mers found in next-generation sequencing reads. When the team compared RepARK to other methods for documenting repeats in sequence data from organisms such as Drosophila melanogaster, it found that the approach could quickly produce repeat libraries that were comprehensive and well annotated.
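The idea of flagging repeats by k-mer abundance can be illustrated with a minimal Python sketch. This is not RepARK's code: the function names, the toy reads, the k-mer size, and the abundance threshold are all assumptions chosen for illustration, and the sketch stops at selecting abundant k-mers rather than assembling them into repeat consensus sequences.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer (substring of length k) across all reads."""
    counts = Counter()
    for read in reads:
        # A read of length n contains n - k + 1 overlapping k-mers.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundant_kmers(counts, threshold):
    """Keep only k-mers seen at least `threshold` times (hypothetical cutoff)."""
    return {kmer: n for kmer, n in counts.items() if n >= threshold}


# Toy reads (made up): the first two share a repeated ACGT motif,
# the third is unrelated sequence.
reads = ["ACGTACGTAC", "CGTACGTACG", "TTGCAATGCC"]
counts = count_kmers(reads, k=4)
repeats = abundant_kmers(counts, threshold=3)

print(sorted(repeats))
```

In a real data set the threshold would have to be set relative to sequencing coverage, since every k-mer from single-copy sequence is also seen many times at high depth. The sketch also ignores reverse complements and sequencing errors.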
The Wellcome Trust Sanger Institute's Richard Durbin and colleagues outline a scheme known as TelSeq for estimating the lengths of telomeres using whole-genome or whole-exome sequence data. The researchers demonstrated the approach using leukocyte samples from 260 individuals in the TwinsUK cohort, who ranged in age from 27 to 74 years. From those data, they obtained telomere length estimates that corresponded well with those measured in Southern blot experiments. The team also applied TelSeq to 96 samples from the 1000 Genomes Project.
A Georgia Institute of Technology team introduces a bioinformatics-based method for assigning metagenomic or genomic sequences to the appropriate taxonomic group. The researchers say their homology-based method, known as MyTaxa, takes into account all of the gene sequences present in a given sample, weighting each gene according to its ability to act as a taxonomic classifier. It also considers amino acid identity across the genome when classifying unknown organisms. In proof-of-principle analyses using real and simulated metagenomic data, the study's authors found that MyTaxa compared favorably to existing approaches for identifying the taxa present in the metagenomes.
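The weighting idea described above can be sketched as a simple weighted vote in Python. This is an illustration only, not MyTaxa's algorithm: the function name, the taxon labels, and the weight values are hypothetical, and the sketch assumes each gene has already been matched to a single taxon by a homology search.

```python
from collections import defaultdict


def classify(gene_hits):
    """Pick the taxon with the largest summed weight.

    gene_hits: iterable of (taxon, weight) pairs, one per gene, where the
    weight (an assumed, precomputed value) reflects how informative that
    gene is as a taxonomic classifier.
    Returns (best_taxon, share_of_total_weight).
    """
    scores = defaultdict(float)
    for taxon, weight in gene_hits:
        scores[taxon] += weight

    total = sum(scores.values())
    if total <= 0:
        raise ValueError("need at least one gene hit with positive weight")

    best = max(scores, key=scores.get)
    return best, scores[best] / total


# Hypothetical hits for the genes on one contig.
hits = [
    ("Escherichia", 0.9),
    ("Escherichia", 0.7),
    ("Salmonella", 0.4),
    ("Escherichia", 0.2),
]
best, share = classify(hits)
print(best, round(share, 3))
```

Summing weights instead of counting hits lets a few highly informative genes outvote many weakly informative ones, which is the point of assigning per-gene weights in the first place.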