A compendium of nearly 200,000 draft-quality genomes of DNA viruses found in the human gut microbiome is presented in Nature Microbiology this week. The Metagenomic Gut Virus catalog was created by a team led by Lawrence Berkeley National Laboratory that performed large-scale identification of viral genomes in 11,810 bulk metagenomes from human stool samples collected in 61 previously published studies. It contains 189,680 genomes representing an estimated 54,118 species-, 5,800 genus-, and 1,434 family-level viral operational taxonomic units, with more than 75 percent of the genomes representing double-stranded DNA phages that infect members of the Bacteroidia and Clostridia classes. Of the candidate viral species identified by sequence clustering, 92 percent were not found in existing databases. "These genomes vastly expand the known diversity of DNA viruses from the gut microbiome and improve knowledge of host-virus connections," the researchers write. "We expect the … catalog will be a useful community resource for interrogating the role of the gut virome in human health and disease."
A new format and software package for the efficient storage and analysis of quantitative genomic data is reported in Nature Computational Science this week. Despite the ubiquity of quantitative genomics datasets, the two most widely used formats for storing them, bigWig and bedGraph, have limitations that inhibit analysis speed, require excessive storage, or both. To overcome these issues, scientists from the University of Utah developed the dense depth data dump (D4) format and tool suite, which aims to balance analysis speed and file size. They show that the D4 format's encoding enables fast analysis of the underlying data while providing file sizes that are better than or comparable to compressed bedGraph, bigWig, and HDF5 formats. "The D4 format and associated tools support fast random access, aggregation, summarization, and extensibility to future applications," they write. "These capabilities facilitate a larger scale of genomic analyses that would be otherwise slower."
A novel method for transgene introduction and expression in plants, which could have applications in biotechnology and synthetic biology, is published in this week's Nature Plants. The use of plants to produce therapeutic proteins and high-value compounds, known as plant molecular farming, holds great potential. Chloroplasts are particularly attractive subcellular compartments for the expression of foreign genes, but technical hurdles around inserting transgenes into the chloroplast genome have limited their use. Aiming to develop an alternative method for gene expression in chloroplasts, scientists from Algentech developed an approach in which a transgene is amplified as a physically independent entity called a minichromosome. "Amplification occurs in the presence of a helper protein that initiates the replication process via recognition of specific sequences flanking the transgene, resulting in accumulation of extremely high levels of transgene DNA," they write. The investigators show in tobacco plants that the amplified transgenes serve as templates for foreign protein expression, are maintained stably during plant development, and are maternally transmitted to the progeny.