A computing infrastructure for accessing genomic sequences captured by the global research community is presented in Nature this week. Since the completion of the human genome, DNA sequencing databases have grown explosively and currently exceed 20 petabases. To enable the use of these data, an international group of researchers developed Serratus, a free, open-source cloud-computing infrastructure optimized for petabase-scale sequence alignment against a set of query sequences. To demonstrate the resource, the scientists used it to search 5.7 million biologically diverse samples for the hallmark gene RNA-dependent RNA polymerase and identified more than 100,000 novel RNA viruses, expanding the number of known species by about an order of magnitude. They also characterized novel viruses related to coronaviruses, hepatitis delta virus, and huge phages, and analyzed their environmental reservoirs. "This work and further extensions of petabase-scale genomics are shaping a new era in computational biology, enabling expansive gene discovery, pathogen surveillance, and pangenomic evolutionary analyses," the team writes.
The discovery of a transcription factor involved in bacterial genome organization is detailed in Nature Genetics this week. Nucleoprotein complexes play a key role in the genome organization of eukaryotes and prokaryotes. Some of these complexes are known to influence global organization by mediating the formation of long-range anchored chromosomal loops, which spatially segregates large sections of DNA. While these interactions are ubiquitous in eukaryotes, they have not been shown in prokaryotes. Using a chromatin sedimentation assay and Hi-C, scientists from the University of Amsterdam and Heidelberg University find that a transcription factor called Rok forms large nucleoprotein complexes in the bacterium Bacillus subtilis. These complexes interact with each other over large distances and, importantly, lead to anchored chromosomal loop formation, thereby spatially isolating large sections of DNA, as previously observed for insulator proteins in eukaryotes.