University of Texas and MD Anderson Cancer Center researchers describe a deep learning method called DeepFun for exploring the functional consequences of non-coding variants from past genome-wide association studies in specific cell or tissue types. The tool was informed by thousands of DNA accessibility, histone mark, and transcription factor binding profiles produced using DNase I sequencing, chromatin immunoprecipitation sequencing, and other experiments on more than 200 tissue or cell types, the team says, noting that the epigenomic insights provide a refined look at functional roles of non-coding variants, including those with tissue-specific or cell type-specific effects. "By using the datasets from various GWAS studies," the authors say, "we conducted independent validations and demonstrated the functions of the DeepFun web server in predicting the effect of a non-coding variant in a specific tissue or cell type, as well as visualizing the potential motifs in the region around variants."
A Wageningen University-led team outlines a web server and algorithm for predicting primary metabolic gene clusters (MGCs) in gut microbial communities. The online gutSMASH tool is designed to be a user-friendly and accessible way to detect new or known MGC pathways, the researchers say, using genome sequences from anaerobic bacteria that thrive in the low-oxygen environment in the human gut. When they applied gutSMASH to genetic data from the human gut pathogen Escherichia albertii, E. coli, and a Crohn's disease-associated species called Ruminococcus gnavus, for example, the authors uncovered a range of documented and suspected gene clusters influencing primary metabolite production. From these and other results, they suggest gutSMASH "is able to predict not only known MGCs but also putative gene clusters that may aid the discovery of novel molecules of importance for human (or animal) health."
Wellcome Genome Campus researchers outline an online resource for scrutinizing the SARS-CoV-2 coronavirus and the COVID-19 pandemic. The open-access COVID-19 Data Portal (CDP) is routinely updated, the team notes, and contains a range of RNA genome sequences, related host sequences, gene expression data, protein patterns, biochemical clues, and other data types, along with insights from related COVID-19 literature. "The CDP has continued to evolve rapidly to meet the growing needs of the highly active global research community," the authors write, concluding that "[i]n addition to more data resources and data types being added as COVID-19 annotations become available, plans to implement a cohort browser, genome assembly visualization, and a host of other analyses and visualizations are in the current roadmap."