NEW YORK – Researchers at the University of Maryland School of Medicine have assembled a comprehensive, functionally and taxonomically annotated gene catalog of the human vagina that includes 950,000 non-redundant genes.
In a study published on Wednesday in Nature Communications, the researchers said they constructed the gene catalog, which they named VIRGO, using a combination of metagenomes and urogenital bacterial isolate genomes. The genes identified in these data were further clustered into vaginal orthologous groups (VOGs), providing a catalog of functional protein families common to vaginal microbiomes. To highlight VIRGO's utility, they then analyzed 1,507 additional vaginal metagenomes, and identified a high degree of intraspecies diversity within and across vaginal microbiota.
"Though 16S rRNA gene sequencing has provided foundational insight into the role of the vaginal microbiota in a wide array of common diseases, including bacterial vaginosis, urinary tract infection, and chlamydia, there was a definite unmet need to utilize advanced sequencing to build a reference gene catalog for the vagina," Maryland Professor of Microbiology and Immunology Jacques Ravel, the paper's senior author, said in a statement. "VIRGO provides researchers with a new tool to understand the role of vaginal microbes in human health, and ultimately to design new solutions to diagnose, prevent, and treat conditions which impact millions of women globally."
The researchers curated VIRGO with taxonomic assignments as well as functional features using 17 diverse protein databases. Importantly, they showed that it provides more than 95 percent coverage of the human vaginal microbiome, and that it applies to populations from North America, Africa, and Asia.
VIRGO was constructed using sequence data from 264 fully de-identified vaginal metagenomes as well as 308 complete and draft genomes of urogenital bacterial isolates. Of the approximately 18 billion reads generated for these metagenomes, nearly 80 percent were identified as human sequences and removed. The researchers found that vaginal metagenomes dominated by Lactobacillus spp. had significantly higher proportions of human sequence reads than Lactobacillus-deficient metagenomes. Each metagenome was then assembled de novo, yielding a total of 1.2 million contigs with a combined length of 2.8 billion bp.
Taxonomic analysis of the metagenomes revealed that these communities contained 312 bacterial species. All major vaginal Lactobacillus species were identified, as were common facultative and strict anaerobic vaginal species such as Gardnerella vaginalis, Atopobium vaginae, Prevotella amnii, Megasphaera genomosp., Mobiluncus mulieris, Mageebacillus indolicus, and Veillonella parvula, among others. The researchers also observed that even bacteria associated with bacterial vaginosis that are often present only at low abundance were represented in the taxonomic analysis. These results highlighted the taxonomic breadth of the vaginal bacterial communities included in the construction of VIRGO.
In a subsequent analysis, the researchers translated the non-redundant genes into amino acid sequences and clustered them into VOGs, in order to create a database that can be used to investigate the protein families found in the vaginal microbiome. They measured the similarities between amino acid sequences, and found that 38.5 percent of all VOG proteins were unique.
To demonstrate the utility of VOGs, the investigators retrieved 32 proteins of the orthologous family encoding vaginolysin, a G. vaginalis cholesterol-dependent cytolysin that is key to its pathogenicity as it forms pores in epithelial cells. They identified three amino acid variants in an 11-amino acid sequence of domain 4 of vaginolysin. One of the three variants had not been reported previously.
"This example illustrates how VOG can be mined to understand biological relevance and to generate hypotheses," the authors wrote. "In this case it points to potential differences in pore formation activity and possibly cytotoxicity, which could be further investigated."
The researchers also used VIRGO to characterize the genome content of individual bacterial species present in the vaginal microbiome. They applied VIRGO to a dataset of 1,507 in-house and publicly available vaginal metagenomes to characterize the gene content of four Lactobacillus species and three additional species commonly found in the vagina (G. vaginalis, A. vaginae, and Prevotella timonensis). They recovered most of each species' gene content, even when that species was present at low abundance in a community. This demonstrated that VIRGO can characterize the gene content of low-abundance taxa from metagenomic data, the researchers wrote.
Using these species-specific gene repertoires, they then characterized the amount of intraspecies diversity present within an individual woman's vaginal microbiome. Because VIRGO comprises the "pangenomes" of each vaginal bacterial species, it can be used to evaluate the amount of intraspecies diversity present in these communities, the researchers said. For this analysis, they counted the number of genes that were assigned to each of the seven species in each of the 1,507 metagenomic datasets and compared this number to that found in each species' reference genomes. The results suggested that a woman's vaginal bacterial populations routinely comprise more than one strain of most species.
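The gene-counting comparison described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, gene identifiers, and toy numbers are all hypothetical, and it assumes each detected gene has already been assigned to a species. Reading a ratio well above 1 as a hint of multiple strains is likewise an illustrative assumption, not a threshold taken from the study.

```python
from collections import defaultdict


def genes_per_species(detected_genes, gene_to_species):
    """Group the distinct genes detected in one metagenome by species.

    detected_genes: iterable of gene IDs observed in the sample (hypothetical IDs).
    gene_to_species: dict mapping gene ID -> species name (assumed annotation).
    """
    per_species = defaultdict(set)
    for gene in detected_genes:
        species = gene_to_species.get(gene)
        if species is not None:  # skip genes with no species assignment
            per_species[species].add(gene)
    return per_species


def gene_count_ratio(per_species, reference_gene_counts):
    """Compare each species' detected gene count with a reference genome's count."""
    ratios = {}
    for species, genes in per_species.items():
        ref = reference_gene_counts.get(species)
        if ref:  # ignore species without a usable reference count
            ratios[species] = len(genes) / ref
    return ratios


# Toy data, invented purely for illustration.
gene_to_species = {
    "g1": "L. crispatus",
    "g2": "L. crispatus",
    "g3": "L. crispatus",
    "g4": "G. vaginalis",
    "g5": "G. vaginalis",
}
detected = ["g1", "g2", "g3", "g1", "g4", "gX"]  # "gX" has no assignment
reference_gene_counts = {"L. crispatus": 2, "G. vaginalis": 2}

per_species = genes_per_species(detected, gene_to_species)
ratios = gene_count_ratio(per_species, reference_gene_counts)
for species, ratio in sorted(ratios.items()):
    print(f"{species}: {len(per_species[species])} genes, ratio {ratio:.2f}")
```

In this toy case the sample carries more distinct L. crispatus genes than the single hypothetical reference genome holds, which is the kind of excess that would be consistent with more than one strain being present.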
"Previous studies of the vaginal microbiome have largely treated these species as singular genotypes, although some more recent studies have examined intraspecies diversity in these communities," the authors wrote. "Intraspecies diversity is important because it is likely to influence many properties of the communities including their temporal stability and resilience, as well as how they relate to host health."
They further noted that VIRGO can be used to characterize intraspecies diversity because it contains the non-redundant pangenomes of most bacterial species common to the vagina. By mapping sequence reads against the VIRGO database, it would be possible to identify unique genes that belong to each species in a metagenome.