NEW YORK (GenomeWeb News) – Members of the Human Microbiome Jumpstart Reference Strains Consortium reported online in Science today that they have analyzed data for nearly 200 microbial reference genomes, identifying thousands of previously unidentified sequences and gaining clues about the diversity of microbes found on and in humans.
The team, which includes researchers from the Baylor College of Medicine, Broad Institute, J. Craig Venter Institute, and Washington University, analyzed genome sequence data for 178 microbial strains as part of their effort to characterize microbial reference strains that will ultimately aid in metagenomic studies of human microbiomes.
"This is a major study that moves us in the right direction to understanding the complex microbiota associated with the human body, and outlines how we benefit from this relationship," co-senior author Karen Nelson, director of the Venter Institute's Rockville, Md., campus, said in a statement.
In the process, the researchers found more than half a million predicted polypeptides, including nearly 31,000 that hadn't been found in the past. The new sequence data also increased the researchers' ability to interpret existing metagenomic data, though they noted that metagenomes still contain a great deal of sequence information missing from existing reference repertoires.
"We did not expect this one initial study to reveal this much uniqueness," Nelson told GenomeWeb Daily News. "I think it's the foundation for a lot more interesting work."
The Human Microbiome Project, a National Institutes of Health Roadmap Project, was launched in late 2007, with the goal of identifying microbes found on and in the human body and determining the role of these microbes in human health and disease.
For its part, the HMP Jumpstart Reference Strains Consortium has focused on characterizing hundreds of reference strains that can eventually be used to help interpret metagenomic sequence data.
"This initial work lays the foundation for this ambitious project and is critical for understanding the role that the microbiome plays in human health and disease," National Institutes of Health Director Francis Collins said in a statement.
To kick off this catalog, the team selected hundreds of bacterial and archaeal strains for sequencing and analysis, including strains found in the human gastrointestinal tract, oral cavity, urogenital/vaginal tract, skin, and respiratory tract.
The reference strains are being chosen based on information from working groups representing the body sites to be sampled with metagenomic approaches, as well as on recommendations from the broader research community, Nelson said.
Of the more than 350 genomes sequenced, mainly using Roche 454 technology, 178 are analyzed in the new paper. The microbes sequenced in the study can all be cultured in the lab, Nelson said, though she noted that the researchers have since expanded their focus to include some uncultured microbes.
Of the nine HMP microbial species with more than one sequenced, annotated genome available, the researchers noted, four are represented by five or more genome sequences, making them amenable to pan-genomic analyses.
For instance, for the gut microbes Lactobacillus reuteri, Bifidobacterium longum, and Enterococcus faecalis, the researchers argue, "more genome sequencing needs to be undertaken to characterize the actual makeup of the species as a whole."
Meanwhile, Staphylococcus aureus, which has been identified on the skin, urogenital tract, and mucous membranes of humans and other mammals, appears to have what's called a closed pan-genome, Nelson explained, meaning the genomic data available for the species are sufficient to represent most of its genetic diversity.
The core S. aureus genome, made up of the genes shared by all of the sequenced strains, consists of some 2,295 genes, the researchers reported, while the pan-genome, which includes every gene found in at least one strain, contains roughly 3,200 genes.
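For readers unfamiliar with the terms, the core genome is the intersection of the gene sets of all sequenced strains and the pan-genome is their union. A species looks "closed" when each additional genome contributes few or no previously unseen genes. The minimal Python sketch below illustrates those definitions on made-up data; the strain names and gene-family labels are hypothetical, and it assumes gene families have already been clustered across genomes. Published pan-genome analyses typically average over many random genome orderings and fit a model to the resulting curve rather than using a single pass like this one.

```python
def core_and_pan(genomes: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (core, pan) gene-family sets for a collection of genomes.

    core: families present in every genome (set intersection)
    pan:  families present in at least one genome (set union)
    """
    families = list(genomes.values())
    core = set.intersection(*families)
    pan = set.union(*families)
    return core, pan


def new_families_per_genome(genomes: dict[str, set[str]]) -> list[int]:
    """Count how many previously unseen families each genome adds, in order.

    A run of counts that drops toward zero is the crude signature of a
    "closed" pan-genome; counts that stay high suggest an "open" one.
    """
    seen: set[str] = set()
    added = []
    for families in genomes.values():
        added.append(len(families - seen))
        seen |= families
    return added


# Hypothetical toy data: four strains, six gene families in total.
strains = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g3", "g4", "g6"},
    "strain_D": {"g1", "g2", "g3", "g5"},
}

core, pan = core_and_pan(strains)
print("core:", sorted(core))   # core: ['g1', 'g2', 'g3']
print("pan size:", len(pan))   # pan size: 6
print("new per genome:", new_families_per_genome(strains))  # [4, 1, 1, 0]
```

In this toy case the fourth strain adds nothing new, which is the kind of saturation a closed pan-genome implies.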
When the team compared the 547,968 predicted polypeptides identified in the newly sequenced reference strains with sequences in the NCBI protein database, they found 30,867 polypeptide sequences, 29,987 of them unique, that were not present in the database.
And by comparing sequences in the new and previously sequenced reference genomes with those found in metagenomic data from two studies of the human gut, the researchers found that the new reference data improved their ability to interpret metagenomic data.
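The two figures above count different things: the total number of predicted polypeptides absent from the database, and the number of distinct sequences among them, since identical polypeptides can be predicted more than once. The Python sketch below shows that distinction on invented sequences. It assumes exact string matching purely for illustration; the article does not describe how the comparison was performed, and real comparisons against NCBI protein databases generally rely on similarity search rather than exact matches.

```python
def novel_counts(predicted: list[str], database: set[str]) -> tuple[int, int]:
    """Return (total, distinct) counts of predicted sequences absent from database.

    Exact string matching is a simplifying assumption for this sketch.
    """
    novel = [seq for seq in predicted if seq not in database]
    return len(novel), len(set(novel))


# Hypothetical toy data: one sequence is predicted twice, one is already known.
predicted = ["MKTAYIAK", "MKTAYIAK", "MSLLTEVE", "MQRSTPLV", "MGHHHHHH"]
database = {"MGHHHHHH"}

total, distinct = novel_counts(predicted, database)
print(total, distinct)  # 4 3
```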
"The results show that we are choosing the right organisms to sequence and that they are representative of members of the human microbiome," co-senior author Sarah Highlander, a molecular biology and virology researcher at the Baylor College of Medicine, said in a statement.
Even so, the team explained, roughly a third of metagenomic sequences still aren't easily decipherable using information from all available reference genomes, suggesting additional reference data is needed.
Those involved with the Human Microbiome Project reportedly plan to sequence at least 900 microbial genomes, while other members of the International Human Microbiome Consortium, such as the European MetaHIT team, are sequencing additional microbial strains, bringing the number of reference strains to be sequenced to more than 1,000.
Although the sequencing technology used for the remaining strains may vary depending on the instruments available at each participating center, Nelson noted, the researchers will likely use either a combination of Roche 454 and Illumina sequencing or Illumina sequencing alone for future stages of the reference project.