By Monica Heger
This story was originally published March 8.
An international team of scientists has used a metagenomics sequencing approach on the Illumina Genome Analyzer to characterize the gut microbiome in 124 individuals — generating over 200 times more data than any previous human microbiome study.
The sequencing was done at BGI-Shenzhen and marked the first step in the four-year Metagenomics of the Human Intestinal Tract, or MetaHIT, project, which aims to create a comprehensive gene catalog of the gut microbiome in order to understand how the bacteria that live in human intestines are related to disease (see In Sequence 6/24/2008).
The researchers sequenced DNA from fecal samples of 124 Europeans, drawn from Danish and Spanish cohorts that included healthy individuals, obese individuals and inflammatory bowel disease patients, generating 576.7 gigabases of sequence data. They predicted 3.3 million potential genes, dwarfing the 319,812 intestinal bacterial genes that had previously been identified and sequenced.
The study, published last week in Nature, is "going to raise the bar on metagenomic experiments," said George Weinstock, associate director of the Genome Center at Washington University, who was not involved in the study. "It's moving us into the era of the type of metagenomic studies that should be done."
Several human microbiome studies have been published previously, including the two largest gut microbiome studies as of last week: a Washington University study comparing the gut microbiomes of lean and obese twins and a metagenomic analysis of fecal samples from 13 individuals of varying ages by a group from the Nara Institute of Science and Technology in Japan. But these and other efforts were superficial, Weinstock said, because they didn't sequence enough individuals or sequence to sufficient depth, so their results were often inconsistent or wrong. "Now, you're really getting a significant characterization of a large number of individuals and starting to see things that you could only get glimpses of or barely see before."
The MetaHIT team used the Illumina GA for all of the sequencing. For 15 of the samples, they constructed one paired-end library per sample with a DNA insert size of 200 base pairs and generated 44-base-pair reads. For each of the remaining 109 samples, they constructed two paired-end libraries, with insert sizes of 125 base pairs and 400 base pairs, respectively, and generated 75-base-pair reads. For 122 of the samples, they generated an average of 62.5 million reads per sample, ranging from a low of 35.4 million reads for samples with a single library to a high of 97.6 million reads for samples with two libraries. For two samples, they generated over 12 gigabases of sequence data, but they determined that 4 gigabases was sufficient to capture all novelty, so the remaining samples were sequenced to an average of 4.5 gigabases.
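Read counts and gigabases are linked by simple multiplication: yield equals the number of reads times the read length. The sketch below converts between the two, assuming every read is 75 base pairs long and that each read of a pair is counted separately (the article does not say how reads were counted). The function names `yield_gigabases` and `reads_needed` are hypothetical and not part of any MetaHIT or BGI tool.

```python
# Illustrative sequencing-yield arithmetic; function names are hypothetical.


def yield_gigabases(num_reads: float, read_length_bp: int) -> float:
    """Sequence yield in gigabases: number of reads times read length."""
    return num_reads * read_length_bp / 1e9


def reads_needed(target_gigabases: float, read_length_bp: int) -> float:
    """Number of reads needed to reach a target yield at a given read length."""
    return target_gigabases * 1e9 / read_length_bp


if __name__ == "__main__":
    # Assumption: all reads are 75 bp, as for the 109 two-library samples.
    print(f"{yield_gigabases(62.5e6, 75):.2f} Gb")             # 4.69 Gb
    print(f"{reads_needed(4.0, 75) / 1e6:.1f} million reads")  # 53.3 million reads
```

Under that assumption, an average of 62.5 million reads works out to roughly 4.7 gigabases, in the same range as the reported 4.5-gigabase average; the 15 samples sequenced with 44-base-pair reads would yield less per read.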
The reads were then assembled into contigs with BGI's SOAPdenovo assembler. First, each sample was assembled independently: about 43 percent of the reads were assembled into 6.58 million contigs with an N50 of 2.2 kilobases. The reads that could not be assembled were then pooled and assembled together, yielding an additional 400,000 contigs with an N50 of 939 base pairs.
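N50, the assembly statistic quoted above, is the contig length at which contigs of that length or longer together contain at least half of all assembled bases. The minimal sketch below computes it from a list of contig lengths; the contig lengths in the example are made up for illustration and are not data from the study, and this is a generic implementation rather than SOAPdenovo's own code.

```python
def n50(contig_lengths: list[int]) -> int:
    """Return the N50 of a set of contigs.

    Contigs are sorted from longest to shortest and their lengths summed;
    the N50 is the length of the contig at which the running total first
    reaches at least half of all assembled bases.
    """
    if not contig_lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running total always reaches the total")


if __name__ == "__main__":
    # Hypothetical contig lengths in base pairs (not data from the study).
    toy_assembly = [5000, 2200, 2200, 1500, 800, 300]
    print(n50(toy_assembly))  # 2200
```

Because N50 weights contigs by length, adding many short contigs to an assembly can only keep it the same or lower it, even if the longest contigs are unchanged.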
To assess the quality of the assembly, the team compared the contigs from two of the samples with Sanger reads that they had generated from the same samples. They found that "more than 90 percent of the Sanger reads were covered by the Illumina sequences to a high and uniform level." They also sequenced one sample with 454 and found similar accuracy when those data were compared with the Sanger reads.
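A check like "what share of Sanger reads do the Illumina contigs cover" can be sketched as an interval-merging problem. The sketch below assumes an aligner has already reported, for each Sanger read, the read positions covered by matching contigs as half-open intervals. The input format, the function names and the 90 percent per-read threshold are all hypothetical; the article does not say how the study defined "covered."

```python
# A hit is a half-open interval [start, end) of Sanger-read positions that an
# aligned Illumina contig covers. This input format is an assumption.
Interval = tuple[int, int]


def covered_fraction(read_length: int, hits: list[Interval]) -> float:
    """Fraction of a read's bases covered by at least one hit (overlaps merged)."""
    covered = 0
    run_start, run_end = -1, -1  # current merged run; empty until the first hit
    for start, end in sorted(hits):
        start, end = max(start, 0), min(end, read_length)  # clip to the read
        if start >= end:
            continue  # hit lies entirely outside the read
        if start > run_end:
            covered += run_end - run_start  # close the previous run
            run_start, run_end = start, end
        else:
            run_end = max(run_end, end)  # overlapping or adjacent: extend the run
    covered += run_end - run_start  # close the final run (adds 0 if no hits)
    return covered / read_length


def share_of_reads_covered(
    read_lengths: dict[str, int],
    hits_by_read: dict[str, list[Interval]],
    min_fraction: float = 0.9,  # hypothetical per-read threshold
) -> float:
    """Share of reads whose covered fraction is at least min_fraction."""
    passing = sum(
        1
        for name, length in read_lengths.items()
        if covered_fraction(length, hits_by_read.get(name, [])) >= min_fraction
    )
    return passing / len(read_lengths)


if __name__ == "__main__":
    # Toy data: two of the three reads pass the threshold.
    reads = {"r1": 800, "r2": 1000, "r3": 600}
    hits = {"r1": [(0, 500), (450, 800)], "r2": [(100, 300)], "r3": [(0, 600)]}
    print(share_of_reads_covered(reads, hits))
```

Merging overlapping hits before summing avoids counting a base twice when two contigs overlap on the same stretch of a Sanger read.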
The study shows "the power of Illumina sequencing and new de novo assembly methods for capturing the complex human fecal microbiome," said Sarah Highlander, an associate professor of microbial genomes at Baylor College of Medicine, who is working on the National Institutes of Health-funded Human Microbiome Project to sequence microbial reference genomes.
Dusko Ehrlich, one of the senior authors of the study, who heads microbial genetics at the Institut National de la Recherche Agronomique in Jouy-en-Josas, France, told In Sequence that the team plans to continue using the Illumina platform and is also considering 454 for certain portions of the project where it wants to extend contig lengths. The consortium will not use Sanger sequencing going forward, Ehrlich said.
Weinstock agreed that for tasks such as sequencing unculturable microbes, the longer reads of 454 would be useful, because assembling the genome of an unculturable microbe from many short reads would be trickier.
The MetaHIT team compared its results with those of two earlier human gut microbiome studies: the fecal analysis of 13 individuals published by the Japanese researchers and the Washington University study comparing the gut microbiomes of obese and lean twins. Seventy percent of the reads from the Japanese data set and 85.9 percent of the reads from the US data set could be aligned to the MetaHIT contigs, indicating that the consortium captured much of the same sequence as the previous studies.
However, the group also captured more novelty than the other studies — 85.7 percent and 69.5 percent of the MetaHIT contigs were not covered by reads from the Japanese and US samples, respectively.
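The two sets of percentages answer different questions, which is why both can be high at once: the share of external reads that align says how much of an earlier data set the MetaHIT contigs explain, while the share of MetaHIT contigs hit by no external read says how much of the new catalog the earlier data never touched. The sketch below computes both from a hypothetical mapping of each external read to its best-matching catalog contig (or None); the input format and function name are assumptions, not the study's pipeline.

```python
from typing import Optional


def compare_to_catalog(
    best_hit_by_read: dict[str, Optional[str]], catalog_contigs: set[str]
) -> tuple[float, float]:
    """Return (share of external reads that align to the catalog,
    share of catalog contigs that no external read hits).

    best_hit_by_read maps each external read to the contig it aligned to,
    or None if it did not align (a hypothetical input format).
    """
    # Keep only hits to contigs that are actually in the catalog; None is dropped.
    hit_contigs = [c for c in best_hit_by_read.values() if c in catalog_contigs]
    reads_aligned = len(hit_contigs) / len(best_hit_by_read)
    covered = set(hit_contigs)
    contigs_uncovered = (len(catalog_contigs) - len(covered)) / len(catalog_contigs)
    return reads_aligned, contigs_uncovered


if __name__ == "__main__":
    # Toy example: a small external data set against a 10-contig catalog.
    catalog = {f"c{i}" for i in range(10)}
    hits = {"r1": "c0", "r2": "c0", "r3": "c1", "r4": None}
    print(compare_to_catalog(hits, catalog))  # (0.75, 0.8)
```

In the toy example, three of four external reads align, yet eight of ten catalog contigs receive no reads, mirroring how a smaller data set can align largely to a much larger catalog while leaving most of it uncovered.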
The team also found that the sampled individuals shared around 38 percent of their intestinal microbial genes. "Contrary to what was thought before, individuals are rather similar to each other in respect to their gut microbiome," said Ehrlich in an e-mail. "Nevertheless, they are certainly not identical, and comparing the similarities and differences, in relation to human phenotypes, should give us a better understanding of the overall human biology as well as of the health and disease states."
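The article does not say exactly how the 38 percent figure was computed. One simple way to express how many genes two people share is the fraction of one person's genes that are also present in another person's, averaged over pairs of individuals. The sketch below implements that definition on hypothetical gene sets; it is an illustration of the idea, not the MetaHIT method.

```python
from itertools import permutations


def shared_fraction(genes_a: set[str], genes_b: set[str]) -> float:
    """Fraction of individual A's genes that are also present in individual B."""
    return len(genes_a & genes_b) / len(genes_a)


def mean_shared_fraction(profiles: dict[str, set[str]]) -> float:
    """Average shared_fraction over all ordered pairs of distinct individuals.

    Ordered pairs are used because shared_fraction is not symmetric when
    individuals carry different numbers of genes.
    """
    values = [shared_fraction(profiles[a], profiles[b]) for a, b in permutations(profiles, 2)]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical gene presence sets for three individuals.
    profiles = {
        "p1": {"g1", "g2", "g3", "g4"},
        "p2": {"g1", "g2", "g5", "g6"},
        "p3": {"g1", "g7", "g8", "g9"},
    }
    print(round(mean_shared_fraction(profiles), 3))  # 0.333
```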
The team is now evaluating the 3.3 million candidate genes to determine which of them are true genes. They are also continuing to sequence and assemble gut microbiome genes from 350 healthy and obese individuals, as well as from inflammatory bowel disease patients, and will profile additional obesity and inflammatory bowel disease cohorts.
Ehrlich said that the consortium has already started to see results from that project, notably that Crohn's disease patients have reduced diversity in their gut microbiome. And in the current study, the authors reported that inflammatory bowel disease patients carried, on average, 25 percent fewer microbial genes than healthy individuals.
"It's such a sizable data set that we'll have much more confidence in the conclusions," such as finding disease-related genes and characterizing the similarities between healthy gut microbiomes among individuals, said Weinstock.