Like most Germans, Simone Guenther first learned about the Escherichia coli outbreak that began in Germany in mid-May through local news reports. On Thursday, May 26, researchers at University Hospital Muenster's Institute of Hygiene announced that they had isolated the Shiga toxin-producing E. coli strain responsible for the outbreak of infections. The next day, Guenther was on the phone with the Muenster team "asking if they would need any help, because I have the lab infrastructure in Darmstadt," she says.
As director of application development and business strategy at Life Technologies' German laboratories, Guenther was in a position to help the Muenster team understand the nature of this bacterial isolate — which has, by this article's press time, caused 27 fatalities and sickened more than 2,800 people across Europe. Upon receiving the culprit sample, Guenther and her colleagues prepared a library and began sequencing on Monday, May 30, using Life Tech's Ion PGM sequencing platform.
"We completed full sequencing on Wednesday afternoon [June 1], uploaded the data to Muenster and to [our] colleagues over the ocean at Life Technologies, and then we assembled the genome and submitted [it] around lunchtime on the East Coast Thursday [June 2] to NCBI," Guenther says. That same day, researchers at BGI in Shenzhen announced that, working in collaboration with investigators at Germany's University Medical Centre Hamburg-Eppendorf, they too had sequenced the strain responsible for the outbreak using the Ion PGM.
BGI's Bicheng Yang says the team chose to use that platform "because [of] its fast speed. Each run takes only a couple of hours, which, under this emergent situation, provided basic information for analysis."
According to Life Tech's Guenther, "the new opportunity to have genomes sequenced in a record time frame clearly [provides] a potential [for] looking at, on a holistic level, what's going on in these microbes — what they have acquired, what they might have lost." Indeed, in the preliminary analysis, Guenther and her colleagues identified genomic evidence to suggest that the strain is a new, hybrid type of pathogenic E. coli. In its announcement, BGI said that the new strain also carries several antibiotic-resistance genes.
"The capability of looking at that data on a genome level ... will revolutionize how we look at clinical outbreaks," Guenther adds. "If you can sequence a microbe within three to four days — including the genome annotation — that really opens up a new field."
Not only are the genome sequences these groups have generated informing efforts to control the ongoing outbreak, they also point to the power of coupling genomic data with epidemiological investigations. At the American Society for Microbiology's annual meeting held in New Orleans in May, several researchers said that the molecular forensics-esque approaches already being used in the budding field of genomic epidemiology to reconstruct disease outbreaks could also enhance future surveillance and response efforts.
A new field
"The last 18 months or so have seen the birth of this discipline, erupting into the literature with a whole swath of papers," said the University of Birmingham's Mark Pallen at ASM. There are "at least a dozen, maybe between a dozen and two dozen papers now, on this subject."
In genomic epidemiology, "we have a new opportunity — high-throughput sequencing," Pallen said. "One might call this a disruptive technology because it changes the research landscape, the landscape of possibilities, in a way that makes us question all of our assumptions in the past ways we've done things."
Recently developed techniques for genomic epidemiology represent fundamentally new approaches to studying the origins and transmission dynamics of outbreaks. "Instead of cloning things into plasmids and replicating those plasmids in E. coli, you have solid-phase amplification ... where you have clonal templates growing, if you like, in molecular colonies," Pallen said, adding that there are "a whole range of new approaches to actually reading the sequence once you've actually got your template." And as more single-molecule sequencing approaches break onto the scene in the coming years, Pallen expects the possibilities for genomic epidemiology to swell.
Pallen pointed to the US Department of Justice's investigation of the anthrax letters mailed to politicians and media organizations in September and October 2001 as one of researchers' earliest forays into genomic epidemiology. In 2002, the Federal Bureau of Investigation contracted with researchers at The Institute for Genomic Research in Rockville, Md. — now part of the J. Craig Venter Institute — to compare the mailed anthrax strains against wild-type Bacillus anthracis. Pallen called the DOJ's summary report of the investigation "a remarkable microbiological 'whodunit.'"
Before they had access to next-generation sequencing, many researchers used sequence-based molecular typing techniques to investigate the population structure and evolution of bacterial pathogens. Imperial College London's Brian Spratt and his colleagues developed one such approach — multi-locus sequence typing, or MLST — in 1998. During his lecture at ASM, Spratt said that molecular typing of bacterial pathogens is "a crucially important activity because we have to characterize strains for many aspects of public health — we have to analyze outbreaks ... we have to be able to identify and follow the spread of particularly significant strains of bacterial pathogens."
Spratt added that molecular typing techniques like MLST have helped researchers address the "broader questions of understanding the evolution of bacterial populations — assigning and circumscribing species, looking for biogeography. A whole lot of activities in basic science and clinical microbiology depend on being able to characterize strains."
While MLST enables the identification of clinically important species, it's often unable to distinguish between closely related isolates of the same species. "There are many questions which can't be answered by MLST, as there are many species where MLST doesn't work," Spratt said. "MLST is still very useful, but of course … advances in genome sequencing [are] allowing us to have the ultimate resolution — the genome sequence of each isolate."
Indeed, in his own research, the Wellcome Trust Sanger Institute's Julian Parkhill found certain Staphylococcus aureus isolates to be indistinguishable using MLST and other standard molecular typing tools. At ASM, Parkhill said that S. aureus "has a very complex pathology. It's a very large and diverse species, and there's no simple association between MLST type or clonal complex and disease — it has a much more subtle link." That the bug is becoming increasingly drug-resistant and that "MRSA, particularly, is spreading throughout hospitals and causing issues," recently prompted Parkhill and his colleagues to sequence 63 S. aureus ST239 isolates — 20 collected at a single hospital during a seven-month period, and the remaining 43 from a global collection recovered between 1982 and 2003.
Among those 63 strains, the team identified a total of 6,700 SNPs — 4,500 of which were in the core S. aureus sequence, Parkhill said. "Remember, these are 63 strains that are indistinguishable by standard typing techniques, and yet we can see four-and-a-half-thousand SNPs in the core genome. So there's a vast amount of variation underlying this apparent identity."
By comparing the genomic variation among strains and constructing a phylogenetic tree, the team was able to track person-to-person transmission within a London hospital that suffered a large, two-year S. aureus ST239 outbreak beginning in 2002, as well as across continents.
Based on the genomic data they generated, published in Science in January 2010, Parkhill and his colleagues were able to "say, very clearly, that this outbreak on the intensive care unit in London was due to a fairly recent introduction from Southeast Asia," he said. "Using this variation — this huge amount of variation we can see underlying the apparent identity — we can start looking at global transmission events. We can track those transmission events and we can link them to human populations ... around the world."
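The contrast Parkhill describes, isolates that share a typing profile yet differ at many positions across the genome, can be illustrated with a minimal Python sketch. Everything in it is a hypothetical toy: the isolate names, the seven-number typing profiles, and the short strings standing in for variable core-genome sites are invented for illustration and are not data from the study, and counting mismatches between aligned strings is only the simplest building block of the kind of comparison a phylogenetic analysis rests on.

```python
from itertools import combinations

# Hypothetical toy data (not from the study): each isolate has a typing
# profile of allele numbers and a string of bases at aligned variable sites.
isolates = {
    "hospital_A": {"mlst": (5, 2, 7, 1, 3, 3, 4), "core_sites": "ACGTTAGC"},
    "hospital_B": {"mlst": (5, 2, 7, 1, 3, 3, 4), "core_sites": "ACGTTAGT"},
    "global_C": {"mlst": (5, 2, 7, 1, 3, 3, 4), "core_sites": "ATGTCAGA"},
}


def snp_distance(a: str, b: str) -> int:
    """Count the positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def same_mlst_profile(data: dict) -> bool:
    """Return True when every isolate carries the same typing profile."""
    return len({record["mlst"] for record in data.values()}) == 1


def pairwise_distances(data: dict) -> dict:
    """Map each pair of isolate names (in sorted order) to its SNP distance."""
    return {
        (p, q): snp_distance(data[p]["core_sites"], data[q]["core_sites"])
        for p, q in combinations(sorted(data), 2)
    }


if __name__ == "__main__":
    print("identical typing profile:", same_mlst_profile(isolates))
    for pair, distance in pairwise_distances(isolates).items():
        print(pair, distance)
```

In this toy set, all three isolates look identical by their typing profile, while the site-by-site comparison places the two hospital isolates closer to each other than either is to the third.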
Researchers in Canada are also using genomic epidemiology to reconstruct outbreaks on a smaller scale. According to Public Works and Government Services Canada, the average Mycobacterium tuberculosis infection incidence rate in the country is 4.7 cases per 100,000 people. But in 2007, at the height of an M. tuberculosis outbreak in a small community in British Columbia, the infection incidence rate in that province rose to 6.4 cases per 100,000 people. By December 2008, the British Columbia Center for Disease Control had identified 41 tuberculosis cases — the cultured isolates of which all showed an identical pattern of mycobacterial interspersed repetitive unit-variable-number tandem repeats, and thus were indistinguishable using standard molecular-typing methods.
Using the traditional epidemiological approaches of contact tracing and social-network analysis to identify the people, places, and behaviors key to the spread of tuberculosis within the community, the BC Center for Disease Control's Jennifer Gardy and her team had pinned down a putative source case, yet found themselves at a standstill. The limits of that approach became increasingly clear over the course of the team's epidemiological analyses. "If we had attempted to do an outbreak reconstruction then, using that information, we'd pretty much be stuck, because … it [was] impossible to trace a path of the organism from the source case to all of the different people in that network, because everybody knew everybody else; for any one person in that outbreak, there were three, four, sometimes seven or eight potential sources of their tuberculosis," Gardy said at ASM. "And without any sort of detailed or higher-resolution genotyping or genetic data, we had no way of determining who had actually transmitted disease to whom."
And so, Gardy and her colleagues opted to sequence the whole genomes of 36 M. tuberculosis isolates — 32 from the outbreak and four historical isolates. At ASM, Gardy detailed how her team paired the resulting high-resolution molecular data with their traditional epidemiological contact tracing and social-network analysis to construct a putative transmission network.
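One way to picture how genome data can narrow a tangled contact network is the minimal Python sketch below. It is not the method Gardy's team used, which the talk as reported here does not spell out. The case names, the candidate-source lists, the short strings standing in for variable genome sites, and the `MAX_SNPS` cutoff are all hypothetical assumptions chosen for illustration.

```python
# Hypothetical contact-tracing output: for each case, the people who could
# plausibly have infected them according to interviews (invented data).
candidate_sources = {
    "case_2": ["case_1", "case_3"],
    "case_3": ["case_1"],
    "case_4": ["case_1", "case_2", "case_3"],
}

# Hypothetical bases at aligned variable genome sites for each case's isolate.
genomes = {
    "case_1": "AAAAAA",
    "case_2": "AAAAAT",
    "case_3": "AAGAAA",
    "case_4": "AAGACA",
}

MAX_SNPS = 1  # illustrative cutoff; an assumption, not a published threshold


def snp_distance(a: str, b: str) -> int:
    """Count the positions at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def plausible_sources(candidates: dict, seqs: dict, max_snps: int) -> dict:
    """Keep only the candidate sources whose isolate is genetically close."""
    return {
        case: [s for s in sources if snp_distance(seqs[case], seqs[s]) <= max_snps]
        for case, sources in candidates.items()
    }


if __name__ == "__main__":
    for case, sources in plausible_sources(candidate_sources, genomes, MAX_SNPS).items():
        print(case, "<-", sources)
```

In this toy network, interviews alone leave case_4 with three possible sources; requiring a near-identical genome leaves one. A real reconstruction would also have to weigh timing, clinical data, and uncertainty in the sequence data, none of which is modeled here.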
"Whether you're talking about outbreaks or epidemics, genomic epidemiology is a really interesting, emerging area ... [that] is really giving us a tremendous opportunity to reconstruct outbreaks of disease more accurately than we could in the past," Gardy said.
"In the past, we'd take a combination of field epidemiology and molecular epidemiology — things like genotyping and pulse-field gel [electrophoresis], MLST — and we'd come up with sort of a 'best guess' reconstruction. With genomic epidemiology, we're able to refine those best guesses a little bit further and end up with more detailed pictures of what's going on."
While it may be considered an exercise in benchmarking, Gardy's tuberculosis outbreak reconstruction, which her team reported in the New England Journal of Medicine in February, demonstrated the mutual dependence of genomic and epidemiological data for the most accurate assessment of disease transmission. Gardy's team is now applying its whole-genome sequencing and social-network analysis methodology to another ongoing outbreak in British Columbia.
"I think as we do more and more of these genomic epidemiology studies in different pathogens and in different underlying community structures or underlying social-network structures, we're going to end up building a really interesting knowledge base of: How do pathogens enter populations? And, once they get in, how do they spread? What are the transmission dynamics?" Gardy said. "Developing a knowledge base like this, and understanding how different pathogens behave in different social networks is, I think, really key to evidence-based strategies for the prevention of outbreaks, or the control and management of an outbreak when it does happen."
Surveillance and response
The US Food and Drug Administration estimates a national occurrence of 2 to 4 million salmonellosis cases annually, and says that the incidence of Salmonella infections appears to be on the rise in the country and in other industrialized nations. According to the FDA's Guojie Cao, Salmonella cause approximately 11 percent of food-borne illnesses in the US each year.
With an eye toward Salmonella surveillance, Cao and his colleagues are sequencing cultured isolates from a variety of past outbreaks, searching for new targets — SNPs, among other things — to monitor and track the food-borne pathogen.
In the New England Journal of Medicine in March, Cao and his team reported their sequencing of 35 Salmonella enterica isolates of the Montevideo serotype — which was associated with a 2009 outbreak traced to red and black pepper used in the production of spiced meats. Of the 35 isolates sequenced, some were derived from the food manufacturer's individual ingredient suppliers, some from people who had consumed its finished products and became sick as a result, and others from historically and geographically disparate sources, for context.
Cao told ASM attendees that sequencing, when coupled with conventional sub-typing approaches, led his team "to a possible responsible facility ... a domestic greenhouse [that] associated with the outbreak source, underscoring the utility of this technique in revealing sub-genotypic differences essential to the trace-back of [the] bacteria in question as they emerged in the food supply."
Cao said the FDA intends to do more outbreak investigations using whole-genome sequencing in the future. "In 2011, we will finish at least another 400 new draft genomes ... of Salmonella," he added.
Over at the Methodist Hospital System in Houston, Nahuel Fittipaldi and his colleagues are using whole-genome sequencing to analyze the culprit behind another recent epidemic in Canada — a newly emerged hyper-virulent clone of a rare emm59-type group A Streptococcus. This epidemic arose in western Canada in 2006 and spread eastward, affecting every province and racking up more than 500 cases through 2009.
Using a population genomics approach based on SNP analysis, Fittipaldi and his team identified "distinct geographical patterns of diversification of epidemic strains" that correlated with geographical case incidence across Canada, he said at ASM. With that, the researchers sought to determine the nature of the emm59 strain's evolution during an epidemic, with the goal of monitoring it in real time.
But as Fittipaldi's team was wrapping up its genomic epidemiology investigation of the Canadian epidemic clone, it became "aware of a series of infections in the northern US state of Montana, which shares borders with the western Canadian provinces most affected by this epidemic." The team quickly attributed the source of those infections in Montana to Streptococcus emm59, he said. "We received the samples in our lab in Houston and within 10 days we were able to show that the cases of invasive infection in Montana were caused by members of the Canadian epidemic clone," Fittipaldi said. "You could call this near real-time full-genome sequencing."
Going forward, Fittipaldi said that "an integrated systems biology strategy [will be] critical to an enhanced understanding of epidemics, and the real-time component is useful." He added that sequencing multiple whole genomes of bacterial strains will be increasingly necessary for "understanding strain diversification, evolution, and geographic dissemination of highly clonal bacteria causing epidemics."
The 'method of choice'
According to Birmingham's Pallen, researchers have been using 19th century techniques to face the 21st century challenge of diagnostic microbiology. "Much of what we do relies on microscopy and culture techniques which date to the time of Pasteur," Pallen said at ASM. "Isn't it about time, now, that we started ... to use high-throughput sequencing as the method of choice for the diagnosis of infections?" he asked.
As genomic epidemiology investigations have hinted at a possible diagnostic scheme in which the "genome sequence is the primary source of data," Pallen said that researchers might eventually move beyond "the complex phenotypic tests that people do, and just stick to looking at genomes. That's a pretty heretical view, but I wonder if we come back in three years' time and it might not be so heretical."
London's Spratt was also enthusiastic about the future of genomic epidemiology. "We have the real prospect of routinely obtaining the sequences of isolates of bacterial species in near real time at a reasonable cost. This really raises new possibilities," Spratt said. "It's really a game-changer. That's not a word I use very often, but I think most of us would agree it is a game-changer."
Will whole-genome sequencing replace multilocus sequence typing for characterizing bacterial isolates?
16% Yes. Within five to 10 years.
54% Yes. Within the next five years.
21% Maybe. WGS and MLST may prove to be complementary techniques.
4% No. Limited access to WGS will ensure MLST is a mainstay.
5% No. MLST is more useful for certain species than WGS is.