Researchers from the Translational Genomics Research Institute, the Technical University of Denmark, and elsewhere used whole-genome sequence typing to retrace the evolutionary history of a methicillin-resistant Staphylococcus aureus strain currently found in livestock back to its antibiotic-susceptible roots in humans.
The approach, which may also find use in predicting the future trajectory of MRSA strains, highlights the potential power of next-gen sequencing-based genomic epidemiology. But while plummeting sequencing costs are likely to drive many similar studies going forward, researchers involved in the project noted that the field must overcome a number of challenges related to genome finishing and bioinformatics before NGS-based approaches displace current genetic epidemiology methods.
Using whole-genome sequence data on 89 animal and human isolates of an S. aureus strain called clonal complex 398 (also known as CC398 or ST398), investigators put together a phylogeny that reveals much about the history of the strain.
Results of the study, published online last week in mBio, suggest the MRSA strain stemmed from a methicillin-susceptible version of S. aureus that was passed from humans to animals, where it diversified and became resistant to tetracycline and then to methicillin following exposure to antibiotics used in food production.
"Looking at the life history of this bug, there's very strong evidence that it started off as a tetracycline- and methicillin-susceptible strain in humans," TGen researcher Lance Price, the study's first author, told Clinical Sequencing News. "It spread to livestock and it's there that it became resistant to both of these drugs."
"It's also there that we know that, literally, tons of antibiotics are used to raise food animals," he explained. "And bacteria don't take on these genes for nothing — they don't do anything that's not a selective advantage to them."
Now, Price explained, researchers are using information from the genomes to try to predict the strain's future trajectory, screening for variants to keep an eye out for MRSA strains that have regained the ability to move easily between humans.
"It's really nice to see that there is this type of study that has taken a really systematic approach to dealing with livestock isolates as well as human isolates and seeing that transition from humans to livestock and livestock to humans," University of Maryland microbiologist Dave Rasko, who was not involved in the new study, told CSN.
"Seeing that that transmission goes both ways, I think, is one of the important parts of this manuscript," added Rasko, who was part of the team that collaborated with Pacific Biosciences to sequence the European Escherichia coli O104 outbreak strains and other isolates using the PacBio RS instrument (IS 7/12/2011).
Rasko and his colleagues also participated in a forensics investigation known as Amerithrax, which relied on genomics to track Bacillus anthracis spores back to their source following a series of anthrax attacks in 2001 (GWDN 3/8/2011).
But while such genomic epidemiology studies are becoming more common, researchers say there are still some hurdles to overcome before they are routine.
"It's definitely in its infancy," Rasko said. But he added that "as time goes on it's becoming economically feasible to do these types of analyses, in terms of whole-genome sequencing."
CC398 was identified nearly a decade ago when an infant in the Netherlands — a country with low rates of MRSA infection overall — became infected with a new MRSA strain, eventually classified as CC398 by multilocus sequence typing (MLST).
The source of this infection was traced back to the infant's family farm, where investigators found pigs colonized with the strain. Since then, CC398 has been detected in a range of agricultural animals from many more countries.
To date, human infections with MRSA CC398 have occurred primarily in people who come into close contact with livestock. Even so, the bug now accounts for a significant proportion of human MRSA infections in some countries, including the Netherlands, fueling concern that it could start moving more easily from one human to the next.
Given the potential public health impacts of the strain if that does occur, researchers have been keen to understand where CC398 came from and how it became resistant to antibiotics.
Conventional typing methods, such as MLST or typing of the staphylococcal protein A (spa) gene, fell short for this task, Price explained: MLST had been used to define the group in the first place, and spa typing did not offer enough genetic resolution to discern the bug's evolutionary history.
In contrast, Price called whole-genome sequencing "the ultimate DNA fingerprint."
"You can always extract, from that whole-genome sequence data, an MLST fingerprint to put it in context with what we already know," he explained. "But to get really fine-resolution epidemiology — and in this case, also, evolutionary biology — you have all the information right there."
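Price's point that an MLST fingerprint can be read straight out of whole-genome data can be illustrated with a minimal sketch. It assumes an assembled genome held as a single string and a tiny allele table: the locus names arcC and aroE are borrowed from the S. aureus MLST scheme, but the allele sequences and the "ST-toy" sequence-type labels are hypothetical. Real allele definitions are several hundred bases long, and real tools tolerate assembly gaps and partial matches rather than relying on exact substring search as this sketch does.

```python
from typing import Dict, Optional, Tuple

# Hypothetical allele table: locus -> {allele_id: allele_sequence}.
# Locus names come from the S. aureus MLST scheme; sequences are invented.
ALLELES: Dict[str, Dict[int, str]] = {
    "arcC": {1: "ATGACCGTA", 2: "ATGACCGTT"},
    "aroE": {1: "GGCTTAACG", 2: "GGCTTAACC"},
}

# Hypothetical profile table: allele IDs in locus order -> sequence type.
PROFILES: Dict[Tuple[int, ...], str] = {
    (1, 2): "ST-toy-A",
    (2, 1): "ST-toy-B",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement, so an allele is found on either strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def call_allele(genome: str, alleles: Dict[int, str]) -> Optional[int]:
    """Return the first allele ID whose sequence occurs exactly on either strand."""
    for allele_id, seq in alleles.items():
        if seq in genome or revcomp(seq) in genome:
            return allele_id
    return None


def mlst_from_genome(genome: str) -> Optional[str]:
    """Call one allele per locus, then look up the combined profile."""
    calls = []
    for alleles in ALLELES.values():  # dict insertion order fixes locus order
        allele_id = call_allele(genome, alleles)
        if allele_id is None:
            return None  # missing or novel allele: no sequence type assigned
        calls.append(allele_id)
    return PROFILES.get(tuple(calls))  # None if the profile is not in the table
```

For example, `mlst_from_genome("NNN" + "ATGACCGTA" + "CCC" + "GGCTTAACC" + "TTT")` calls arcC allele 1 and aroE allele 2 and returns `"ST-toy-A"`. The same lookup works on the reverse-complemented genome, which is why the sketch searches both strands.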
Tracing Ancestry
Using the Illumina GAIIx, Price and his colleagues sequenced 89 CC398 isolates that originated from pigs, poultry, cows, horses, and humans in 19 countries.
The team's phylogenetic analyses, based on some of the 4,238 SNPs identified in the genomes, suggest that the ancestral version of CC398 was a methicillin-susceptible human strain.
During or shortly after the jump to animals, the methicillin-susceptible strain lost a phage called Sa3, which carries genes that mediate interactions with the human innate immune system and help the bacterium colonize humans.
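Distance-based phylogenetic analyses commonly start from pairwise SNP differences between isolates. The sketch below assumes each isolate has already been reduced to a string of base calls at the same set of aligned SNP positions, with "N" marking a missing call; the isolate names and sequences are invented, and this is not presented as the tree-building method used in the mBio study.

```python
from itertools import combinations
from typing import Dict, Tuple


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where two isolates differ, ignoring missing ('N') calls."""
    if len(a) != len(b):
        raise ValueError("SNP strings must cover the same aligned positions")
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def distance_matrix(snps: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """SNP distance for every unordered pair of isolates, keyed by sorted name pair."""
    return {
        (i, j): snp_distance(snps[i], snps[j])
        for i, j in combinations(sorted(snps), 2)
    }
```

With the toy input `{"human_1": "ACGTA", "pig_1": "ACGTT", "pig_2": "ACCTT"}`, `distance_matrix` returns `{("human_1", "pig_1"): 1, ("human_1", "pig_2"): 2, ("pig_1", "pig_2"): 1}`. A matrix like this could then feed a clustering or tree-building method such as neighbor joining.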
But the strain also diversified once it entered livestock. And there it gained genetic elements needed to resist antibiotics — first tetracycline, an antibiotic commonly used in food animal production, and, more recently, methicillin.
Methicillin itself is not typically used in food animal production in the US, Price noted, but other beta-lactam antibiotics are, including broad-spectrum cephalosporins, which can select for methicillin-resistant bugs.
Together, the findings are consistent with a role for antibiotics from food production in the evolution of drug resistance in a human pathogen.
"Epidemiologically we provided a lot of information," TGen's Price said. "It's a leap forward for understanding virulence."
Along with the genetic links it provides between drug resistance in bacteria and antibiotic use in agriculture, the study has also provided resources for predicting possible future trajectories of the CC398 strain.
For example, some of the variants identified in the genomes are being applied in PCR-based assays for screening isolate collections to see what the CC398 strain is doing over time in different parts of the world, Price explained.
In particular, researchers are concerned about an isolate from a pig in France that has taken on tetracycline and methicillin resistance sequences, but has also regained the Sa3 phage that helps in infecting humans.
"This is the one that we're afraid can take a new trajectory and be passed from person to person," Price said.
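The screening concern Price describes, livestock isolates that carry both resistance determinants and have also regained Sa3, can be expressed as a simple marker rule. In this hypothetical sketch, each isolate's markers are assumed to come from some PCR or sequence-based panel; mecA is the standard methicillin-resistance gene, while the "tetM" and "Sa3" labels are illustrative marker names, not the study's actual assay targets.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Illustrative marker labels (assumptions, not the study's assay targets).
TET_MARKER = "tetM"          # tetracycline resistance
METHICILLIN_MARKER = "mecA"  # methicillin resistance
PHAGE_MARKER = "Sa3"         # human-adaptation phage


@dataclass
class Isolate:
    name: str
    host: str
    markers: Set[str] = field(default_factory=set)  # markers detected by the panel


def flag_high_risk(isolates: List[Isolate]) -> List[str]:
    """Names of isolates carrying both resistance markers AND the Sa3 phage marker."""
    required = {TET_MARKER, METHICILLIN_MARKER, PHAGE_MARKER}
    return [iso.name for iso in isolates if required <= iso.markers]
```

Requiring all three markers, rather than any one, mirrors the specific combination singled out for the French pig isolate; a resistant isolate lacking Sa3 is not flagged.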
The team is also continuing to do whole-genome sequencing on the livestock-associated CC398 strain and on another new community-acquired MRSA strain.
Moving to Genome Scale
Such large-scale, genomics-based epidemiological efforts and evolutionary studies are becoming more and more feasible as sequencing costs decline, according to Price and Rasko.
For example, whereas reagent costs were on the order of $400 to $500 per genome for isolates sequenced in the mBio study, Price estimated that reagent costs are now closer to $200 per genome and will likely fall to $100 or less in the near future, as researchers move to the Illumina HiSeq 2000 instrument and multiplex close to 100 samples per lane.
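Much of that decline comes from multiplexing: a lane's reagent cost is shared by every barcoded sample loaded on it. A back-of-the-envelope sketch of the amortization follows; the lane cost and per-sample library-prep cost are placeholder figures, not prices quoted in the article.

```python
def reagent_cost_per_genome(lane_cost: float, samples_per_lane: int,
                            prep_cost_per_sample: float) -> float:
    """Split one lane's reagent cost across multiplexed samples, then add per-sample prep."""
    if samples_per_lane <= 0:
        raise ValueError("samples_per_lane must be positive")
    return lane_cost / samples_per_lane + prep_cost_per_sample
```

With a hypothetical $5,000 lane and $50 of library prep per sample, `reagent_cost_per_genome(5000.0, 100, 50.0)` gives `100.0`; loading fewer samples per lane raises the per-genome figure because the fixed lane cost is divided among fewer isolates.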
At that price, it essentially becomes cheaper to sequence the draft genome of a potential pathogen than it is to do MLST by Sanger sequencing, he added. "Sequencing is getting so cheap, so fast that it's much more cost-effective to just go ahead and sequence the genomes."
Rasko's lab is also relying on the Illumina HiSeq for its large-scale bacterial sequencing studies. That team is currently able to sequence 200 to 300 isolates of enteric pathogens such as Shigella or E. coli per 10-day HiSeq run.
"Really, it's that type of throughput that allows you to get the number of samples that you really need to do the proper statistically powered studies," Rasko explained.
Even so, actually closing genomes is still a far more costly and time-consuming prospect. For the mBio study, for example, Price said that the availability of a high-quality reference genome was a tremendous advantage — and something that may not be available to every team interested in doing this type of genomic study.
"One of the big limitations that we face today is that we're really reliant on having a good closed genome for these types of studies," he explained. "What the field really needs is a strong database of these reference genomes so we can use the draft genomes in an epidemiological context."
To that end, Price said he "would be really excited if a robust third-generation sequencing platform was released that would give us longer reads that would enable routine closing of genomes."
Rasko, too, emphasized the importance of having a high-quality reference genome for comparative purposes "so that you know what's true and what's not true."
Whereas most bacterial genome sequencing done over the past 15 to 20 years focused on generating low numbers of high-quality genome datasets, Rasko explained, the advent of second-generation sequencing platforms has allowed for the production of many genomes that are of somewhat lower quality.
"We've sacrificed some of our quality, in terms of complete genomes at really high quality, for an increased throughput," Rasko said. "So having that good quality reference to compare to really cuts down on our number of false negatives."
Rasko said he and his colleagues are currently using a combination of second- and third-generation platforms — primarily Illumina instruments paired with the PacBio system — to generate reference genomes in cases where a high-quality sequence is not already available.
"At our institute, what we do is we combine the Illumina sequencing to get us the high coverage with the PacBio, which gives us longer reads," Rasko said. "We end up putting both of those datasets together and generating high-quality, very well assembled genomes."
Along with a reliance on good quality reference genomes, Price noted that the ability to apply genomics to epidemiological studies routinely is still somewhat limited by a lack of appropriate bioinformatics pipelines for dealing with the data.
That is something that he and his group are now working to develop, along with collaborators at the Center for Genomic Epidemiology in Denmark.
An even larger leap in bioinformatics may be needed before bacterial genomics becomes standard in identifying culprit pathogens in a clinical setting, according to Rasko.
"Right now we can generate the sequences fairly quickly and at a fairly cheap cost," he said. "But we don't have the bioinformatics that will link us to the clinical algorithms for treatment of humans."
Finally, the researchers emphasized that complete and integrated datasets are ultimately going to be crucial not only for epidemiological research but also, down the road, in a clinical setting.
For those studying CC398, for example, information on the nature and extent of antibiotic use in food production at each sampling site is important but hard to come by, Price explained.
On the clinical side, meanwhile, genomic epidemiology will likely need to include information about clinical features found in patients infected with a given isolate. "By leaving out some of that [patient] meta-data, we're not going to be able to understand more completely exactly what's going on," Rasko said.
Have topics you'd like to see covered in Clinical Sequencing News? Contact the editor at anderson [at] genomeweb [.] com.