NEW YORK (GenomeWeb) – When researchers from Weill Cornell Medical College examined the microbial communities that live throughout New York City's subway system, they found that nearly half of the DNA they sequenced could not be matched to any known organism. In addition, the team, led by Christopher Mason, found "molecular echoes" at each of the sites they sampled, such that it became possible to predict which subway station a given sample came from by sequencing its metagenome.
Now, the researchers have formed a consortium, called MetaSUB for Metagenomics and Metadesign of Subways and Urban Biomes, to expand the study to 45 cities worldwide. The goal is to better characterize the microbial communities — to figure out what species are present and where and how cities differ in their microbial make-up — as well as to look for new potential antibiotics and small molecules and to catalog antimicrobial resistance genes and pathogenic species, Mason told GenomeWeb.
"We believe this will represent the beginning of what you could call metagenomic forensics," Mason said.
Researchers in the 45 cities will sample transit sites on June 21 this year to coincide with Ocean Sampling Day — a campaign to sample marine microbial communities every year on the summer solstice.
Each transit site will be sampled in triplicate, Mason said, and depending on the city, multiple sites may be analyzed. The goal is to sample each site at least once per year for the next five years. Beyond this main goal, a number of subprojects are in various stages of development, he said. For instance, in New York City, researchers are analyzing metagenomic samples from the Gowanus Canal in Brooklyn, which was declared a Superfund site by the US Environmental Protection Agency in 2010. Poland also plans to do a separate subproject to analyze microbial communities from toxic sites in the city; Rio is planning a project around the Summer Olympics; and Barcelona is engaging in a citizen science outreach program.
Over the next two months, Mason said, the consortium will design its standard operating protocol for the entire process, from sampling and extraction through sequencing and analysis.
Mason said the consortium would like to have both a surface sample and an aerosol sample.
One challenge in standardizing the protocol has been the different import/export laws for each country. For instance, Mason said, the group realized that the nylon swab it plans to use to collect surface samples could not be imported into South Korea.
Similarly, restrictions on importing and exporting DNA and RNA vary from country to country, so although it would be ideal to conduct all the sequencing at one centralized location to reduce batch effects, Mason said that there will most likely be four sequencing centers. The four centers will all use similar controls and the same protocols, and some datasets can be processed by multiple centers. "We're still in the middle of ironing out the laws country by country," Mason said.
For library prep, the consortium has primarily been testing Qiagen's QiaSeq FX and Illumina's TruSeq kits. Both work fine, but the consortium has not yet decided which to use, Mason said.
Sequencing will be done on a variety of platforms, Mason said. The majority of the sequencing will take a shotgun approach using Illumina technology, but Mason said the group would like at least a subset of the samples to also be sequenced using long-read technologies, both to validate findings on an orthogonal technology and to glean additional information from the longer reads.
They plan to use Pacific Biosciences' single-molecule sequencing technology for some samples to get longer reads, and are interested in doing some nanopore sequencing, particularly if researchers are able to set it up directly at the site. Recently, the group started discussing the possibility of using 10X Genomics' linked-read technology, Mason said, and it also plans to do some targeted capture using real-time PCR.
"We're being very platform agnostic," Mason said. "We want a subset of the samples to have independent validation by both an independent technology and by a targeted capture to make sure we validate some of the findings."
Currently, the team is in the midst of comparing around 10 different informatics analysis tools, Mason said, to determine which one, or more likely, which combination of tools to use. In general, the tools tend to be a "tradeoff of sensitivity and specificity," Mason said, so "if you combine tools you get better answers." He said that the consortium is nearly finished with that study.
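As a rough illustration of why combining tools can help, the sketch below treats each tool's output as a set of reported taxa and keeps only those reported by a minimum number of tools. This is not the consortium's method: the function name, the tool names, and the taxa lists are all hypothetical, and a simple vote threshold is only one of many ways to merge results. A low threshold favors sensitivity and a high one favors specificity.

```python
from collections import Counter


def consensus_taxa(calls_by_tool, min_votes):
    """Return the taxa reported by at least `min_votes` tools (hypothetical helper)."""
    votes = Counter()
    for taxa in calls_by_tool.values():
        votes.update(set(taxa))  # each tool contributes one vote per taxon
    return {taxon for taxon, n in votes.items() if n >= min_votes}


# Made-up calls from three unnamed classifiers on a single sample.
calls = {
    "tool_a": {"Pseudoalteromonas", "Shewanella", "Bacillus"},
    "tool_b": {"Pseudoalteromonas", "Shewanella"},
    "tool_c": {"Pseudoalteromonas", "Acinetobacter"},
}

union = consensus_taxa(calls, min_votes=1)     # most sensitive: any tool suffices
majority = consensus_taxa(calls, min_votes=2)  # middle ground
strict = consensus_taxa(calls, min_votes=3)    # most specific: all tools must agree

print(sorted(union))
print(sorted(majority))
print(sorted(strict))
```

In this toy input, the union keeps every taxon any tool reported, while the strict setting keeps only the one taxon all three agree on.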
Once the project kicks off, Mason expects about 10,000 total samples to be analyzed per year at a cost of around $100 per sample, so the consortium will need at minimum $5 million to fund the five-year project. But, he said, that figure covers only the direct cost of running the experiments, not personnel costs. The researchers are applying for numerous grants and working with vendors to get reagents and equipment at reduced cost, while some cities are chipping in to cover some of the subprojects, Mason said.
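The $5 million figure follows directly from the numbers Mason gave. The short calculation below reproduces it; the variable names are illustrative, and the inputs are the approximate values quoted in the article, which exclude personnel costs.

```python
# Approximate figures quoted in the article (personnel costs excluded).
SAMPLES_PER_YEAR = 10_000
COST_PER_SAMPLE_USD = 100
PROJECT_YEARS = 5

annual_cost = SAMPLES_PER_YEAR * COST_PER_SAMPLE_USD
total_cost = annual_cost * PROJECT_YEARS

print(f"Annual: ${annual_cost:,}; five-year total: ${total_cost:,}")
# prints "Annual: $1,000,000; five-year total: $5,000,000"
```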
The researchers hope to test a number of hypotheses throughout the course of the project. For instance, in the original NYC subway study, they found that different subway stations had unique metagenomic profiles. Stations that had been flooded after Hurricane Sandy in 2012 had a large number of Pseudoalteromonas, which are found in marine environments, as well as Shewanella frigidimarina, another marine-associated species previously thought to be associated with the Antarctic.
Mason also said that in the NYC study the researchers found that the density of subway riders affected the diversity of microbes at a given station, and that they would look to see whether the same finding holds true globally.
In addition, he anticipates that new species will be discovered, some of which will be native and unique to specific areas of the world.
In particular, the consortium plans to look at the prevalence of antimicrobial resistance genes. "There's never been a complete global snapshot of antimicrobial resistance markers," Mason said. Most studies of antimicrobial resistance genes are in the context of a hospital outbreak or an individual infection. So, it will be interesting to see how prevalent those genes are in public places and whether there is a greater density of those genes in subway stations closer to hospitals.
For some of the samples, Mason said the researchers would do a targeted capture in addition to metagenomic sequencing, to look for specific bacteria and viruses. For example, he said, at some sites, the researchers will test specifically for Zika virus. However, RNA viruses do not survive for long on surfaces so it is unlikely that they will find Zika in any of the samples. "In some ways, it's almost a negative control," Mason explained. "We're hoping to not see it." A more likely finding will be the influenza virus, since it will be flu season when some countries are sampled.
As the project moves forward, Mason said he hopes it will spawn additional subprojects and collaborators. "It will be a hub and spoke model, but we want to make it easy for people to join and for the groups to learn from each other," he said.