By Monica Heger
Despite the technical challenges of single-cell sequencing, researchers have begun using the technique to better study single bacterial species from metagenomic samples.
Presenting last week at Cambridge Healthtech Institute's XGen Congress in San Diego, Calif., William Gerwick, professor at the Center for Marine Biotechnology and Biomedicine at Scripps Institution of Oceanography, showed how he has used the strategy to study certain species of cyanobacteria that produce anticancer compounds.
His talk was one of several at the conference on techniques to improve single-cell genome assembly and to increase genomic coverage from single-cell sequencing, currently the main challenges of the technology.
Gerwick is using the technology to study single cells of marine cyanobacteria in order to identify the gene motifs that encode natural products used in pharmaceuticals. Between 60 percent and 70 percent of all drugs have an origin in natural products, he said, and the marine environment has proved an especially fruitful source of those compounds.
"The marine environment has yielded nine compounds that are approved in the US or Europe," he said, six of which are cancer drugs. Additionally, the National Cancer Institute has estimated that about one in 15,000 tested compounds becomes a drug, but compounds from the marine environment have about a six-fold better success rate, at about one in 2,450 tested compounds, he said.
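The six-fold figure follows directly from the two rates Gerwick quoted. A minimal Python check, using only the numbers above (the variable names are illustrative and not from the talk):

```python
# Rates as quoted in the talk: one approved drug per N tested compounds.
overall_tested_per_drug = 15_000  # all sources (NCI estimate cited by Gerwick)
marine_tested_per_drug = 2_450    # marine-derived compounds

# The success rate is the reciprocal of "compounds tested per drug".
overall_rate = 1 / overall_tested_per_drug
marine_rate = 1 / marine_tested_per_drug

# Ratio of the two success rates.
fold_improvement = marine_rate / overall_rate
print(f"Marine compounds succeed about {fold_improvement:.1f} times as often")
```

The ratio comes out slightly above six, consistent with the "about six-fold" description.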
Gerwick has focused on marine cyanobacteria because the organisms have large genomes that are "very rich in secondary metabolites," he said, and those metabolites "possess tremendous structural diversity."
However, the cyanobacteria are surrounded by a sheath that is "deeply infiltrated by heterotrophic bacterial growth," Gerwick said.
Despite attempting various techniques to separate the cyanobacterial cells from the sheath, including mechanical techniques, antibiotic treatments, and varying temperature and light conditions, Gerwick's team has not been able to do so without also killing the cyanobacteria.
It could be that the treatments themselves kill the cyanobacteria, or that the heterotrophic bacteria on the surrounding sheath are crucial for survival of the cyanobacteria, he said.
So, he said, the team turned to single-cell sequencing. Using micromanipulation, the team is able to squeeze out several individual cells from the cyanobacteria filament, move them into sterilized media, wash them, and then extract the DNA.
To do whole-genome amplification, Gerwick uses a technique known as multiple displacement amplification, or MDA, which uses the phi29 DNA polymerase to amplify DNA in a linear fashion. For sequencing, he has tested both 454 and Illumina, and is also considering using the Ion Torrent.
The single-cell sequencing strategy "expands what we can do enormously," he said.
In one experiment, Gerwick's team sequenced the genome from the marine cyanobacteria species Lyngbya bouillonii, with the goal of better understanding how the organism produces apratoxin, a secondary metabolite with anticancer properties.
Gerwick's team extracted four cells, amplified each cell's genome using MDA, and sequenced the amplified DNA on the 454 platform using two strategies: sequencing the DNA from each cell individually, and pooling and sequencing the DNA from all four cells.
Gerwick said combining amplified genomes from multiple cells helped to increase genome coverage and reduce biases introduced by MDA. Sequencing on the 454 yielded average reads of about 250 bases, and de novo assembly using Newbler produced around 3,500 contigs greater than 500 base pairs, totaling 6.6 megabases of sequence.
However, because of the biases introduced by MDA, the Newbler assembler was not able to produce longer contigs. So the team used a second assembler, Euler-SR, to extend the initial contigs and then merged the two assemblies. The merged assembly included 34 contigs between 10 kilobases and 42 kilobases and covered between 71 percent and 92 percent of the genome, based on an estimated genome size of 7.1 megabases to 9.1 megabases.
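The coverage range reflects the uncertainty in the genome size: the same amount of assembled sequence is a larger share of a smaller genome. A rough Python sketch of that arithmetic follows. It assumes the 6.6-megabase Newbler total as a stand-in for the assembled length, because the length of the merged assembly was not reported, and the function name is illustrative.

```python
def coverage_fraction(assembled_mb: float, genome_mb: float) -> float:
    """Fraction of an estimated genome represented by an assembly, capped at 1."""
    if genome_mb <= 0:
        raise ValueError("genome size must be positive")
    return min(assembled_mb / genome_mb, 1.0)

# Assumption: 6.6 Mb is the Newbler total, used here only as a stand-in.
assembled_mb = 6.6

# The larger genome estimate gives the lower bound on coverage, and vice versa.
low = coverage_fraction(assembled_mb, 9.1)
high = coverage_fraction(assembled_mb, 7.1)
print(f"Estimated coverage: {low:.0%} to {high:.0%}")
```

Under that assumption the estimate is roughly 73 percent to 93 percent, close to but not identical to the 71 percent to 92 percent Gerwick reported, which presumably rests on the merged assembly rather than the Newbler total.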
The assembly allowed the team to identify several contigs from the apratoxin pathway, Gerwick said, including the 57-kilobase gene cluster that is responsible for producing apratoxin.
Gerwick said he has also used the MDA technique in combination with both 454 and Illumina sequencing to study other cyanobacteria, including one from the genus Moorea. Single-cell sequencing of that organism revealed eight gene motifs that coded for natural products, five of which were novel. He said that he is now following up on this study, trying to isolate the actual compounds that the motifs produce and to further study their functional properties.
Reducing Bias, Increasing Coverage
Despite the challenges of single-cell sequencing, Gerwick said that for these studies the approach was preferable to metagenomic sequencing. While a metagenomics study might have identified the same gene motifs, it would not have provided evidence that they were associated with the specific cyanobacteria.
"We want to know what cell types these gene motifs associate with," he said. "One longstanding and perplexing issue is that a lot of really interesting and exciting compounds have been isolated from various classes of marine organisms (sponges, tunicates, coral, algae), but there still remains a fundamental question on who's really making these compounds. It's very tough to figure that out." But single-cell sequencing can help sort that out, since it looks at the genome of one cell from a single species.
Gerwick said his team is now "continuing to explore other genomes of cyanobacteria that produce biologically active and structurally interesting natural products," and is using Illumina sequencing and considering newer sequencing technology such as the Ion Torrent.
He is also exploring improved strategies for single-cell sequencing that will help increase genome coverage and reduce the biases introduced in the whole-genome amplification step.
Other talks at the XGen conference also covered efforts to improve single-cell sequencing methods. For example, Pavel Pevzner, a professor of computer science at the University of California, San Diego, discussed his work on new assembly methods that would improve genome coverage from single-cell sequencing.
While he did not want to provide details because the method will be published soon in a peer-reviewed journal, he said that one main problem with using traditional assemblers on single-cell sequence data is that the assemblers make decisions based on coverage, and so will throw out reads with low coverage.
However, he said, in single-cell projects, some correct reads may have low coverage, while some incorrect reads may have high coverage because of the biases introduced by amplification.
The algorithm he is developing does not use coverage in its assembly decisions, he said, but is instead based on paired k-mers.
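The coverage problem Pevzner described can be illustrated with a toy example. The Python sketch below is not his algorithm, whose details were not disclosed, and it does not implement paired k-mers. It only shows how a fixed coverage cutoff, a common heuristic in conventional assemblers, behaves when amplification is uneven. The reads, the error, and the cutoff are all hypothetical.

```python
from collections import Counter

def kmer_counts(reads, k):
    """Count every k-mer across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def filter_by_coverage(counts, min_count):
    """Keep only k-mers seen at least min_count times (a fixed coverage cutoff)."""
    return {kmer for kmer, n in counts.items() if n >= min_count}

# Hypothetical data: region A is heavily over-amplified, and an error copied
# early in amplification recurs in some reads; region B is barely amplified.
true_a = "ACGTACGGTC"
error_a = "ACGTTCGGTC"   # same region with a single-base error (A -> T)
true_b = "TTGACCAGTA"

reads = [true_a] * 50 + [error_a] * 5 + [true_b] * 2
counts = kmer_counts(reads, k=5)
kept = filter_by_coverage(counts, min_count=3)

# The erroneous k-mer survives the cutoff; the correct one from region B does not.
print("error k-mer kept:", "GTTCG" in kept)
print("correct low-coverage k-mer kept:", "TTGAC" in kept)
```

In this toy data set the cutoff keeps the erroneous k-mer from the over-amplified region and discards the correct k-mers from the under-amplified one, which is the failure mode that motivates an assembly approach that does not rely on coverage.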
Meanwhile, a research team from Los Alamos National Laboratory is working on a method to induce ploidy in bacterial cells prior to MDA and sequencing, which boosts mapped genome coverage by about 10 percent and coverage from de novo assembly by around 18 percent.
Gerwick said that he would consider trying both of these methods. Combining them, he said, could theoretically yield around 95 percent sequence coverage.
Earlier this month, researchers from BGI reported on a single-cell exome sequencing strategy, also using MDA. They are now looking to combine MDA with another whole-genome amplification method known as degenerate oligonucleotide-primed PCR, which results in less genome coverage than MDA but also reduced amplification bias (IS 3/6/2012).