NEW YORK (GenomeWeb) – Researchers from the UK, Belgium, and US have developed an automated scheme for sequencing DNA and RNA from the same individual cells — a method dubbed Genome and Transcriptome Sequencing, or G&T-seq.
As described online this week in Nature Methods, the G&T-seq approach involves first separating each cell's genomic DNA from its messenger RNA transcripts before taking both types of molecules forward through parallel sample preparation and sequencing steps.
The team behind G&T-seq expects the method to find favor among those interested in profiling single-cell features, such as transcript splicing, transcript fusions, or gene expression patterns on the transcriptome side, while simultaneously getting a glimpse at copy number profiles or DNA base-level sequence characteristics in a relatively high-throughput manner.
"Our method lends itself very easily to automation, which means that the number of cells we can process already, from the start, is in the hundreds," the study's first author Iain Macaulay, a researcher with the Sanger Institute-European Bioinformatics Institute Single-Cell Genomics Centre, told GenomeWeb.
In their proof-of-principle demonstration of the approach, he and his colleagues did G&T-seq on more than 200 mouse or human cells, picking up patterns that would not have been clear from single-cell RNA or DNA sequences alone.
In particular, they saw gene expression shifts in cells that had gained or lost corresponding chunks of sequence through chromosomal missegregation — a pattern they verified by sequencing DNA and mRNA in individual cells from a mouse embryo with chromosomal abnormalities.
The G&T-seq method differs from an approach described in Nature Biotechnology earlier this year that uses so-called quasilinear amplification to prepare cells for simultaneous DNA and mRNA sequencing.
In contrast to that approach, known as DR-seq, the G&T-seq method hinges on physical separation of DNA and mRNA prior to amplification of each, Macaulay explained. The DR-seq developers "went with a strategy where they did everything in one tube, whereas we separate the DNA and RNA," he said.
Both methods are well suited to assessing gene expression and copy number profiling from individual cells, he noted. But because G&T-seq physically separates DNA and mRNA from the outset, he argued that it is more flexible when it comes to applying alternative DNA amplification methods centered on amplification enzymes with better base-level resolution, a necessity when trying to profile SNPs and other fine-level details from single-cell sequence data.
"Even with our method, we have to make a decision at a certain point about whether we want copy number or whether we want to do whole-genome or targeted sequencing," Macaulay noted. "We choose which enzymes to amplify the genome with."
The team starts by sorting single cells directly into a lysis buffer, where each cell bursts, releasing both the DNA and RNA. The mRNA is then pulled from solution using magnetic beads coated with an oligo-dT primer that grabs the molecules by their polyadenylated tails.
That primer also helps in initiating reverse transcription of the mRNA at a later stage of the protocol, Macaulay noted. Before that, though, he and his colleagues carefully move the captured mRNA to one side of the tube with a magnet so the free DNA — which remains in solution in the lysis buffer — can be removed from the mix by pouring off the supernatant.
Following several wash steps, the researchers prepare the mRNA and DNA separately, taking care to maintain as much of the starting material as possible.
In the case of the mRNA, the team does reverse transcription after re-suspending mRNA from the magnetic beads. From there, the amplification and single-cell RNA-sequencing steps resemble the Smart-seq2 protocol with a few modifications, Macaulay noted.
To take the DNA through to sequencing, meanwhile, the researchers precipitate it out of the wash solutions and resuspend it in a smaller volume of whole-genome amplification reagents.
At that point in the protocol, researchers have the option of selecting the amplification approach that best fits the level of resolution required from the experiments, Macaulay explained. "For a lot of things, we're just interested in copy number and very shallow sequencing. But if you want to call SNPs and things, it's obviously a bigger investment in terms of sequencing as well to get that deep."
"If you're just interested in copy number, you'll use a protocol such as PicoPlex — like we've used when we do copy number in this paper," he said. "If you want to look at single nucleotide variants, the best way to go is to use the Phi 29 polymerases to do a multiple displacement amplification."
While the Phi 29-based MDA approach is not ideal for amplifying DNA in copy number profiling experiments, the enzyme's effective DNA proofreading activity makes it less likely to muddy the view of base-level sequences and SNPs.
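The amplification choice Macaulay describes can be sketched as a simple lookup. This is only an illustration of the decision as reported here, not code from the study; the goal labels and the function name `choose_amplification` are hypothetical.

```python
# Illustrative mapping from analysis goal to the whole-genome amplification
# approach named in the article. The keys are hypothetical labels.
AMPLIFICATION_BY_GOAL = {
    "copy_number": "PicoPlex",
    "single_nucleotide_variants": "Phi 29 multiple displacement amplification",
}


def choose_amplification(goal: str) -> str:
    """Return the amplification approach the article associates with a goal."""
    try:
        return AMPLIFICATION_BY_GOAL[goal]
    except KeyError:
        raise ValueError(f"No amplification approach recorded for goal: {goal!r}")


if __name__ == "__main__":
    for goal in AMPLIFICATION_BY_GOAL:
        print(f"{goal}: {choose_amplification(goal)}")
```

The point of the sketch is that the decision is made once per experiment, before amplification, which is the trade-off Macaulay describes.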
The team has established an automated system — based around a robot and 96-well plates — that makes it possible to assess around four plates in roughly three days, though Macaulay noted that a core facility at the Sanger Institute plans to scale up G&T-seq automation and throughput even further.
So far, the team has used G&T-seq in combination with Illumina or Pacific Biosciences instruments, though Macaulay said the same approach should be compatible with other sequencing technologies as well.
As with the DNA amplification protocol selected, though, he noted that investigators may opt for different sequencing technologies depending on the research question at hand.
For example, he and his team have found that the long reads offered by PacBio instruments provide a slight advantage when looking at transcript splicing or fusion transcripts, since individual sequence reads are roughly the same length as many of the transcripts being sequenced.
On the other hand, the researchers have used Illumina HiSeq instruments to do whole-genome DNA sequencing on a few of the cells, producing deep sequence data that made it possible to start looking in more detail at SNP patterns and sequences at particular gene fusion sites.
To get gene expression and/or large-scale DNA gains or losses in individual cells, Macaulay and his colleagues have been aiming for between 2 million and 4 million reads per cell, running each 96-well plate on two lanes of an Illumina HiSeq instrument. Deeper coverage is needed to delve into details of transcript or DNA sequences, he explained.
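As a rough back-of-the-envelope check, those targets imply the following read totals per plate and per lane. The sketch assumes every well of a 96-well plate holds one cell and that reads are spread evenly across cells and lanes; the function name `plate_read_budget` is hypothetical, and real runs will deviate from these idealized figures.

```python
def plate_read_budget(reads_per_cell: int,
                      cells_per_plate: int = 96,
                      lanes_per_plate: int = 2) -> tuple[int, int]:
    """Return (total reads per plate, reads per lane) for an even split."""
    total_reads = reads_per_cell * cells_per_plate
    reads_per_lane = total_reads // lanes_per_plate
    return total_reads, reads_per_lane


if __name__ == "__main__":
    # Lower and upper per-cell targets quoted in the article.
    for target in (2_000_000, 4_000_000):
        total, per_lane = plate_read_budget(target)
        print(f"{target:,} reads/cell: {total:,} reads/plate, {per_lane:,} reads/lane")
```

Under these assumptions, the 2 million to 4 million reads-per-cell range works out to 192 million to 384 million reads per plate, or 96 million to 192 million reads per lane.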
"There are a lot of options of things you can do with this," Macaulay said, noting that the DNA-mRNA separation raises the possibility of applying non-sequencing analyses to one of the molecules.
For instance, he and his team have already started on one project that is using G&T-seq to sequence RNA, while DNA is genotyped using a quantitative PCR-based method.
There are still situations where single-cell RNA-seq alone may be sufficient to solve the research question at hand, Macaulay explained, particularly in situations where the cells come from a tissue or cell line that's expected to have a stable genome sequence.
He noted the random dropout of some regions in the genome is still slightly higher when doing single-cell DNA-seq with the G&T-seq protocol than it is in conventional single-cell DNA-seq experiments.
"The dropout from single-cell whole-genome amplification is already a problem," he explained. "So it's really not a big difference, but there's slightly bigger dropout to get the transcriptome as well."
He and his team are exploring ways of reducing that dropout. They're also interested in trying to make G&T-seq compatible with bisulfite sequencing, which would make it possible to profile cytosine methylation patterns in individual cells.
In their proof-of-principle study of the current G&T-seq approach, researchers illustrated some of the method's potential applications, investigating everything from gene expression and copy number profiles detected by low-coverage sequencing to fusion transcripts and SNPs found by PacBio long reads and/or deep sequencing.
Using 86 single cells apiece from a breast cancer cell line and a lymphoblastoid cell line from the same individual, for example, the team used G&T-seq to look at gene expression and low-coverage DNA sequence-based copy number profiles in parallel in each cell.
A handful of the cells from those cell lines were all sequenced to high coverage with the Illumina HiSeq X instrument to see just how much of the genome was represented, on average, in a given cell.
In a subset of the cancerous cells, DNA sequence data revealed an extra copy of chromosome 11. Other breast cancer cells had lost or gained part of chromosome 16. Adding in mRNA sequence information from the same cells, the team got a view of expression changes coinciding with these chromosome gains.
G&T-seq on cells from a mouse embryo forced into chromosomal missegregation verified the rapid expression shifts that can follow such gains or losses. Even so, the team noted that experiments on induced pluripotent stem cells from individuals with an extra copy of chromosome 21 point to person-to-person transcriptional differences in addition to the broadly elevated expression of chromosome 21 genes.
Macaulay said the team is now considering experimental designs for getting a detailed picture of the gene expression consequences that follow more subtle copy number changes.
They have a study underway using G&T-seq to examine early embryological development in mice using individual cells taken from pre-implantation embryos and are collaborating with cancer researchers to learn more about the relationships between copy number changes, gene expression patterns, and mutation profiles in individual cells from several cancer types.
"We have a few collaborations lined up where we're looking at different cancers, some of which should lend themselves very nicely to the whole-genome copy number calling and expression," Macaulay said.