A team from Harvard Medical School and the Allen Institute for Brain Science has developed a DNA nanoball-based scheme for sequencing RNA in situ in individual cells.
The researchers described the approach — known as fluorescent in situ RNA sequencing, or FISSEQ — in a study published online last week in Science. There, they presented proof-of-principle experiments using human fibroblast cells that demonstrated the feasibility of the approach and their ability to pick up expression patterns that fit with known fibroblast functions.
The sequencing method is similar to the DNA nanoball sequencing strategy used by Complete Genomics, Je Hyuk Lee, a research fellow at Harvard Medical School and co-first author, told In Sequence. Lee works in the laboratory of Harvard geneticist and corresponding author George Church, who currently sits on the advisory board for Complete Genomics.
In its current form, the sequencing method — which generates reads up to around 30 bases long — is done directly on a microscope stage and can take as long as a couple of weeks. The study's authors have started collaborating with undisclosed sequencing firms and hope to develop an automated version of the approach in the coming years.
"What we're focusing on is culturing cells or placing tissue sections on a glass slide that can be converted into flow cells that you can just stick into a next-generation sequencing machine," Lee said.
He noted that FISSEQ is expected to be compatible with a wide range of sequencing chemistries. The researchers have already taken a crack at using it in conjunction with the sequencing-by-synthesis, or Polonator, approach developed in the Church lab and have started exploring its compatibility with sequencing-by-hybridization approaches.
"It looks like all the [sequencing] chemistries will be compatible," Lee said. "As long as the sequencing chemistry uses fluorophores and imaging, our expectation is that it will be compatible."
While probe-based methods such as fluorescence in situ hybridization (FISH) have become widespread for assessing specific transcripts within a cell or tissue section, interrogating all of the transcripts in an unbiased and non-targeted manner while unraveling their sequence has proven far more difficult, Lee said.
"The challenge is detection, in that there can be up to a million RNA molecules per cell," he said. "Even with the best-resolution microscopes, you can't quantify and image all those molecules inside a single cell and tell them apart."
He credited much of the success of the new method to optimizations that have been done to produce exceptionally bright and stable DNA nanoballs from the complementary DNA strands reverse transcribed from each RNA molecule.
The first advance in that direction came a few years ago when members of the team found a way to incorporate modified uracil nucleotides during reverse transcription and cDNA production — an advance that helped in cross-linking the resulting DNA nanoballs into a stable matrix or "hydrogel."
Each nanoball is composed of single-stranded cDNA studded with modified nucleotides that are cross-linked with a biologically inert molecule called BS(PEG)9. That step more or less "fossilizes" the cDNA amplicons in place in their cellular environment, Lee said, while diminishing background interactions.
That leads to "incredibly bright and incredibly stable" nanoballs, he explained, noting that the nanoball matrix can withstand hundreds of hot and cold wash cycles without breaking down or shifting.
Because the cross-linked collection of nanoballs is also permeable to the solutions needed to prepare and sequence the molecules, the entire RNA sequencing reaction can be done on a microscope slide, he added.
With cross-linking approaches in hand, the researchers spent several more years finding ways to stretch out the length of sequence reads in situ, taking cues from the sequencing-by-ligation strategy employed in SOLiD instruments from Thermo Fisher brand Life Technologies.
As the approach currently stands, the team can generate RNA sequences up to 30 bases long using any microscope that can distinguish among the four colored probes, each of which corresponds to one nucleotide.
"Having the flexibility of all these microscopes to look at the sample in many different ways actually has a huge benefit, even though it's slower, in terms of our method," Lee said.
"But we certainly do want to take advantage of the speed and the throughput of next-gen sequencing machines," he said. "Our hope is that there will eventually be next-gen sequencing machines that are also customizable with different magnifications, different imaging modalities, and such."
In their published experiments, the researchers snapped confocal microscope images of cells after each cycle to define the location of each fluorescent pixel and its corresponding nucleotide.
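The per-cycle imaging step can be illustrated with a minimal sketch. It assumes a simplified scheme in which each pixel has four channel intensities per cycle and the brightest channel names the base; the channel order, the data layout, and the function names are hypothetical, and the actual ligation chemistry and image processing used in the study may differ.

```python
# Hypothetical mapping from channel index to base; the real assignment
# depends on the probes and filters used, which the article does not specify.
CHANNEL_TO_BASE = "ACGT"


def call_base(intensities):
    """Return the base whose channel is brightest for one pixel in one cycle."""
    brightest = max(range(len(intensities)), key=lambda i: intensities[i])
    return CHANNEL_TO_BASE[brightest]


def call_pixel_reads(cycles):
    """Build one sequence per pixel by appending one base call per cycle.

    cycles: list of images, one per sequencing cycle. Each image is a dict
    mapping (x, y) to a tuple of four channel intensities. Every pixel is
    assumed to be present in every cycle.
    """
    reads = {}
    for image in cycles:
        for xy, intensities in image.items():
            reads[xy] = reads.get(xy, "") + call_base(intensities)
    return reads


# Toy example: two pixels imaged over three cycles.
example_cycles = [
    {(0, 0): (9, 1, 1, 1), (1, 0): (1, 1, 8, 2)},
    {(0, 0): (1, 7, 1, 1), (1, 0): (1, 1, 1, 9)},
    {(0, 0): (2, 1, 1, 8), (1, 0): (6, 1, 1, 1)},
]
print(call_pixel_reads(example_cycles))  # {(0, 0): 'ACT', (1, 0): 'GTA'}
```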
Those experiments showed that the DNA molecules generated from the RNA remained intact through the whole experimental sequencing cycle — a process that can currently take up to two weeks.
They used FISSEQ to sequence the RNA in individual human primary fibroblast cells, generating almost 15,000 amplicons that were each larger than five pixels. Those amplicons coincided with 4,171 genes, with more than 90 percent of the apparent transcripts mapping back to the annotated DNA strand.
Among the top 100 most highly expressed transcripts detected were those generated from genes with known fibroblast-related functions, the researchers reported, such as fibronectin and collagen.
When they did FISSEQ on fibroblast cells after scratching the surface of a coated slide to induce a wound-healing response, the investigators detected thousands more transcripts, including known contributors to injury repair in a subset of the cells.
Across some 40 fibroblast cells, the group saw transcripts from more than 8,700 genes, despite purposely dialing down the amplicon density to speed up the sequencing process, Lee said. "Even though we only had several hundred per cell, across 40 cells we had enough information to make all these biological findings."
The approach had another unanticipated advantage, too: a drop in representation by transcripts from housekeeping genes, leading to a relatively robust representation of transcripts related to a sequenced cell's function.
"In normal RNA sequencing, people have to do a lot of analysis and compare their datasets to other cell types or other datasets," Lee said. "But in our case, our method only allows for the amplification of genes that are very relevant to the cell's function."
That simplifies informatics in some respects, he explained, because researchers can focus on a set of highly expressed and biologically relevant genes within a given cell without having to weed out such housekeeping RNAs.
In addition to using existing software such as Bowtie, which aligns reads to human reference sequences, the team came up with new computational methods to deal with background noise related to the proximity of DNA nanoballs to one another within the cell.
In particular, the researchers settled on an approach that assigned a nucleotide sequence to every single pixel in an image. By aligning each pixel's sequence to a reference, they could fairly easily distinguish authentic reads from background fluorescence, Lee said.
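A rough sketch of why this filtering works follows. It swaps in an exact-match lookup against a toy reference for a real aligner such as Bowtie, and it assumes that background pixels behave like uniformly random sequences, which real noise may not. The function names and the reference size used in the arithmetic are illustrative assumptions, not figures from the study.

```python
def build_kmer_index(reference, k):
    """Map every k-mer in each reference transcript to the transcripts containing it."""
    index = {}
    for name, seq in reference.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(name)
    return index


def keep_aligned_reads(pixel_reads, index):
    """Keep only pixels whose called sequence exactly matches a reference k-mer."""
    return {xy: index[read] for xy, read in pixel_reads.items() if read in index}


def chance_match_probability(read_length, reference_positions):
    """Union-bound estimate that a uniformly random read matches the reference.

    Each position matches with probability (1/4) ** read_length, so the chance
    of matching any of reference_positions start sites is at most their product.
    """
    return min(1.0, reference_positions * 0.25 ** read_length)


# With 30-base reads and an assumed 100 million reference positions,
# a random pixel sequence almost never aligns by chance.
print(chance_match_probability(30, 10**8))
```

Under those assumptions the bound is far below one in a billion per pixel, so even an image with millions of background pixels would be expected to yield essentially no spurious exact matches. Shorter reads lose that margin quickly, since every base removed multiplies the chance of a random match by four.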
"Because [the reads] are 30 bases long, the probability of random noise or auto-fluorescence or background debris … actually aligning to the RefSeq library is very, very low," he explained.
After tossing out mismatched reads, the team further validated their reads by looking at whether or not each read fell near other reads with similar sequences, as would be expected for transcripts in the cell.
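A simple way to picture this kind of spatial check is sketched below. It assumes each aligned read carries a gene label and treats "similar sequences" as "same label", which is a simplification. The distance radius, the minimum neighbor count, and the function name are hypothetical choices rather than parameters reported by the team.

```python
import math


def spatially_supported(reads, radius, min_neighbors=1):
    """Keep reads that have enough same-label reads within a given distance.

    reads: list of (x, y, label) tuples, where label is the gene a read aligned to.
    radius and min_neighbors are illustrative thresholds.
    """
    kept = []
    for i, (x, y, label) in enumerate(reads):
        neighbors = sum(
            1
            for j, (ox, oy, other) in enumerate(reads)
            if j != i and other == label and math.hypot(x - ox, y - oy) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append((x, y, label))
    return kept


# Toy example: two adjacent reads support each other; isolated reads are dropped.
example_reads = [(0, 0, "geneA"), (1, 0, "geneA"), (10, 10, "geneA"), (1, 1, "geneB")]
print(spatially_supported(example_reads, radius=1.5))  # [(0, 0, 'geneA'), (1, 0, 'geneA')]
```

The pairwise comparison is quadratic in the number of reads, which is acceptable for a few hundred reads per cell but would call for a spatial index at higher amplicon densities.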
With the help of X and Y spatial coordinates for each read, the researchers can apply statistical analysis and imaging "masks" to localize each transcript to a particular organelle or cellular compartment.
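The masking idea can be sketched as a lookup of each read's coordinates in a set of boolean images, one per compartment. The mask layout, compartment names, and gene labels below are hypothetical, and the sketch only counts reads per compartment; it does not attempt the statistical analysis the researchers mention.

```python
from collections import Counter


def localize(reads, masks):
    """Count reads per (compartment, gene) using boolean image masks.

    reads: list of (x, y, gene) tuples with integer pixel coordinates.
    masks: dict mapping a compartment name to a 2D list indexed as mask[y][x],
           where True marks pixels belonging to that compartment.
    """
    counts = Counter()
    for x, y, gene in reads:
        for name, mask in masks.items():
            inside = 0 <= y < len(mask) and 0 <= x < len(mask[y])
            if inside and mask[y][x]:
                counts[(name, gene)] += 1
    return counts


# Toy 2x2 image: the right column is "nucleus", the left column is "cytoplasm".
example_masks = {
    "nucleus": [[False, True], [False, True]],
    "cytoplasm": [[True, False], [True, False]],
}
example_reads = [(1, 0, "geneB"), (0, 1, "geneA"), (0, 0, "geneA")]
print(localize(example_reads, example_masks))
```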
Eventually, the group aims to have web software to scroll through images of cells in a given tissue to see where various RNA transcripts turn up and how this distribution relates to known functions for the genes that code for them.
At the time the study was submitted for publication, the researchers were still developing ways of aligning three-dimensional images from each cell. To simplify the analysis, they carried out the experiments described in the paper on fibroblast cells, which are relatively flat, Lee explained.
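For a flat cell, a three-dimensional image stack can be collapsed into a single two-dimensional image with little loss of information. One common way to do that is a maximum-intensity projection, sketched below; the article does not say which projection the team used, so the method, data layout, and function name here are assumptions.

```python
def max_projection(stack):
    """Collapse a z-stack into one 2D image by taking the brightest value per pixel.

    stack: non-empty list of z-slices, each a 2D list indexed as plane[y][x],
           all with the same shape.
    """
    height, width = len(stack[0]), len(stack[0][0])
    return [
        [max(plane[y][x] for plane in stack) for x in range(width)]
        for y in range(height)
    ]


# Toy example: two 2x2 slices.
example_stack = [
    [[1, 5], [0, 2]],
    [[3, 4], [7, 1]],
]
print(max_projection(example_stack))  # [[3, 5], [7, 2]]
```

A projection like this discards depth, which is why it suits flat fibroblasts better than thicker cells or tissue sections, where amplicons at different depths could overlap in the projected image.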
"We decided to look for … a cell that's relatively flat, so we can make a simple file of [two-dimensional] images from [three-dimensional] images," he said, noting that three-dimensional alignments of similar data are on the horizon.
If findings from the fibroblast study are successfully replicated in other cell types, the team suspects that RNA sequencing in situ may be a means of quickly defining cell type-specific transcriptome signatures in a non-targeted manner. If so, that would offer a means of defining a cell type or disease state without relying on cell morphology or the use of targeted markers.
For his part, Lee is optimistic about the prospect of using FISSEQ to quickly home in on transcript signatures for various cell types in an unbiased manner, both for healthy and diseased cells. In the case of tumor cells, for example, he argued that it might be possible to use the approach to define and characterize tumor cells without relying on histological methods alone.
Because the method produces both expression and sequence data at the same time, it should theoretically be possible to detect gene expression profiles and informative mutations for individual tumor cells from the same dataset without losing contextual data.
"Clinically, the fact that we can now define what a cell type is means that we can also define what the tumor type really is, not just by histological methods," Lee said. "At the same time, we're defining these other mutations that could be important for target-specific therapies."
He noted that the ability to sequence RNA in situ may also find favor among those doing research on healthy tissues, including the types of brain cell samples currently assessed using targeted approaches such as FISH.
Lee said the team has already started to explore that possibility experimentally, and that it plans to generate data sets from various cell types and compare them with expression profiles that have already been validated through efforts such as the Allen Brain Atlas set or embryo gene expression mapping projects.
The group's current collaborations with next-generation sequencing companies are aimed at developing faster and more automated versions of the in situ RNA sequencing scheme that would make such clinical and research applications more feasible.
Lee expects the first commercial offerings in this space to be proof-of-concept instruments that involve as few tweaks to existing hardware and software as possible. Further down the road, he noted that such development could theoretically lead to completely overhauled instruments.
"I predict that when people start generating data themselves they'll see a need for that and that type of specialized instrument will come down the pipe," he said.
The Harvard team has already filed multiple patents related to the in situ RNA sequencing method.