NEW YORK (GenomeWeb) – A Columbia University-led team has developed a low-coverage RNA sequencing technique for picking up regulatory network changes and expression shifts in cells subjected to screening assays.
The method, known as "pooled library amplification for transcriptome expression," or PLATE-Seq, involves barcoding samples at an early stage of the library preparation protocol so that many samples can be pooled. Investigators then perform shallow sequencing on the pooled, barcoded samples, focusing on the 3'-ends of genes, before computationally untangling broad expression patterns and regulatory gene activity changes in each original sample.
In a proof-of-principle study published in Nature Communications this week, researchers from the Columbia University Medical Center and DarwinHealth demonstrated the feasibility of the approach in human cell lines, comparing it with TruSeq RNA sequencing and with expression array-based connectivity mapping methods.
PLATE-Seq "has a library construction protocol that's a lot cheaper, but also involves sequencing the resulting libraries to a much lower depth than you normally would with conventional RNA sequencing," explained Peter Sims, director of systems biology graduate studies at Columbia and co-senior author on the study.
"Those two things together make the cost a little over 10 times less than conventional RNA-seq," Sims said, noting that PLATE-Seq comes in at a lower price point than gene expression arrays, as well, when the number of samples per experiment is considered.
"We're taking advantage of the opportunity to multiplex with next-generation sequencing — to mix hundreds of samples together in a single flow cell," he explained. "I think that's a major advantage."
The general approach can be used to gauge expression in samples being used for a range of pharmacological assays, functional analyses, sensitivity screens, and so on. For example, Sims and his colleagues are especially keen to pair PLATE-Seq with drug screening assays as a means of estimating gene expression, regulatory changes, and protein activity shifts that accompany a given drug response.
"The reason we developed PLATE-Seq was because we really wanted to be able to conduct RNA sequencing analysis in the context of drug screens," Sims explained. "These screens typically have very large numbers of experimental conditions and so typical RNA sequencing is just too expensive to be applicable in that context."
To evaluate as many samples as possible in parallel in PLATE-Seq, researchers established a protocol that involves lysing cells directly on the same plate used to do the accompanying high-throughput screen. From there, they transfer the samples to an oligo(dT) plate that captures messenger RNAs before they are shuffled on to a third plate, where sample-specific barcodes are introduced during a reverse transcription step.
Once barcoded, the samples continue on through the library preparation protocol. For the sequencing step, the team typically generates between 500,000 and 2 million raw reads for each sample.
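As a rough illustration of how pooled reads can be assigned back to their wells after sequencing, the following Python sketch groups reads by a sample barcode. It is a toy example under stated assumptions, not the team's pipeline: the function name `demultiplex`, the 8-base barcode length, the barcode sitting at the start of each read, and exact-match lookup are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_len=8):
    """Group pooled reads by the sample barcode at the start of each read.

    Assumptions (hypothetical, for illustration only): the barcode occupies
    the first `barcode_len` bases of the read and must match exactly.
    Reads whose barcode is not in the lookup table are discarded.
    """
    per_sample = defaultdict(list)
    for read in reads:
        barcode, insert = read[:barcode_len], read[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is not None:
            per_sample[sample].append(insert)
    return dict(per_sample)


# Made-up reads and well labels, purely to show the call.
pooled = ["ACGTACGTTTTGGA", "TTGCAAGGCCCAAA", "ACGTACGTGGGCCC", "NNNNNNNNAAAA"]
wells = {"ACGTACGT": "A1", "TTGCAAGG": "A2"}
result = demultiplex(pooled, wells)
```

Here the first and third reads are assigned to well A1, the second to A2, and the last is dropped because its barcode is unknown. A production demultiplexer would typically also tolerate sequencing errors in the barcode.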
Though the approach is theoretically compatible with any next-generation sequencing platform, the researchers have for now paired it with Illumina short read sequencing, particularly since PLATE-Seq largely focuses on the 3'-end of each messenger RNA.
"Our libraries probably contain very long, perhaps full-length, cDNAs. So it's possible that something like a PacBio or nanopore sequencing could get more out of a PLATE-Seq library than an Illumina sequencer," Sims said. "But our focus is on essentially maximizing the number of samples we can process per dollar, so for our particular application area — which is mostly perturbation screens — you want it to be as cheap as possible."
On the computational side, the team tapped into the VIPER algorithm that co-senior author and fellow Columbia researcher Andrea Califano developed to untangle regulatory networks from large data sets, including interactions between transcription factors and their targets.
Based on the information housed in such networks, the researchers reasoned that they could develop an experimental protocol for producing inexpensive, low-depth RNA sequencing data that could be paired with VIPER to tease out gene expression and regulatory changes based on knowledge of the broader network, Sims explained.
"[VIPER] goes really nicely with PLATE-Seq," he said. "Because if you know the targets of a particular regulator, you don't need to detect that regulator to infer its activity: if you can detect even a subset of its targets with a low-coverage method like PLATE-Seq, you can infer differential activity of that regulator."
The regulatory networks that VIPER uses to interpret the data may be inferred computationally, or they may be built from regulatory data generated experimentally, for instance through chromatin immunoprecipitation sequencing on known regulatory molecules or a range of other possible experiments.
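The idea Sims describes, inferring a regulator's activity from whichever of its targets happen to be detected, can be sketched in a few lines of Python. This is a deliberately simplified stand-in and not the VIPER algorithm itself: it combines signed target z-scores with a Stouffer-style sum, and the function name `infer_activity`, the input format, and the gene names are hypothetical.

```python
import math


def infer_activity(signature, regulon):
    """Estimate a regulator's differential activity from its detected targets.

    signature: dict of gene -> differential expression z-score, containing
               only the genes detected in the low-coverage data.
    regulon:   dict of target gene -> +1 (activated) or -1 (repressed).

    Simplified stand-in (an assumption, not VIPER): signed z-scores of the
    detected targets are summed and divided by sqrt(n). Returns None when
    no target was detected.
    """
    scores = [mode * signature[gene]
              for gene, mode in regulon.items()
              if gene in signature]
    if not scores:
        return None
    return sum(scores) / math.sqrt(len(scores))


# Made-up values: target T4 was not detected, and the regulator itself is absent.
signature = {"T1": 2.0, "T2": 1.0, "T3": -3.0}
regulon = {"T1": 1, "T2": 1, "T3": -1, "T4": 1}
activity = infer_activity(signature, regulon)
```

In this toy case the regulator never appears in the expression signature, yet three of its four targets move in the direction expected for increased activity, so the combined score is positive.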
In their new study, for example, Sims, Califano, and colleagues applied PLATE-Seq to BT20 human cells that had been exposed to seven small molecule treatments, including the cancer drugs crizotinib and gemcitabine. There, they reported, PLATE-Seq uncovered roughly three-quarters of the transcripts identified with the pricier, more time-consuming TruSeq RNA sequencing method.
That comparison hinted that the two RNA sequencing approaches produced comparable gene expression signatures and VIPER-predicted protein activity profiles, while the team's experiments in another human cell line suggested that PLATE-Seq also offered advantages over array-based expression profiling in the cell screening setting.
Even so, Sims cautioned that the strategy is poorly suited to more fine-scale expression analyses, such as efforts to assess transcript splicing or the expression of specific gene isoforms.
The researchers are continuing to make improvements to the method with an eye to bumping up the sensitivity and scalability. They have already applied the approach to several screens requiring roughly eight to 16 96-well plates apiece, and are working towards PLATE-Seq experiments that can handle 384-well plates.
"The number of samples per run could increase a lot once we have that working," Sims said. "Most of our drug studies are done in that format, so that would make things easier."
The PLATE-Seq approach is expected to be useful across a wide variety of research applications, from drug screens to more basic biological studies. Sims noted that the team has already taken a crack at using it for tissue heterogeneity studies as well, comparing expression in multiple core biopsies taken across different parts of the same tissue specimen "to get a spatial map of the tissue in expression space for pretty cheap."
"We're detecting north of 10,000 genes per sample," he said, "and that's a lot of information."
Columbia University Genome Center's high-throughput screening facility currently offers PLATE-Seq, in combination with screening, as a service for other researchers. And DarwinHealth, a firm co-founded by Califano, currently has a PLATE-Seq service contract with Columbia to generate data informing the company's RNA-based precision medicine approach.
In an email message, Califano noted that the company is, for example, using PLATE-Seq to establish drug perturbation databases for matching patients to specific therapeutic agents. DarwinHealth Chief Scientific Officer Mariano Alvarez co-authored this week's Nature Communications paper with Sims and Califano.