NEW YORK – While still no doubt a cutting-edge technology, single-cell proteomics has over the last year or so seen increasing levels of interest and uptake, with researchers and vendors alike moving into the space.
Instruments like Bruker's timsTOF SCP and sample preparation systems like Cellenion's CellenOne platform have helped advance the field by providing scientists with tools specifically intended for single-cell analyses. Now, some observers suggest, the data analysis side of single-cell proteomics has some catching up to do.
"There have been tremendous advances in terms of the experimental part [of single-cell proteomics], but I would say that the data analysis part has been lagging a little behind," said Laurent Gatto, associate professor of bioinformatics at the Catholic University of Louvain.
Gatto noted that, to an extent, this was understandable, as an emerging field like single-cell proteomics typically must show that it can produce enough data to make it worth researchers' time to invest in analysis tools.
With single-cell proteomics now generating large and biologically interesting datasets, the time has come for more attention to the informatics side of the equation, Gatto suggested.
At the VIB Next-Generation Protein Analysis and Detection conference last month, Gatto gave a presentation on the current challenges facing single-cell proteomics data analysis. He and his colleagues are also in the process of submitting a preprint looking at the informatics workflows used in several single-cell proteomics studies produced in recent years. He noted that one of their main observations is that each lab uses its own data-analysis pipeline and that when these different pipelines are applied to the same single-cell dataset they produce different results.
"Sometimes there is still a coherence in the results, but the results are clearly different … so different that you might even wonder if they come from the same dataset," he said. "The conclusion from that is that the way the data is processed does have an impact [on the data] and potentially has an impact on the [biological] conclusions."
Gatto added that this issue is further complicated by the fact that in many papers, the details provided on the data analysis pipeline are not complete enough to enable outside researchers to recreate it.
"Every lab has their way to analyze the data, but the way they analyze the data is not necessarily well described," he said. "We tried to systematically reproduce or repeat some analyses, but that was simply not possible because the information on what happened with the data was not there."
Gatto noted that the data analysis issues facing single-cell proteomics are largely similar to those facing bulk proteomics experiments, but that they are exacerbated by the nature of single-cell experiments.
For instance, two of the main challenges he and his colleagues observed — batch effects and missing protein values across samples — are longstanding issues in bulk proteomics experiments. They are perhaps even more difficult questions for single-cell proteomics efforts, though, due in part to the large number of samples these experiments aim to measure.
While batch effects are always a concern, they become much more of an issue when moving from the dozens of samples analyzed in a typical bulk proteomics experiment to the hundreds or thousands of single cells analyzed in a single-cell experiment, Gatto said.
Batch effects "have an even bigger impact [in single-cell experiments] because they are bigger and, because we are working with very little material, slight technical deviations will have bigger effects," he said.
Christopher Rose, a senior scientist at Genentech, likewise said that the large number of samples in single-cell proteomics experiments presents data-analysis challenges.
"We're running into hundreds, maybe thousands of single-cell measurements," he said. "One of the questions is how you handle that data." This includes, for instance, normalizing, batch normalizing, correcting across samples to determine outliers, and identifying bad samples, "but also how do you deal with all the sample data, the data about what the cells are," Rose said.
While having good sample characterization data is important for any experiment, collecting it becomes a more significant challenge when an experiment involves hundreds to thousands of samples.
"How do we potentially collect cell characterization data, whether it be cell size or if you are using cell sorting data based on some marker?" Rose said. "It's that sort of thing that we aren't dealing with. How do you deal with the metadata and connect it to [the mass spec data]?"
Additionally, the ability to analyze individual cells brings into play distinctions that aren't as relevant at the bulk level.
For instance, "people have started to see that things might cluster based on cell cycle," Rose said. "So how do you deal with that?"
He noted that there are tools and structures for collecting and reporting the sort of extensive metadata that would benefit single-cell proteomics experiments but that "proteomics scientists aren't really used to [these tools]."
Metadata "is something that doesn't get enough attention" at the bulk or single-cell proteomic level, said Nikolai Slavov, director of the single-cell proteomics center at Northeastern University and developer of some of the most commonly used single-cell proteomics workflows.
"Oftentimes people deposit their raw mass spec data, but they don't deposit the metadata, and if they don't deposit the metadata, the raw data are useless," he said.
More generally, Slavov said he believes that the single-cell field has in place many of the tools needed to address its data analysis challenges but agreed that various questions have not yet received as much attention as they should.
He raised as an example the need for better characterization of the noise characteristics of single-cell protein measurements.
"There are dozens of high-profile papers for that question for single-cell RNA sequencing," he said. "But there hasn't been a single study to characterize that for single-cell protein measurements."
Slavov said that data analysis work in single-cell RNA sequencing could provide tools applicable to single-cell proteomics.
"Many of the solutions that have been developed for single-cell RNA sequencing can be adapted," he said, though he noted that "adaption doesn't mean direct translation."
"I think there is need for more money, time, and attention in this area," he said. "But it's not something that I foresee being a particularly difficult challenge."
Slavov said he also expects that computational scientists who have been working with single-cell RNA-seq will begin turning their attention to single-cell proteomics as the field produces more data of biological interest.
"When you talk about informatics, there are quite distinct aspects," he said. "One is interpretation of the mass spec data so that we can identify more proteins. And this problem naturally appeals to mass spectrometry groups. Then the other side is: Once you have identified the peptides and proteins … how do you biologically interpret this data?"
"I'm seeing much more interest on that side from really good statistical, computational biology groups who have been actively engaged with single-cell RNA sequencing data," Slavov said. "They are becoming more interested as they see that our datasets are no longer just a few hundred HeLa cells but that we have data on lots of cells from primary human tissues of a much more interesting biological nature."