NEW YORK – GranatumX, a new data analysis portal developed by researchers at the University of Michigan, may help make single-cell sequencing analysis more user-friendly for researchers and give bioinformatics tool developers a platform to get their methods more widely adopted.
"Usually, those two communities don't interact, they're kind of disjointed," said Lana Garmire, whose group developed GranatumX. "We're hoping this is the tool where they can connect."
Garmire's team described the platform in a bioRxiv preprint posted last month and has also submitted the work for publication in a journal. The software platform, which can run on private servers or in the cloud, packages bioinformatics tools into plugin modules, so-called "Gboxes," so they can run in the system no matter what language they were originally written in.
Garmire said her team designed the platform to be efficient as well as user-friendly. A full analysis workflow for 10,000 cells takes as little as 12 minutes, she said.
Earlier this month, GranatumX was approved to be included in the Human Cell Atlas (HCA) Data Portal's analysis tools registry. "HCA doesn't endorse any specific programs," Samantha Wynne, the project's scientific communications manager, said in an email. But the approval does mean that GranatumX has met certain requirements, such as supporting standard data formats, and is free and open source.
While GranatumX comes with preset analysis workflows for researchers new to single-cell sequencing, it's meant for people with and without programming experience. One target user is the principal investigator or senior scientist who wants to spend at least a little bit of time analyzing the data themselves.
One such user is Larry Reiter, a stem cell researcher at the University of Tennessee Health Science Center who recently waded into single-cell sequencing of the dental pulp stem cell lines he has collected.
"It allows you to construct your own analysis," Reiter said. "I was able to get to the point where I could get reasonable [single-cell RNA-seq read] coverage plots." The GranatumX team even made tweaks in response to his needs, going "above and beyond to help with our analysis," he said. "What we are doing is unique in the single-cell space, but they took on the challenge and allowed us to analyze our data easily, making the comparisons we needed."
While GranatumX is new and Garmire does not yet know how many people have used it, the software it's based on, Granatum, an interactive web tool for single-cell sequencing analysis, was accessed approximately 10,000 times, she said.
Published in 2017 in Genome Medicine, Granatum limited the number of concurrent users and had a "lack of flexibility," Garmire said. "The original program was a series of preset steps for users to go through; they really didn't have any choices." Her lab started planning a more open platform in 2018. Overall, two generations of eight to 10 developers have worked on GranatumX.
The ability of researchers to put together their own workflows is helped by the Gbox concept, which also opens the door for developers to participate. "You just need to write your module into a Gbox and then deposit it," Garmire said, noting that her team is open to working with interested developers. "Basically, our time is free for them. We can consult with them, or guide them," she said. "The window is open."
Garmire also strove to make analyses reproducible. The software can provide a 20-page report at the end of the analysis covering what parameters were used. "You'll see every single step," she said, adding that GranatumX is also designed with collaborations in mind. It employs project keys that can be shared so select scientists can look at all the data and results.
The software starts with processed read count data and associated metadata from single-cell sequencing. "It's very generic, all you need is to provide some data matrices," Garmire said. The matrices can come from any of the single-cell platforms, including 10x Genomics' Chromium.
GranatumX is especially useful for data preprocessing, Garmire said, including data normalization, removing batch effects, and filtering genes for dropouts, or low expression values. The software offers access to her lab's own deep learning-based imputation tools.
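For readers unfamiliar with these preprocessing steps, the following is a minimal sketch of two of them, gene filtering and count normalization, applied to a small cells-by-genes count matrix. It is not GranatumX code; the function names, the `min_cells` and `target_sum` parameters, and the toy data are assumptions chosen for illustration, and it uses only NumPy.

```python
import numpy as np

def filter_genes(counts, min_cells=3):
    """Keep genes (columns) with a nonzero count in at least `min_cells` cells."""
    detected = (counts > 0).sum(axis=0)
    keep = detected >= min_cells
    return counts[:, keep], keep

def normalize_log(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log1p."""
    totals = counts.sum(axis=1, keepdims=True).astype(float)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / totals * target_sum)

# Hypothetical toy matrix: 4 cells (rows) by 4 genes (columns).
counts = np.array([
    [5, 0, 3, 0],
    [2, 0, 8, 0],
    [0, 1, 4, 0],
    [7, 0, 1, 0],
])

filtered, keep = filter_genes(counts, min_cells=2)
normalized = normalize_log(filtered)
```

In this toy matrix, the second and fourth genes are detected in fewer than two cells and are dropped before normalization. Real pipelines typically add further steps, such as batch-effect removal and imputation, that this sketch does not attempt.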
It can also perform downstream functional analysis, such as differential expression tests and cell clustering, including visualization based on PCA, t-SNE, or UMAP plotting. Even protein-protein interaction network analysis and cell state-related RNA dynamics are options.
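As a rough illustration of the PCA-based visualization mentioned above, the sketch below projects a small expression matrix onto its first two principal components using a singular value decomposition. It is a generic NumPy example, not the GranatumX implementation; the function name `pca_project` and the simulated two-group data are assumptions.

```python
import numpy as np

def pca_project(matrix, n_components=2):
    """Project rows (cells) onto the top principal components via SVD."""
    centered = matrix - matrix.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    coords = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)  # fraction of variance per component
    return coords, explained[:n_components]

# Simulated data (hypothetical): two groups of 20 cells, 5 genes each,
# with the second group shifted upward in the first two genes.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.5, size=(20, 5))
group_b = rng.normal(0.0, 0.5, size=(20, 5)) + np.array([4, 4, 0, 0, 0])
expression = np.vstack([group_a, group_b])

coords, explained = pca_project(expression)
```

Plotting the two columns of `coords` against each other gives the familiar PCA scatter plot, in which the two simulated groups fall on opposite sides of the first component. t-SNE and UMAP are nonlinear methods and would need dedicated libraries.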
In a case study described in their preprint, Garmire's team used a 10x Chromium gene expression dataset of 7,431 cells from a patient with metastatic Merkel cell carcinoma, treated using T cell immunotherapy as well as immune checkpoint inhibitors. They identified seven clusters with UMAP plots and showed the significance of immune-related pathways. They also used GranatumX to analyze the Tabula Muris dataset, which contains 54,865 cells from 20 mouse tissues and organs.