Researchers at the RIKEN Center for Life Science Technologies have published a paper in Nature Biotechnology that describes Zenbu, a free bioinformatics tool for integrating, processing, and visualizing transcriptome data from RNA-seq, ChIP-seq, cap analysis gene expression, and other experiments.
Zenbu was developed for use in large-scale transcriptome projects such as the RIKEN-led Functional Annotation of the Mammalian Genome (FANTOM) consortium, a research effort first established in 2000 to assign functional annotations to full-length cDNAs that were collected by the Mouse Encyclopedia Project run by RIKEN. The project has been expanded over time to cover transcriptome analysis more broadly.
According to the paper, Zenbu — a Japanese word that translates as "everything" or "all" — offers a "rich interactive visualization experience via native embedded processing" that improves on the abilities of existing browsers such as the UCSC Genome Browser and the Integrative Genomics Viewer, which "provide static visualization of pre-computed data files." It also improves on "queue processing systems" such as Galaxy, "that still need to precalculate results via wrappers around external programs," the researchers wrote.
The system has three components. The first is a genome browser that lets users visualize and interact with their data; the second is a secure system for uploading and sharing research data publicly or with selected users; and the third is an interface through which users can explore and query the data already available in the system and analyze it using the browser. Zenbu has datasets from both the Encyclopedia of DNA Elements and FANTOM consortia.
It includes tools for "quality filtering, signal thresholding, signal normalization, peak finding, annotation, collation of signal under peaks or transcript models, and expression difference visualization across multiple experiments," the paper's authors wrote. It also offers "predefined views and data-processing scripts" that are optimized for RNA-seq, CAGE, short-RNA, and ChIP-seq experiments, and can be used by researchers with limited informatics experience. These include scripts for steps such as data normalization, filtering, clustering, and collation. More experienced users have the option to combine scripts to produce more complex data processing pipelines.
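The idea of chaining small processing steps into a larger pipeline can be illustrated with a short Python sketch. This is not Zenbu's own scripting format or API; the function names (`normalize_tpm`, `threshold`, `collate`, `pipeline`), the toy counts, and the feature coordinates are all hypothetical and serve only to show how normalization, thresholding, and collation might be composed.

```python
from functools import reduce

# Hypothetical illustration only: this is NOT Zenbu's script format.
# A "signal" here is a dict mapping genomic position -> read count.

def normalize_tpm(signal):
    """Scale raw counts so that they sum to one million (tags per million)."""
    total = sum(signal.values())
    if total == 0:
        return dict(signal)
    return {pos: count * 1_000_000 / total for pos, count in signal.items()}

def threshold(min_value):
    """Build a step that drops positions whose signal is below min_value."""
    def step(signal):
        return {pos: v for pos, v in signal.items() if v >= min_value}
    return step

def collate(features):
    """Build a step that sums the signal inside each named feature.

    features maps a name to an inclusive (start, end) interval.
    """
    def step(signal):
        return {
            name: sum(v for pos, v in signal.items() if start <= pos <= end)
            for name, (start, end) in features.items()
        }
    return step

def pipeline(*steps):
    """Combine steps into one function that applies them left to right."""
    def run(signal):
        return reduce(lambda data, step: step(data), steps, signal)
    return run

# Made-up example data.
raw_counts = {100: 6, 101: 2, 250: 1, 400: 1}
features = {"geneA_promoter": (95, 105), "geneB_promoter": (395, 405)}

process = pipeline(normalize_tpm, threshold(150_000), collate(features))
result = process(raw_counts)
print(result)
```

In this toy run, the two low-count positions fall below the threshold after normalization, so only the signal near the first feature survives collation. Swapping the order of the steps, or adding a new one, requires changing only the arguments to `pipeline`.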
Jessica Severin, a senior technical scientist at RIKEN and one of the authors on the paper, told BioInform that she and her colleagues began developing the system four years ago. At that time, most genome browsers only allowed users to visualize small quantities of data in their tracks and provided limited information on genes and gene annotations. "When you start to work in the transcriptome or RNAseq space, you have almost 1,000-fold [the] amount of data that you [might normally] be working with because of all the different tissues, and cells, and conditions," she said. "When I went around to rebuilding the genome browser, I was thinking in terms of 1,000-times more data."
Furthermore, although sequencing-based studies used to be done at large academic institutes and research centers, there are now many smaller-scale research collaborations as sequencing technologies have become cheaper and more accessible, Severin said. Existing browsers were developed with those larger centers in mind, but with an increase in the number of smaller collaborations, "I saw an opportunity here to redo the genome browser [in a manner similar] to Facebook," she said. "The idea was that you'd have a good piece of software and then the community could put their own data there and maintain it and it wasn’t just up to the [big] center [to] curate the data."
In the years since her team began working on Zenbu, developers of existing genome browsers have updated their systems to address some of the limitations that Severin and her team have focused on. But there are still a few features that set Zenbu apart from existing systems, she told BioInform.
The main difference is that it lets users combine transcriptome data from thousands of experiments simultaneously and display it in a single track. The data are "summed up together so [that] you see a combined signal of 1,000 experiments," she explained.
Clicking on regions of the genome within the browser launches an expanded view that shows the data from each experiment separately, enabling interactive exploration of the different peaks in the data. For example, "you might see two peaks near the start of a gene and if you select one, you might see that it's actually enriched in brain tissue but if you select the other one, you might see that it's actually enriched in blood," Severin said. Because Zenbu lets users make this sort of fine-grained distinction, it’s a more attractive option than the alternative, which would be to reanalyze the data, a process that could take several days for very large experiments, she said.
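The relationship between the combined track and the per-experiment view can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Zenbu's implementation: the experiment names, positions, and values are invented, and the helpers `combined_track` and `expand_region` are hypothetical.

```python
from collections import defaultdict

# Made-up data: experiment name -> {genomic position: signal}.
experiments = {
    "brain_rep1": {100: 9, 130: 1},
    "brain_rep2": {100: 7},
    "blood_rep1": {100: 1, 130: 8},
}

def combined_track(experiments):
    """Sum the signal of every experiment at each position (single track)."""
    total = defaultdict(int)
    for signal in experiments.values():
        for pos, value in signal.items():
            total[pos] += value
    return dict(total)

def expand_region(experiments, start, end):
    """Per-experiment signal within [start, end], strongest first."""
    per_experiment = {
        name: sum(v for pos, v in signal.items() if start <= pos <= end)
        for name, signal in experiments.items()
    }
    return sorted(per_experiment.items(), key=lambda item: item[1], reverse=True)

track = combined_track(experiments)               # what the summed track shows
peak_one = expand_region(experiments, 95, 105)    # selecting the first peak
peak_two = expand_region(experiments, 125, 135)   # selecting the second peak

print(track)
print(peak_one)
print(peak_two)
```

In the summed track the two peaks look like ordinary neighboring signals, but the expanded views show that, in this toy data, the first is dominated by the brain samples and the second by the blood sample, without any reanalysis of the underlying data.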
Moving forward, Severin and her colleagues will work on making it easier for researchers to use the system and to upload their data into it. One of the ways they plan to do this is by setting up connected Zenbu mirror sites with collaborators from the FANTOM5 project in the UK, Europe, and Australia, as well as with other groups interested in partnering with the RIKEN team, she said. They also plan to develop additional processing capabilities such as new statistical methods, as well as new visualization capabilities, Severin said.
Researchers can use the web-based version of Zenbu or install and run a local instance of the software in their laboratories.