CHICAGO (GenomeWeb) – Researchers at the University of California San Diego School of Medicine and the Broad Institute have created an add-on to the Broad's GenePattern analysis software to make it more amenable to nonprogrammers and to facilitate wider collaboration.
GenePattern Notebook, online since April, brings together text, graphics, and computer code with analyses drawn from the 13-year-old GenePattern repository of bioinformatics analysis and visualization methods. It runs on the Jupyter Notebook platform, an open-source web app that facilitates document sharing and combines multimedia, equations, and text to support advanced uses like statistical modeling, data cleaning, and machine learning.
"To our knowledge GenePattern Notebook is the first integration of a bioinformatics tool aggregation portal with an analysis notebook environment," the UCSD-Broad team wrote in a paper published last week in the online version of Cell Systems.
Lead author Michael Reich, assistant director of bioinformatics at UCSD, called GenePattern Notebook more than just an extension to the Jupyter platform. "It adds key capabilities that make it amenable to non-programming researchers," said Reich, who worked at the Cambridge, Massachusetts-based Broad from 2004 through 2015 before moving west.
GenePattern Notebook is freely available through a web browser or for local installation via Docker or as a Python package, and Reich said all components will remain free and open-source in perpetuity.
The new system has three target audiences: researchers with specific scientific goals, computational biologists looking to develop new algorithms, and bioinformatics core facilities, which, according to Reich, want off-the-shelf technology to help them run "fairly complex pipelines."
The Cell Systems article said the UCSD-Broad creation was the first to "integrate the dynamic capabilities of notebook systems with an investigator-focused, easy-to-use interface that provides access to hundreds of genomic tools without the need to write code." Heretofore, interactive electronic laboratory notebooks generally have required users to be able to write code to customize their systems.
"Notebook environments model their interface around the annotation of sections of code, and therefore assume that the user is fluent in a programming language such as Python or R. Bioinformatics tool aggregation portals successfully remove the requirement for coding expertise but to date have had limited ability to incorporate the variety of rich text and media formats required to represent the full scientific narrative surrounding each analysis step," the researchers explained in the journal article.
With GenePattern Notebook, users simply insert GenePattern analyses into a Jupyter notebook. "You don't need to write code," Reich said in an interview. Computational biologists, who often do have programming skills, can save time because they don't need to create new code for each project, he added.
The extension adds a new way to format text in Jupyter notebooks, following the "WYSIWYG" (what you see is what you get) style of word processors for highlighting, boldface, italics, and similar styling, Reich said. "That's a big issue when it comes to bringing the world of biological research into notebooks," he explained.
"The GenePattern Notebook functionality takes the Jupyter Notebook interface one step further, adding analysis, login, and rich text input components that present the GenePattern interface to provide code-free analysis and visualization," the researchers wrote in Cell Systems.
Users can add multimedia as well as math formulas, tables, and web links and then share their work as "research narratives," according to the researchers. "The resulting notebooks can be shared, edited, executed, and published as complete encapsulations of in silico research."
Capabilities of GenePattern Notebook will grow over time. "We are looking to add the ability to combine notebooks into a platform" to support real-time collaboration as with Google Docs, Reich said.
The core GenePattern dates to 2004 within the Broad, and it has been on public servers since 2008. "Our applications have historically been in cancer," Reich said. But the methods in GenePattern are suitable for any genomic application, he noted.
The current GenePattern community has close to 50,000 registered users worldwide on its main servers and handles 2,000 to 5,000 analyses per week, according to Reich, and there have been more than 17,000 downloads of the software for local hosting.
GenePattern and GenePattern Notebook are supported by grants from the National Institute of General Medical Sciences and the National Cancer Institute.