NEW YORK (GenomeWeb) – A team led by researchers from the Scripps Research Institute recently published a paper in Bioinformatics that describes Omics Pipe, a freely available computational framework that provides access to automated analysis pipelines for exploring and analyzing various kinds of omics data.
According to the paper, Omics Pipe is "a Python package that creates a framework for assembling scripts into an automated, version-controlled, parallelized pipeline for bioinformatics analysis." It uses Ruffus to run pipeline steps, Sumatra for version control and run tracking, and DRMAA for distributed computing. It's currently available as a machine image on Amazon's cloud with all the requisite software dependencies and databases. It's also available as a standalone package that can be installed and run on local clusters based on SGE or PBS system schedulers, as long as the relevant third-party tools are installed.
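As a rough illustration of what assembling scripts into an automated pipeline involves, the sketch below runs a set of steps in dependency order. It is not Omics Pipe or Ruffus code: it substitutes Python's standard-library `graphlib` for Ruffus, and the step names and dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical step names and dependencies, not taken from Omics Pipe.
# Each key is a step; its value is the set of steps that must finish first.
STEPS = {
    "fastqc": set(),
    "align": {"fastqc"},
    "count_reads": {"align"},
    "report": {"count_reads", "fastqc"},
}


def run_pipeline(steps, run_step):
    """Call run_step once per step, only after the steps it depends on."""
    order = list(TopologicalSorter(steps).static_order())
    for name in order:
        run_step(name)
    return order


executed = []
# Here run_step only records the step name; a real step would launch a script.
order = run_pipeline(STEPS, executed.append)
print(" -> ".join(order))
```

A framework of the kind the paper describes would presumably add run tracking and job submission to a cluster on top of this ordering, which the sketch leaves out.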
Kathleen Fisch, a computational biologist at the University of California, San Diego and one of the lead authors on the paper, told GenomeWeb that she and her colleagues developed the platform to organize the analysis scripts they had written in a reproducible way and to share them with other researchers. They also wanted to create a system that was both extensible and modular, and would let members of the community contribute their own scripts and code, she said. Fisch began working on the system as a postdoc at Scripps Research and continues to work on it along with her collaborators there.
Omics Pipe's developers distinguish their system from similar platforms such as Galaxy, which is set up to make computational analysis capabilities easily accessible to biologists, and from more complex analysis platforms like Bcbio-nextgen, which requires more expert knowledge and familiarity with command-line programming to implement, customize, or extend as needed.
Omics Pipe sits between the two sorts of solutions, according to its developers, catering to more advanced users, specifically computational biologists and bioinformaticians, "who may have a need for a tool that supports programmatic access to individual tools" that is at the same time "easily extensible and is reproducible," the researchers wrote. It still requires some computational and scripting knowledge, but researchers with basic Unix command-line experience should be able to handle it, according to the paper.
The system currently provides six published best-practice pipelines for analyzing data, including two RNA-sequencing pipelines, Genome Analysis Toolkit-based pipelines for calling variants from whole-exome and whole-genome sequencing data, and two ChIP-seq pipelines. It also offers "custom RNA-seq pipelines for personalized cancer genomic medicine reporting," according to the researchers.
To analyze their data, users can run a predefined set of pipelines or their own custom solutions, which can be created from modules built into the system. They can also specify the parameters for running the pipeline, "including the command line options for each tool and other customizable settings, through a parameter file in YAML format." The researchers also provide detailed tutorials, documentation, and source code so that potential contributors to the platform can add scripts in the form of Python modules.
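To make the parameter-file idea concrete, here is a minimal sketch of how per-tool command-line options held in a parameter mapping could be turned into commands for each sample. The mapping stands in for the contents of a parsed YAML file; the keys, tool names, and options are all hypothetical and do not reflect Omics Pipe's actual parameter schema.

```python
import shlex

# Hypothetical parameters, shaped like what a YAML parameter file might
# contain once parsed into a dict. Tool names and options are illustrative.
PARAMS = {
    "sample_list": ["sample1", "sample2"],
    "tools": {
        "aligner": {"command": "my_aligner", "options": "--threads 8 --quiet"},
        "counter": {"command": "my_counter", "options": "--mode union"},
    },
}


def build_command(tool, sample, params):
    """Return the argument list for running one tool on one sample."""
    cfg = params["tools"][tool]
    return [cfg["command"], *shlex.split(cfg["options"]), sample]


# One command per (sample, tool) pair, in the order the tools are listed.
commands = [
    build_command(tool, sample, PARAMS)
    for sample in PARAMS["sample_list"]
    for tool in PARAMS["tools"]
]
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping the options in a single file, rather than scattered through scripts, is what lets a run be repeated with the same settings.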
Moving forward, Omics Pipe's developers plan to add to the list of available pipelines in their system. They also hope members of the broader computational community will contribute their scripts to the system and are working on a mechanism that will make that process much easier. Rather than requiring researchers who want to add their scripts to submit pull requests or to branch or fork the repository, the idea is to provide some mechanism by which they could simply drop off their custom pipelines for incorporation into the larger platform, Fisch said.