NEW YORK (GenomeWeb) – Seven Bridges Genomics has provided an unrestricted gift of an undisclosed amount to the University of California, Davis to support the development of the Common Workflow Language (CWL), which will provide standardized specifications for describing analysis tools and workflows.
Specifically, the funds will support Michael Crusoe, lead software engineer in the laboratory of Titus Brown in UC Davis' Department of Population Health and Reproduction, on a full-time basis in the position of research software engineer on the CWL project. The funds are for an initial six-month period with additional support to follow.
"Portable workflows accelerate the pace of scientific discovery and allow for reproducibility in a way never before possible in bioinformatics," Brandi Davis Dusenbery, Seven Bridges' scientific program manager, said in an email to GenomeWeb. "Supporting Michael and his work is an early step we're taking on this road."
The CWL is intended to make it easier to share computational pipelines and run them on multiple platforms. The standard grew out of discussions among researchers about the difficulty of getting tools to run on large workflow systems, which often have their own languages and formats for describing tools that don't translate across platforms. So far, the CWL working group has written two drafts of the language and started testing it. Contributions to the language have come from members of academia and industry, including developers involved in the Galaxy project and companies like Seven Bridges and Curoverse.
Full details of the plans for the CWL are available here. Over the next six months, the CWL working group plans to offer several workshops, coding meetings, and other events that will, among other things, focus on improving the language and crafting user guides. The first of these is a workshop that will take place on Nov. 3 at the Festival of Genomics in San Mateo, California. The group will also run a coding session the following day, hosted by Intel and aimed at improving current CWL implementations and specifications. The funds will also support a baseline analysis of the sustainability of the CWL, which Crusoe said he will perform using evaluation criteria provided by the Software Sustainability Institute.
Other activities that the CWL working group has planned for the next six months include establishing a system of shared governance for the CWL project. They also plan to launch a non-profit foundation that will own assets on behalf of the community and will support member contributors using a mix of donated funds and funds from grant applications, among other sources.
Crusoe also said that the group will release version one of the CWL specification sometime in the next six months. This first release will include documentation that describes the standard, a model implementation for reference purposes, and protocols for defining extensions to the language moving forward. User implementations of the language will be tested regularly using a conformance suite. In addition, the working group will post several dashboards on the CWL website, including one that lists all the tools and workflows that have been described using the CWL. The group also plans to publish a peer-reviewed manuscript that describes the CWL in the coming months, Crusoe said.
Ultimately, the developers hope that the specification will be broadly adopted by the life sciences community and beyond. In their vision, software authors will ship CWL definitions along with their tools when they are installed using packages such as Debian, RedHat, Conda, and LinuxBrew. In cases where developers of useful software packages have moved on, the community would take on the responsibility of describing their tools using the specification. They also hope that the CWL becomes the preferred way of fulfilling journal or funder requirements for reproducible analysis workflows.
"[We're] not saying the CWL has to be the only [standard] that's accepted," Crusoe told GenomeWeb. "The idea here is that it would be really great if the community norms were that you had to describe your computational workflows at this level of detail. The vision for our work is that we would fulfill that requirement and that most people would choose to go this route."