Canada's Advanced Research and Innovation Network (CANARIE) has awarded a C$435,000 ($391,000) grant to a McGill University-led bioinformatics project that is assembling pipelines and applications for analyzing, processing, and visualizing genetic and genomic data.
The CANARIE grant is part of a larger $4M investment by the organization in nine software projects across a variety of disciplines, selected from responses to a call for proposals it issued in June last year. The McGill researchers have also received a C$200,000 grant from Genome Quebec. They'll use funding from both grants to continue developing the Genetics and Genomics Analysis platform (GenAP), a free resource that provides easy-to-use pipelines along with high-performance compute resources intended to reduce analysis bottlenecks and help life science researchers explore and make sense of their data.
Those tools and pipelines were initially developed as part of the sequencing services offered by the McGill University and Genome Quebec Innovation Center, Guillaume Bourque told BioInform. Bourque, an associate professor in McGill's human genetics department and head of bioinformatics at the Innovation Center, is one of the GenAP project leaders. The center is one of six funded by Genome Canada, a national non-profit that supports large-scale research projects in genomics and proteomics.
The system provides pipelines for ChIP- and RNA-sequencing analysis as well as whole-genome and whole-exome sequence analysis. The pipelines are built from commonly used open source tools for sequence alignment and variant calling, such as the Burrows-Wheeler Aligner and the Genome Analysis Toolkit. These are offered via an open source framework that supports easy installation and deployment of the tools. The system also includes virtual machines (VMs) that provide web-based services such as the UCSC Genome Browser and Galaxy. It uses the CERN Virtual Machine File System to distribute the software services and pipelines.
Initially, Bourque's team used these tools internally, running them on a local cluster and returning results to researchers who had requested the center's services. They will continue to do so, but they also hope to make the tools more broadly available, believing that others in the community can benefit from them, he said. To that end, they have moved the tools to computing resources provided by Compute Canada, a national agency that integrates high-performance computing resources at six partner consortia across Canada. The pipelines are available now, and the GenAP website has information about the source code and how to install and run the software.
The current grants will support continued development of those pipelines, but they will mainly fund work on the VMs, specifically the UCSC Browser and Galaxy, with a focus on making them more user-friendly and simpler to deploy and run, Bourque told BioInform. For now, these particular applications have not been moved to the Compute Canada resources. According to the GenAP website, the McGill team is working with researchers from the Université de Sherbrooke to create a GenAP Host, a server on the Compute Canada infrastructure that will serve as home for the VMs, as well as a portal through which users will be able to access both the VMs and the pipelines.
GenAP is free to use, but only up to a point. That's because GenAP is currently hosted on Compute Canada resources that have been allocated to Bourque's account, he explained, so there is a limit to how much data users can analyze at a time and what sorts of analyses they can perform. To run large jobs that require large amounts of compute, users will have to set up their own Compute Canada accounts if they don't have one already.
As the project grows, the GenAP team plans to request a separate allocation of compute resources devoted to the platform, but that will likely not happen until next year, Bourque said.