EMBL-EBI seeks an experienced 'cloud' Bioinformatician to contribute to the bioinformatic analysis of cancer genomes. Despite the widespread use of whole genome sequencing on cancer samples, analysing the data is still a challenging task. This is due in large part to the size of the data, which demands substantial storage and compute capacity, especially when analysing large patient cohorts, whose data are in addition generally protected by privacy rules. Processing these large datasets requires robust software solutions that can be deployed on a wide variety of compute infrastructures.
The purpose of the project is to develop a suite of cancer genome analysis workflows on EMBL-EBI's Embassy OpenStack cloud compute infrastructure, which comprises more than 6000 vCPUs and 4.5 PB of storage. You will set up these tools and run them on thousands of samples provided by collaborators, carrying out large-scale analyses for our own research purposes, and you will train external collaborators in their use. In particular, you will:
• Establish a reusable deployment and monitoring methodology in an OpenStack environment
• Curate and automate bioinformatic pipelines and other workflows for cancer genome analysis available at Dockstore
• Manage data access and download the data
• Run the pipelines, then quality-control (QC) and curate the results
• Link the results to other EBI resources
• Train other users in the use of the cloud-based pipelines
You will work within the Genome Analysis team (led by Daniel Zerbino) and collaborate closely with the Computational Cancer Biology research group (led by Moritz Gerstung), which will provide scientific expertise for algorithmic development, and the Systems Application group, which will provide technical expertise for OpenStack cloud computing. The Genome Analysis team is itself part of Ensembl (led by Paul Flicek).