We are seeking to recruit an enthusiastic developer to work as a Scientific Programmer on the eHive production system. You will join the Vertebrate Genomics Ensembl team at the European Bioinformatics Institute (EMBL-EBI), which is located on the Wellcome Trust Genome Campus near Cambridge in the UK.
Ensembl (http://www.ensembl.org/) is one of the most successful large-scale bioinformatics projects and one of the leading projects for genome annotation. Ensembl imports massive amounts of data from various archives (genome sequences, variation archives, etc.) and also creates new resources (gene annotation, whole-genome alignments, gene trees, annotation of regulatory elements, etc.). Most of these production workflows (representing 250 CPU years of compute in 2014) are driven by the eHive system. eHive is a task manager that keeps track of all the jobs to be performed within a pipeline and schedules them in the appropriate order. It sits on top of the underlying job-scheduling system (Platform LSF, Grid Engine, etc.).
We are looking for someone who will be involved in adding a Remote Procedure Call (RPC) system to eHive. The aim is to allow jobs to be executed on a remote compute cluster through a predefined interface, whilst preserving the concurrency of eHive pipelines. This would allow us to expose some of the Ensembl tools as components that users can add to their own pipelines.
Your primary tasks will be to design and implement an RPC system in eHive in collaboration with the current eHive developer. In particular, this role will involve:
Efficient control flow transfer between eHive pipelines: develop a method to efficiently track remote jobs;
Efficient data transfer between eHive pipelines: develop an interface for transferring parameters and data between pipelines;
Efficient transfer of events back to the calling pipeline: develop a method for transferring events (for example, job failure) back to the calling pipeline;
Outreach activities: engage with collaborators and potential users, for example by holding workshops.
EMBL-EBI is part of the European Molecular Biology Laboratory (EMBL), Europe’s flagship laboratory for the life sciences. It is a world-leading bioinformatics centre providing biological data to the scientific community, with expertise in data storage, analysis and representation. EMBL-EBI provides freely available data from life science experiments, performs basic research in computational biology and offers an extensive user training programme, supporting researchers in academia and industry.