Job Description

The Vertebrate Annotation team are looking to recruit a highly motivated bioinformatician to join the Ensembl Genebuild project at the European Bioinformatics Institute (EMBL-EBI), located on the Wellcome Genome Campus, near Cambridge in the UK.

Ensembl Genebuild produces gene annotation for vertebrate species, including the reference gene sets for human and mouse. We are one of the world’s leading groups for identifying the location, structure and tissue-specific expression of genes and their transcripts for large numbers of vertebrate species.

Your main responsibilities will involve running our large-scale annotation system to produce gene annotation. As part of Ensembl Genebuild, you will:

  • Run production pipelines on a large compute cluster to download, process and integrate data from various public sequence archives
  • Produce high-quality, evidence-based gene sets for vertebrate species including protein-coding genes, noncoding RNA genes and pseudogenes
  • Produce tissue-specific RNA-seq alignments and transcript models
  • Collate these data into appropriate databases and files for the Ensembl release cycle
  • Document the data and processes used in gene annotation
  • Contribute to the Ensembl code-base including the design and implementation of new annotation pipelines and software
  • Work collaboratively with other members of Ensembl on data production around the Ensembl release cycles
  • Communicate the team’s work internally and to researchers in the field, including user support

Ensembl Genebuild is a component of the Vertebrate Annotation team. As part of Ensembl Genebuild, you will join seven bioinformaticians who are experts in gene annotation. We have domain expertise in public archives, alignment methods, software development, large-scale compute, pipeline workflows and automation. Future projects for the team include scaling up gene annotation pipelines, improving our annotation of noncoding genes, automating and enhancing our RNA-seq annotation pipeline, and developing methods of analysing new data, including full-length transcriptomic reads.

This is a unique opportunity to contribute to Ensembl’s mission to analyse, store and disseminate genomic annotation. For further information about Ensembl, please visit: www.ensembl.org and https://github.com/Ensembl.

For further information about Ensembl Genebuild, please visit:


You should hold an MSc or PhD, or have equivalent experience, in Computer Science, Bioinformatics, Genetics or a related field, and be able to write, understand and maintain complex code. You will also have domain experience of eukaryotic genome annotation, the biology of gene expression, and current methods for DNA sequencing and sequence alignment.

Previous experience of processing large biological data sets in a production environment would be advantageous, including an understanding of compute clusters, pipeline workflows, software design and automation. Evidence of working in a dynamic, team-based environment or contributing to a large, shared code-base is desirable.

Proficiency in object-oriented programming, experience of working with compute clusters and of developing software in a primarily Unix-based environment, and familiarity with development tools such as Git are essential. Experience in developing computationally efficient solutions and working on a large but continually evolving codebase, and familiarity with relational databases, would be an advantage.

You will have good communication and interpersonal skills, be a self-starter, and have the ability to manage your own time to meet the needs of several projects. The key attributes sought are the ability to work in a team, excellent attention to detail, solid problem-solving skills, and the desire to learn and improve. Furthermore, you should demonstrate the ability to communicate both biological and computational ideas (orally and in writing), the ability to manage your time to meet deadlines, and a desire to work in an international environment.

How to Apply

To apply, please submit a covering letter and CV, with details of two referees, through our online system.

About Our Organization

EMBL-EBI is part of the European Molecular Biology Laboratory (EMBL) and is a world-leading bioinformatics centre providing biological data to the scientific community, with expertise in data storage, analysis and representation. EMBL-EBI provides freely available data from life science experiments, performs basic research in computational biology and offers an extensive user training programme, supporting researchers in academia and industry. We have close ties with both the University of Cambridge and the Wellcome Trust Sanger Institute.

