Data Scientist

Job Location
Wellcome Genome Campus
CB10 1SD
United Kingdom
Grade 5 or 6 (monthly salary starting at £2,507 or £2,805 after tax)

EMBL is an inclusive, equal opportunity employer offering attractive conditions and benefits appropriate to an international research organisation. The remuneration package comprises a competitive salary, a comprehensive pension scheme and health insurance, educational and other family-related benefits where applicable, as well as financial support for relocation and installation.

We have an informal culture, an international working environment and excellent professional development opportunities, but one of the really amazing things about us is the concentration of technical and scientific expertise, something you probably won’t find anywhere else.

If you’ve ever visited the campus, you’ll have experienced first-hand our friendly, collegial and supportive atmosphere, set in the beautiful Cambridgeshire countryside. Our staff also enjoy excellent sports facilities, including a gym, as well as a free shuttle bus, an on-site nursery, cafés, a restaurant and a library.

Job Description

Are you a Data Scientist or Statistician who wants to mine phenotypic big data? Do you have the skills to research, design and develop analyses that identify genetic contributors to aging and cancer? If so, this position is for you!

EMBL-EBI Mouse Informatics is building on recent successes (see Dickinson et al., Nature, 22 September 2016) and has secured funding from the National Cancer Institute and the NIH Common Fund to expand our team to analyse the rich phenotype data we are collecting. This is a rare opportunity: the successful candidate will have unrestricted access to unique datasets with direct relevance to human health, while taking part in global collaborations and publishing in top-tier journals.

We are searching for a highly skilled data scientist or statistician to support two projects: the International Mouse Phenotyping Consortium (IMPC) and the PDX Integrator. The IMPC is a G7-recognised global research infrastructure that coordinates the production and phenotyping of thousands of new mutant mouse strains, with all data archived within our team and made available online. Over the last five years, we have identified 20,000 new gene-phenotype associations from 26 million data points collected from a diverse set of standardised phenotype tests. The IMPC is now entering a new five-year phase in which mouse strains will have their physiological characteristics assessed after being aged.

The newly funded PDX Integrator will bring together genomic, histopathological and drug response data from Patient Derived Xenograft (PDX) models. PDX models are mouse strains engineered to propagate human cancers and are increasingly used in clinical research to test new chemotherapeutic regimens and study drug resistance mechanisms. The PDX Integrator will be the first resource to integrate PDX-related data from multiple sources and will leverage EMBL-EBI’s resources that store genomic, epigenomic and transcriptomic data.

For both projects, you will have the exciting opportunity to design and develop new analyses to explore one of the fundamental problems of biology: how do our genes contribute to aging and cancer? The ideal candidate will have experience with R or SAS to maintain and extend our current PhenStat production analysis, while designing and developing new machine learning techniques that integrate our new phenotype data with the biological big data stored at EMBL-EBI. The successful candidate will form global collaborations with peers in dedicated data analysis groups and will be part of a growing team that is advancing the state of the art in phenomics. They will also be expected to present their work at international meetings and publish in peer-reviewed journals. While we anticipate this post being a full-time position, part-time hours would be considered for the right candidate.

Willingness to undertake international travel and availability for US-based teleconferences are essential for this post. Excellent interpersonal, communication and English-language skills are also essential, as the role will involve liaising with other scientists at EMBL-EBI and around the world.

You’ll be working within Mouse Informatics at EMBL-EBI alongside the developers, bioinformaticians and ontologists who make up the wider SPOT team. As part of your day-to-day job, you’ll be collaborating with the team, who have a range of expertise in semantics, data analytics, image analysis and 3D image display. You’ll also be interacting with other groups at EMBL-EBI and with external collaborators, both within the UK and internationally, to improve our resources.

Candidates are actively encouraged to apply for more than one position within Mouse Informatics if they are qualified.


The ideal candidate will have three years’ experience developing statistical methods for production databases and hold a postgraduate degree in Statistics, Bioinformatics or Data Science, although candidates with equivalent experience will also be considered.


  • Expertise with R or SAS
  • Design and delivery of machine learning algorithms and systems
  • Experience in one or more of the commonly used parallel/distributed systems or technologies (e.g. Apache Spark)

  • Knowledge of relational and graph database management systems
  • Coordinate development activities with outside collaborators
  • Ensure documentation is up to date

How to Apply

To apply, please submit a covering letter and CV, including the names of two referees, through our online system.

About Our Organization

EMBL-EBI is part of the European Molecular Biology Laboratory (EMBL) and is a world-leading bioinformatics centre providing biological data to the scientific community, with expertise in data storage, analysis and representation. EMBL-EBI provides freely available data from life science experiments, performs basic research in computational biology and offers an extensive user training programme, supporting researchers in academia and industry. We have close ties with both the University of Cambridge and the Wellcome Trust Sanger Institute.