Data Scientist

Job Location
The Wellcome Genome Campus
CB10 1SD
United Kingdom
Grade 5 - Competitive Salary

EMBL is an inclusive, equal opportunity employer offering attractive conditions and benefits appropriate to an international research organisation. The remuneration package comprises a competitive salary, a comprehensive pension and health insurance, educational and other family related benefits where applicable, as well as financial support for relocation and installation.

We provide a dynamic, international working environment and have close ties with the University of Cambridge and the Wellcome Trust Sanger Institute.

EMBL-EBI staff also enjoy excellent sports facilities, a free shuttle bus to Cambridge and other nearby centres, an active sports and social club and an attractive working environment set in 55 acres of parkland.

Job Description

We are looking to recruit a Data Scientist to join the Literature Services Team to work on text analytics within the ELIXIR-EXCELERATE project based at the European Bioinformatics Institute (EMBL-EBI) located on the Wellcome Trust Genome Campus near Cambridge in the UK.

Your role will be to develop methods that support the incorporation of content from life sciences research articles into public data resources such as UniProt, the protein information resource. Furthermore, you will contribute to the development of text analytics that ascertain the use of public data resources by the international research community, as reported in the scientific literature. Working with database curators at the EMBL-EBI, our project partners located in Switzerland and Spain, as well as other team members, you will provide meaningful and robust solutions to these challenges. The Literature Services Team operates in an agile environment, develops Europe PMC and works on a variety of literature-related and text analytics projects. The team enjoys a collegial and supportive atmosphere in which to deliver top quality software for our users. ELIXIR is the distributed pan-European life sciences infrastructure for biological information. You will collect, annotate, archive and make available to all the critical life sciences data that will enable Europe to collectively tackle some of the most pressing societal challenges.


You will need expert knowledge in the area of text analytics/data mining and will ideally have gained this experience in a biomedical or related field. Java programming skills, a working knowledge of JavaScript and JSON and a good knowledge of XML, XSLT and XPath are also required.

As well as experience in natural language processing/text analytics, you will ideally have also worked with related technologies such as RDF/linked data and SOLR/Lucene, or with data analysis tools/languages such as R. Experience from an industrial setting is desirable. If you do not have expertise in all these areas, then this position would give you the opportunity to extend your repertoire - flexibility to take on new skills to solve problems is the mindset we are looking for. You should also be willing to present the work of the group within the ELIXIR-EXCELERATE project and beyond. Some international travel will therefore be required. This position would suit someone who is self-motivated and would like to work on a cutting-edge project in a diverse, international environment.

About Our Organization

The EBI is a world-leading bioinformatics centre providing biological data to the scientific community, with expertise in data storage, analysis and representation. This biomolecular information is made available through extensive services accessed via its web pages. Optimising the usefulness of these services to the user community is an ongoing and challenging task.
