The EMBL-EBI Sequence Families Team, led by Rob Finn, is responsible for a number of world-leading protein family annotation resources, including Pfam and InterPro. Pfam is a database of protein families, represented by multiple sequence alignments and profile hidden Markov models (HMMs). InterPro is a resource that classifies protein sequences into families, and predicts the presence of functionally important domains and sites. It does this by incorporating protein signatures from 13 different specialist international member databases, including Pfam.
There is an opportunity for a bioinformatician/software developer to join the team to integrate the TreeGrafter software into the InterPro production pipeline and into InterProScan, a tool that predicts features on novel protein sequences. You will be working on the UniRules Project.
Your role
- Evaluate different approaches to scaling the TreeGrafter software within InterProScan
- Develop a new component for TreeGrafter analysis within InterProScan
- Determine how best to propagate annotations generated within the InterPro pipeline to backend data stores
- Work with UniProtKB to enable the expansion of the UniRule system to take advantage of the tree-based annotations
- Support the InterPro team in the integration of TreeGrafter-based annotations with internal annotation tools
- Develop training materials to support the execution and interpretation of TreeGrafter results