- The Data Scientist provides collaborative and innovative analytic support across all divisions of the center, leveraging big data to support and empower center investigators.
- The Data Scientist will be responsible for construction of analytic datasets sourcing elements from potentially diverse data sources, implementation of exploratory and predictive analyses, visualization of data, and communication and presentation of results.
- Identify and integrate disparate data sources, both internal and external, including raw data from medical researchers, unstructured data from clinical experts, and well-established, publicly available databases
- Develop and deploy machine learning algorithms, predictive models, and classification methods to advance cancer research and inform clinical decision making
- Deliver novel, data-driven insights to improve outcomes in the treatment of cancer
- Identify areas of growth for the data science initiative and actively engage in enhancing the breadth and reach of data science across the Fred Hutch campus
- Collaborate with researchers and clinicians to identify high-impact opportunities for data science applications
- Manage data science projects from creation to completion
- Communicate results to technical and non-technical audiences
- Master's or PhD degree in Bioinformatics, Statistics, Biostatistics, Mathematics, Computer Science, Physics, or equivalent required, with a minimum of two years of related experience.
- Core competency in at least one of the following: genomics, natural language, image processing, medical records or claims.
- Experience with messy, "real life" data sets.
QUALITIES NECESSARY FOR SUCCESS
- A strong desire to explore, investigate, dig, and generally uncover patterns and puzzles in data while staying focused on thoughtful and pragmatic solutions.
- Ability to advise investigators and management in clear language about results and new directions; strong oral and written communication and critical thinking skills are a must for this position.
- Ability not only to work autonomously, but also to work collaboratively within multidisciplinary teams including statisticians, computational biologists, data engineers, epidemiologists, clinicians, administrators, etc.
- Proficiency in R or Python.
- Knowledge of statistical analysis, machine learning and predictive modeling.
- Experience with a variety of data formats and markup languages (e.g. XML, JSON, RMarkdown).
- Experience with common data storage media (e.g. SQL, Excel, Access) as well as NoSQL models.
- Experience with Unix/Linux and distributed computing.
- Version control (e.g., Git).
Big data platforms:
- Hadoop, Hive, and/or MapReduce
- Code version control (Git, GitHub) and containers (Docker)
- Proficiency in at least one common object-oriented programming language (e.g. Java, C++, C#).
- Experience in application development, visualization, and user design.