Data Quality Engineer | GenomeWeb

Data Quality Engineer

Organization
University of Chicago, Center for Data Intensive Science
Job Location
Chicago, IL
Salary
Commensurate with experience
Job Description

We're looking for a problem solver with a background in data integrity and testing to ensure that high-quality data and metadata are distributed to the cancer research community. Elevate your career with this opportunity to work with one of the world's largest collections of harmonized cancer genomic data. This role focuses on the Genomic Data Commons, which is at the forefront of both cutting-edge research and production systems supporting cancer research. You will join a team of engineers developing innovative technologies who will keep you challenged in our dynamic environment as we work together to pursue discovery through data-driven cancer research.

You will join the team as the lead engineer for data quality and integrity. You will focus on leading data quality efforts related to data integration, higher-level data products, and distribution to the cancer research community. To accomplish this, you will work across multiple teams to build and automate frameworks for tasks such as anomaly detection, reporting, and alerting to ensure data quality. You will gain expertise not only in the data itself but also in the underlying systems, so that you can interrogate the data and understand gaps in data quality. Data and metadata quality has a broad scope, so you are expected to work collaboratively across teams to determine priorities and the best methods for achieving objectives.

Key Responsibilities

Data Quality and Integrity - Drive the design of the data QA infrastructure and the execution of testing protocols to validate pipelines, integrated datasets, and data products. Use a combination of exploratory, regression, and automated testing to ensure data quality standards are met. Assess appropriate inclusion/exclusion of data based on the defined data dictionary; assist in the evaluation of data dictionaries and use data specifications and code to validate data quality.

Data Quality Improvement - Proactively identify potential data issues and downstream impact. Identify existing data issues and perform research and root cause analyses to determine resolution. Work collaboratively with software engineers and bioinformaticians to achieve and verify resolution. Establish processes and standards to improve data quality assurance and implement efficiencies in data management. Define measurements and metrics to conduct and present routine data reports to the project team and stakeholders.

Data Management - Participate in data acquisition and integration planning efforts including data modeling, data dictionary definitions, and data harmonization pipeline development. Develop a deep understanding of multiple genomic datasets and the technical data management software and processes of the underlying system. Define data quality and integrity criteria and develop a comprehensive data quality management plan to lead key data QC efforts through team collaboration for all phases of the data management life cycle.

Technical Writing - Contribute written knowledge and expertise to system documentation, user documentation, scientific manuscripts, reporting, grant proposals and reports, and presentation materials. Stay abreast of existing and emerging technologies and QC tools in the cancer genomics space.

Requirements

Bachelor's degree in Computer Science, Bioinformatics, or relevant engineering or scientific field such as Physics or Genomics required.

5+ years of experience in a progressive technical business analysis role required.

Experience with Agile methodology required.

Experience writing technical specifications, with a focus on full-stack architecture, including REST APIs, SQL and NoSQL data solutions, and distributed infrastructure, required.

Experience with business analysis and quality assurance professional standards, business processes, workflows, methodologies and leading practices required.

Experience leading business analysis activities while ensuring the traceability and optimum coverage of defined business requirements required.

Experience working in a Linux command line environment required.

Preferred

PhD in a relevant engineering or scientific field highly preferred.

Experience in Change Management, Release Management, Incident Management, and Problem Management, and experience working on Business Intelligence, preferred.

Experience with HIPAA and/or FISMA security regulations preferred.

Experience with cancer or human genomics preferred.

Experience with bioinformatics preferred.

Experience managing a backlog of requirements in an Agile workflow preferred.

Experience creating user stories from requirements preferred.

Experience with JIRA project tracking software preferred.

How to Apply

Apply under Requisition #101319 at jobopportunities.uchicago.edu

About Our Organization

About the Genomic Data Commons

The Genomic Data Commons (GDC) is a comprehensive computational facility to centralize and harmonize cancer genomic data generated from NCI-funded programs. The GDC is the foundation for a genomic precision medicine platform and will enable the development of a knowledge system for cancer. The GDC will provide an open-source, scalable, modern informatics framework that uses community standards to make raw and processed genomic data broadly accessible. This will enable previously infeasible collaborative efforts between scientists.

About the Center for Data Intensive Science

The Center for Data Intensive Science at the University of Chicago is developing the emerging field of data science with a focus on applications to problems in biology, medicine, and health care. Our vision is a world in which researchers have ready access to the data and tools required to make discoveries that lead to deeper understanding and improved quality of life. We democratize access, speed discovery, create new knowledge and foster innovation through implementation using data at scale. Our scientific data clouds and commons include the Genomic Data Commons, Bionimbus Protected Data Cloud, and Open Science Data Cloud.
