We're looking for a problem solver with a background in data integrity and testing to ensure that high-quality data and metadata are distributed to the cancer research community. Elevate your career with this opportunity to work with one of the world's largest collections of harmonized cancer genomic data. This role focuses on the Genomic Data Commons, which is at the forefront of both cutting-edge research and production systems supporting cancer research. You will join a team of engineers developing innovative technologies who will keep you challenged in our dynamic environment as we work together to pursue discovery through data-driven cancer research.
You will join the team as the lead engineer for data quality and integrity. You will focus on leading data quality efforts related to data integration, higher-level data products, and distribution to the cancer research community. To accomplish this, you will work across multiple teams to build and automate frameworks such as anomaly detection, reporting, and alerting to ensure data quality. You will gain expertise not only in the data itself but also in the underlying systems, so that you can interrogate the data and identify gaps in data quality. Data and metadata quality has a broad scope, so you are expected to work collaboratively across teams to determine priorities and the best methods for achieving objectives.
Data Quality and Integrity - Drive the design of the data QA infrastructure and the execution of testing protocols to validate pipelines, integrated datasets, and data products. Use a combination of exploratory, regression, and automated testing to ensure data quality standards are met. Assess appropriate inclusion/exclusion of data based on the defined data dictionary; assist in the evaluation of data dictionaries and use data specifications and code to validate data quality.
Data Quality Improvement - Proactively identify potential data issues and their downstream impact. Identify existing data issues and perform research and root cause analyses to determine a resolution. Work collaboratively with software engineers and bioinformaticians to achieve and verify resolution. Establish processes and standards to improve data quality assurance and implement efficiencies in data management. Define measurements and metrics, and prepare and present routine data reports to the project team and stakeholders.
Data Management - Participate in data acquisition and integration planning efforts including data modeling, data dictionary definitions, and data harmonization pipeline development. Develop a deep understanding of multiple genomic datasets and the technical data management software and processes of the underlying system. Define data quality and integrity criteria and develop a comprehensive data quality management plan to lead key data QC efforts through team collaboration for all phases of the data management life cycle.
Technical Writing - Contribute written knowledge and expertise to system documentation, user documentation, scientific manuscripts, reporting, grant proposals and reports, and presentation materials. Maintain broad knowledge of existing and emerging technologies and QC tools in the cancer genomics space.