The Therapeutic Target Discovery R&D group at Regeneron is seeking a Data Scientist to develop infrastructure and software and to facilitate and conduct the cleaning, curation, modeling and analysis of large-scale data and metadata generated by our “Tier1” high-throughput gene knockout embryonic and adult mouse phenotypic screen, with the ultimate goal of revealing insights into human biology. The Tier1 process is a broad-based phenotyping screen that includes: lacZ gene expression mapping to determine patterns of reporter gene expression; morphological phenotyping to determine the developmental implications of gene removal; hematology/serum chemistry to assess blood cell, metabolic and other activities; PIXI/microCT evaluation to identify any alterations in bone density or lean/fat tissue composition; tumor phenotyping to explore the effects of gene removal on tumor growth and development; immunophenotyping to identify roles in the development and response of the immune system; and next-generation sequencing-based transcriptome analysis to identify changes in the expression levels of all gene messages.
The successful candidate will work within the team of biologists, data analysts and other scientists who collectively make up the Tier1 program. The position will coordinate and collaborate closely with other scientists throughout discovery, including bioinformaticians conducting transcriptome analyses, programmers and scientists at the Regeneron Genetics Center conducting human genomics and phenotypic data analyses, and scientific database programmers and administrators building and maintaining custom systems of mice, genes, constructs, process and observational data.
Responsibilities include, but are not limited to:
• work with programmers, database administrators and scientists to clean and extract data from custom scientific databases and commercial LIMS
• facilitate and conduct data analysis, including mining and curation of phenotypic datasets, with primary responsibility for developing infrastructure, data models, software, and statistically sound algorithms to support genotype:phenotype associations and gene:anatomy reporter expression annotations
• integrate and compare Tier1 results with human genomic and phenotypic data provided by the Regeneron Genetics Center and with public mouse phenotypic and genomic data repositories, including MGI, IMPC, Sanger, BioGPS, EMAP, the Allen Brain Atlas, and others
• implement GUIs and GUXs or other software to enable a scalable data warehousing and informatics framework, quality control, and data mining/querying by department team members and the broader community of Regeneron scientists
• collaborate and coordinate closely with scientific database programmers and Molecular Profiling team members mining transcriptome data. Work with these collaborators to structure data and develop algorithms, rules engines, and querying tools to access and curate the phenotypic datasets.