The position will be responsible for conducting analyses involving clinical and phenotypic data for genetics studies as well as clinical development and commercial programs. Responsibilities will include curating, cleaning, and analyzing large-scale phenotypic datasets, including de-identified EMR extracts from external collaborators, targeted clinical datasets in selected cohorts, and internal datasets from clinical trials and other human subject research. The position will involve working within a team of database administrators, biostatisticians, clinical scientists, and programmers to structure and mine clinical and phenotypic datasets and to support genomic studies and association analyses. The position will require coordination and collaboration with other scientists within the department, research and clinical scientists at Regeneron, and external collaborators.
Additional responsibilities include, but are not limited to:
- Work within a team of programmers, database administrators, statisticians, and clinical scientists, and with external collaborators, to facilitate EMR data extraction, transformation, and processing from multiple health system partners.
- Conduct data analysis, including mining and curating phenotypic datasets, with primary responsibility for developing and identifying clinical phenotypes and cohorts of interest for “phenotype first” genomic analysis of associated samples, and for efficient data mining and association analysis in both phenotype first and genotype first queries.
- Develop algorithms and data models, and apply natural language processing (NLP) and text mining to “scrubbed” and de-identified healthcare provider notes.
- Implement GUIs and GUXs such as i2b2, tranSMART, or other software to enable a scalable data warehousing and informatics framework, and to support data mining and querying by department team members and Regeneron scientists more broadly.
- Collaborate and coordinate closely with external health system collaborators and bioinformatics teams mining EMR and phenotypic datasets. Work with these collaborators to structure data and develop algorithms, rules engines, and querying tools to access and curate the phenotypic datasets.
- Develop analytic methodologies and approaches to address queries for cohort selection related to sequencing and epidemiological outcomes studies. Execute the analyses in a timely, accurate and reliable manner. Communicate findings clearly to diverse stakeholders and document work for training and replication purposes.
- Use multiple sophisticated analytic methodologies and data reporting/management tools, and contribute to work presented to senior leadership and externally to collaborators.
- Implement and use analytic methodologies and data reporting/management tools (e.g., SQL Query Analyzer, Crystal Enterprise).
- Function as a "super user" of either priority analytic methodologies or data reporting/management tools.