Baylor, Stanford Research Team to Handle Computational Components of NIH's Clinical Variant Collection, Curation Effort


The National Institutes of Health has awarded $8.4 million over four years to a research team from Baylor College of Medicine and Stanford University to handle the computational aspects of its Clinical Genome Resource, ClinGen.

The scientists will use the grant to develop and apply computational tools that predict which genomic variants are associated with disease risk and prioritize them for further study, and to devise methods of processing variants more quickly than is currently possible. As part of these efforts, they'll work on improving methods of predicting disease risk-associated variants in non-white populations. Limited information on variant-disease associations in these populations makes it more likely that genetic tests will yield "uncertain results," according to Sharon Plon, a professor of molecular and human genetics at Baylor and a co-principal investigator on the grant.

This is the third grant awarded by the NIH's National Human Genome Research Institute and the National Institute of Child Health and Human Development to fund the development of ClinGen, which is intended to provide a framework for evaluating the roles of genetic variants in disease and exploring methods of making this information a part of clinical care.

In addition to the Stanford and Baylor team, NIH has funded two other groups led by researchers at Brigham and Women's Hospital in Boston and the University of North Carolina, Chapel Hill. These teams will, respectively, develop standard formats for gathering and depositing clinical variant data in the National Center for Biotechnology Information's ClinVar (BI 10/04/2013), and work on standards for the clinical relevance of variants (BI 10/04/2013).

"Our grant is really focused on the question of how do we decide which variant is associated with disease and how do we make sure we can do that effectively across different ethnicities and races," Plon explained to BioInform. "Our goal is ... to try and develop a computer resource that will bring in as much existing information as possible … from all of the population genetic sequencing that’s being done right now."

Within the first year of the Baylor-Stanford grant, Plon and her colleagues will focus on developing the project's core database, which will hold the different data types needed for variant classification. These include data collected by co-PI Carlos Bustamante's lab, which has sequenced genomes from a variety of non-Caucasian populations, as well as data from public resources such as the NCBI's Database of Genotypes and Phenotypes (dbGaP).

This will help the team, Plon said, access much more diverse population data than is normally available to clinical labs and explore questions such as how often a particular variant appears in a population and how that relates to ethnicity and ancestry. It is essentially "a working database," she explained. It allows "[us] to bring in data types that ClinVar may not be able to handle at the moment but which will be helpful for variant annotation," and to test drive tools developed by different annotation teams directly on the variant data.
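To illustrate the kind of question Plon describes, how often a variant appears in a given population, the following is a minimal sketch of a per-population allele frequency calculation. It is not the team's database or method. The record layout, the population labels `POP_A` and `POP_B`, and the function name are hypothetical, and the genotype counts are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records: (sample_id, population_label, alt_allele_count).
# alt_allele_count is the number of copies of the variant allele a diploid
# sample carries at one site: 0, 1, or 2. All values here are made up.
GENOTYPES = [
    ("s1", "POP_A", 0),
    ("s2", "POP_A", 1),
    ("s3", "POP_A", 0),
    ("s4", "POP_B", 2),
    ("s5", "POP_B", 1),
]


def allele_frequency_by_population(genotypes):
    """Return {population: variant allele frequency} for one variant site."""
    alt_alleles = defaultdict(int)
    total_alleles = defaultdict(int)
    for _sample_id, population, alt_count in genotypes:
        if alt_count not in (0, 1, 2):
            raise ValueError(f"invalid diploid allele count: {alt_count}")
        alt_alleles[population] += alt_count
        total_alleles[population] += 2  # each diploid sample has two alleles
    return {
        population: alt_alleles[population] / total_alleles[population]
        for population in total_alleles
    }


frequencies = allele_frequency_by_population(GENOTYPES)
for population, frequency in sorted(frequencies.items()):
    print(f"{population}: {frequency:.3f}")
```

A variant that looks rare in one group but is common in another is less likely to cause a rare disease, which is one reason frequency data from more diverse populations can help resolve uncertain test results.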

The data will be available to teams of clinical, diagnostic, and bioinformatic experts being assembled at all three grant sites who are responsible for determining the best ways of defining the likelihood that observed genetic changes affect ailments such as cancer, cardiovascular disease, and metabolic disease. After they've been curated, the variants will be made available to the community through ClinVar.

Other early efforts under this grant include identifying and annotating relevant genes and variants that haven’t been well addressed by disease association studies. For example, the ENIGMA consortium focuses on classifying BRCA1 and BRCA2 variants, which are implicated in breast and ovarian cancer risk, but there are other breast cancer genes that aren't being studied in such detail, so "maybe that’s an area where we can make an early impact," she said. There are also ongoing efforts to classify mutations involved in cystic fibrosis, but mutations in other metabolic disorders aren't as well annotated, so some of those could also be early targets for the team, she said.

"We don’t want to duplicate efforts, so one of the first things that the teams of experts are doing is saying which efforts out there are … extremely high-quality curation efforts and [asking] 'should we just take their analyses and make them easier for other investigators to see?'" she said.

In terms of tools for predicting disease risk, Plon said that the team will likely use different types of variant annotation applications, including well-known software like PolyPhen and SIFT, as well as disease-specific metrics, rather than rely on a single informatic tool or algorithm. Furthermore, the team is evaluating existing methods, such as likelihood ratios, that other groups have used to annotate sequence variants.
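As a rough illustration of how likelihood ratios can be used in variant classification, the sketch below combines a prior probability of pathogenicity with several likelihood ratios using the odds form of Bayes' rule. This is a generic textbook calculation, not the team's method. It assumes the lines of evidence are independent, and the function name, prior, and likelihood ratio values are hypothetical.

```python
from math import prod


def posterior_probability(prior, likelihood_ratios):
    """Combine a prior probability with likelihood ratios via Bayes' rule.

    Each likelihood ratio is P(evidence | pathogenic) / P(evidence | benign).
    Multiplying them assumes the pieces of evidence are independent.
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    if any(lr <= 0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be positive")
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1 + posterior_odds)


# Illustrative numbers only: a 10% prior and two supporting lines of evidence.
probability = posterior_probability(0.10, [4.0, 2.5])
print(f"posterior probability of pathogenicity: {probability:.3f}")
```

Because each source of evidence enters as its own ratio, this framing accommodates several annotation tools and disease-specific metrics at once rather than relying on any single score.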

A longer-term goal, she said, is to develop software tools, such as machine learning algorithms, that will automatically process variant data in order to eliminate some of the manual curation steps that are currently used.