NEW YORK (GenomeWeb) – A group of mostly US-based genomics and clinical sequencing experts has put forward a set of guidelines for determining whether a rare sequence variant is causally linked to a disease.
The guidelines, which they plan to follow up with more specific ones in the future, were published in Nature last week. They resulted from a workshop held by the National Human Genome Research Institute in September 2012.
The problem of sequence variants being incorrectly linked to Mendelian diseases is not new, but has grown in importance with the rise of exome and genome sequencing projects to find rare disease-causing variants in thousands of individuals afflicted with genetic disorders.
"We know that every single person carries many rare variants that potentially have some functional impact," said Daniel MacArthur, an assistant professor at Massachusetts General Hospital and the Broad Institute, who helped craft the guidelines. "You can always find a variant somewhere in the genome that looks as though it might be plausibly implicated with disease X if you look hard enough."
Already, the literature appears to be riddled with mistakes: a 2011 study by Stephen Kingsmore's group, for example, found that about a quarter of mutations associated with disease in about 100 sequenced individuals turned out to be common polymorphisms or misannotated, "underscoring the need for better mutation databases," according to the authors.
"That existing legacy problem from the current literature, combined with the possibility of having hundreds of thousands of rare disease patients to be sequenced over the next couple of years, [was the reason] we thought it was timely to think about how we might establish clear guidelines for determining whether or not a variant was causal," MacArthur said.
At least in a research setting, no formal recommendations of this nature currently exist, he said, but they are urgently needed because results from research projects often find their way into the clinic, informing clinical diagnoses that can have wide-ranging effects on patient care.
"The barrier between research and clinical sequencing is fairly porous, and probably becoming more so," MacArthur said. "It means that we have to be fairly rigorous about the way that we report these variants, both in the research setting and also in the diagnostic setting."
In their guidelines, the researchers focused on study design, how to implicate a gene in disease, how to implicate a sequence variant in disease, and publishing and reporting results.
In terms of study design, they pointed out that different technological and analytical approaches might be most appropriate for different types of genetic disorders. They also urged investigators to conduct formal power calculations before starting a new study, taking into account what is known about the disorder and the available patient cohort.
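As a rough illustration of what such a power calculation might look like, the Python sketch below asks how likely a cohort is to contain at least a minimum number of patients carrying mutations in the same gene. The binomial model, the function name, and every number used are illustrative assumptions, not taken from the guidelines.

```python
import math


def power_to_see_hits(n_patients: int, carrier_fraction: float, min_hits: int) -> float:
    """Probability that at least `min_hits` of `n_patients` carry a mutation
    in the gene, assuming each patient independently carries one with
    probability `carrier_fraction` (a simple binomial model)."""
    below_threshold = sum(
        math.comb(n_patients, j)
        * carrier_fraction**j
        * (1.0 - carrier_fraction) ** (n_patients - j)
        for j in range(min_hits)
    )
    return 1.0 - below_threshold


# Hypothetical cohort: 100 patients, 5 percent of cases caused by the gene,
# and a requirement of at least 3 unrelated carriers before reporting it.
power = power_to_see_hits(n_patients=100, carrier_fraction=0.05, min_hits=3)
print(f"Chance of finding at least 3 carriers: {power:.2f}")
```

A calculation of this kind, run before recruitment, indicates whether the available cohort is large enough to stand a realistic chance of implicating a gene at all.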
With regard to implicating a gene in disease, researchers should first look at genes previously linked to a similar phenotype before considering new genes. They should report new genes only when they have been implicated in several unrelated individuals, and apply statistical methods to compare the gene and its mutations in patients and control groups.
The need for statistical evidence to associate a new gene with disease has been somewhat contentious, MacArthur said, with some researchers arguing that with rare diseases, sample size would be too small. But "even with a very small number of families, if you have a single gene that gets hit by multiple mutations, with a properly calibrated statistical model, you can actually very easily reach formal statistical significance," he said.
As an example of the possible pitfalls of linking a gene to disease, the authors pointed to an autism study involving almost 1,000 families that found four independent mutations in the TTN gene. However, that gene is one of the largest in the human genome, and two mutations would have been expected just by chance.
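The TTN arithmetic can be made concrete with a short sketch. It assumes, for illustration only, that chance mutations in a gene follow a Poisson distribution; the article does not say which model the authors used. The function name is hypothetical.

```python
import math


def poisson_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


# Four mutations observed in TTN, two expected by chance (figures from the text).
p_value = poisson_tail(observed=4, expected=2.0)
print(f"P(4 or more mutations by chance) = {p_value:.3f}")  # prints 0.143
```

Under this assumed model, four hits in a gene where two are expected would arise by chance about one time in seven, which falls well short of conventional statistical significance even before correcting for the number of genes tested.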
For assessing candidate variants, they recommended considering statistical evidence and the frequency of the variant in matched control populations. "One of the clear messages that emerged from the discussion was the need for much more rigorous statistical genetic approaches to assessing these variants," MacArthur said. With data from large-scale sequencing projects, "there is an amazing opportunity now to leverage that to build formal statistical models that allow us to say, 'given the variants that we observe in a Mendelian disease patient, how likely are they to have occurred by chance?' That really should be the goal of the field I think."
Researchers should also not assume that a variant predicted to be damaging is actually disease-causing, and they should conduct experiments in cell lines or animal models to show its functional impact. All healthy individuals, for example, carry several protein-disrupting variants, with no ill effects.
But even experimental functional data needs to be treated with caution, MacArthur said, and be evaluated statistically. A good example, he said, is a mutation in a patient who is short, where disrupting the same gene in a mouse results in a small mouse. While that might look like good evidence for a causal relationship between the gene and short stature at first, one needs to consider that about a third of knockout mice have some reduction in body weight, he said.
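The mouse example is essentially a base-rate argument, which the following sketch spells out with Bayes' rule. Only the one-third figure comes from the article; the prior probability and the assumption that a truly causal gene always yields a small knockout mouse are hypothetical values chosen for illustration.

```python
def posterior_causal(prior: float, p_small_if_causal: float, p_small_if_not: float) -> float:
    """Posterior probability that the gene is causal after observing a
    small knockout mouse, computed with Bayes' rule in odds form."""
    prior_odds = prior / (1.0 - prior)
    likelihood_ratio = p_small_if_causal / p_small_if_not
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical prior: a 1 percent chance that the candidate gene is causal.
# Best case for the evidence: a causal gene always gives a small mouse (1.0).
# From the text: about a third of all knockout mice show reduced body weight.
posterior = posterior_causal(prior=0.01, p_small_if_causal=1.0, p_small_if_not=1 / 3)
print(f"Posterior probability of causality: {posterior:.3f}")
```

Even in this best case, the small mouse multiplies the odds of causality by at most three, so under these assumptions a gene that started at 1 percent ends up at only about 3 percent.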
Variants outside of protein-coding regions are "particularly challenging" to interpret, the authors noted, though genomic regions with a role in gene regulation have begun to emerge. For such variants, it is especially important to produce experimental evidence of their effect.
Also, variants may not be fully penetrant, they cautioned, and may not completely explain a disease. "For most mutations, we don't really know the penetrance," MacArthur said, which has important implications for genetic counseling. "The only way to get an unbiased estimate of penetrance is by taking a very large-scale collaborative approach, which will involve sequence data from hundreds of thousands of individuals and finding these variants within these populations, and then systematically phenotyping those individuals to see whether or not they actually carry that particular disease," he said.
Results should be reported with all available evidence for pathogenicity, and genotype and phenotype data for both patients and controls should be deposited in publicly accessible databases, they wrote. A model for sharing genomic data linked to rare diseases already exists for copy number variations, with the DECIPHER database, they noted.
For disease-causing sequence variants, the emerging "gold-standard" appears to be the National Center for Biotechnology Information's ClinVar database, MacArthur said. "We've been waiting for one particular database to start to gain enough momentum to become the [leader] where everyone starts to deposit their data and work within that framework. And my impression at the moment is that ClinVar has now become that database."
However, convincing laboratories, in particular diagnostic ones, to contribute their data will be "a tough sell," he said, not so much because this would make them less competitive but because it takes some time and effort to submit the data. But labs could also gain from data sharing, for example, by being able to more easily find unrelated patients with mutations in the same gene.
When reporting results to clinicians and patients, uncertain findings should be clearly marked as such, the authors wrote, and "it is critical that healthcare providers be made aware of the varying levels of certainty in the evidence for implicating a variant in disease."
Further improvement
While the new guidelines provide a great starting point, some think there is still room for improvement.
According to Joris Veltman, a professor of translational genomics at the Radboud University Medical Center in Nijmegen in the Netherlands, the paper is a "must-read for every biomedical scientist or clinician trying to interpret medical genetic data."
However, he said he missed a discussion of the importance of detailed and systematic clinical phenotyping. "This in my view should go hand in hand with statistical evaluations of variant data, and I am afraid that this is too often overlooked by those who do not have access to patient data and the possibility to recontact patients for clinical follow-up studies," he told Clinical Sequencing News.
Without phenotyping, he said, researchers might not follow up on interesting variants that don't reach statistical significance, resulting in an "enormous loss of valuable information, especially in clinical and genetically heterogeneous disorders such as autism and intellectual disability."
For example, de novo mutations in the same novel gene might occur in a cohort of patients with intellectual disability, but not reach statistical significance. "Detailed and repeated phenotyping of these patients may provide additional and compelling evidence of causality, for example, by identifying overlap in phenotypic features that are absent from the rest of the disease cohort," he said.
Also, a new mutation in a known disease gene might occur in a single patient of a cohort, and "even if not significant in this cohort, it is still enormously valuable to compare the phenotype observed in the patient with those reported in literature." Efforts to improve systematic clinical phenotyping are ongoing, he said, noting such work by researchers at the Charité university research hospital in Berlin.
In addition to databases containing disease variants, the quality of large-scale population variant databases needs to be improved as well, including the addition of at least some phenotypic information, according to Veltman. "These databases are known to harbor a significant amount of sequence and alignment errors that may lead to false conclusions, for example, when artifacts result in stop mutations in known or novel disease genes," he said. "This is a serious issue that needs to be addressed and improved, especially as more and more data is generated and less and less validation of these data takes place."
At least some diagnostic laboratories are already following the new guidelines and are eager to contribute their data to ClinVar. "When I read this, I thought, 'it's exactly what we already do,'" said Saskia Biskup, CEO of CeGaT in Tübingen, Germany.
CeGaT plans to submit its data to ClinVar by the end of this year, she said, after working out some practical issues of how to go about this. Having a single common database of disease variants will be a plus for diagnostic labs like her own, she said, which often have to combine information from several databases at the moment. "I'm convinced that the larger the database, the better the diagnosis we can make for a patient."
Going forward, the guidelines' authors plan to make their requirements more specific. "The guidelines that we ended up putting out are not 'hard guidelines' but 'soft guidelines,'" MacArthur said. "This is really just a starting point towards stronger recommendations" that state, for example, statistical thresholds that should be used. One venue for formulating such requirements will be a workshop at Cold Spring Harbor Laboratory's Banbury Center this September, he said.