By Monica Heger
Despite the falling cost of whole-genome sequencing, a recent project suggests that the analysis and annotation of the data may stall its implementation in the clinical setting.
In a study and accompanying editorial published last week in the Lancet, researchers from Stanford University and elsewhere described how they analyzed and annotated the genome of Stephen Quake and addressed the inherent challenges of interpreting the data.
Quake, a Stanford bioengineer and Helicos co-founder, sequenced his genome on the HeliScope last year and published the results in Nature Biotechnology (IS 8/18/2009).
In the current study, researchers performed a clinical assessment of the sequence data, attempting to characterize any predisposition to inherited disease, as well as how Quake would respond to certain drugs.
Despite the finding of clinically significant variants, the project took considerable time and resources — the paper had 31 authors and was published nearly eight months after the genome was sequenced — indicating that the analysis step still poses a major challenge in bringing whole-genome sequencing into clinical practice.
According to the authors, "present analytical methods are insufficient to make genetic data accessible in a clinical context, and the clinical usefulness of these data for individual patients has not been formally assessed."
Euan Ashley, the lead author of the paper and a cardiologist at Stanford, told In Sequence that "the genomic information itself will not be a hurdle anymore." Rather, the challenge will be "our ability to parse it and make sense of it."
"This was a very expensive undertaking with many co-authors spending hours on the analysis," Russ Altman, a senior author of the paper and professor of bioengineering, genetics, and medicine at Stanford, told In Sequence in an e-mail.
The team analyzed 2.6 million SNPs and 752 copy number variations, and determined that Quake has an increased genetic risk for myocardial infarction, type 2 diabetes, and some cancers. They found rare variants in three genes that have been associated with sudden cardiac death, and a variant in a gene that was consistent with a family history of coronary artery disease. Quake also had a mutation that suggests possible resistance to the drug clopidogrel and several variants that indicate a positive response to lipid-lowering therapy, as well as variants suggesting he may have a low dose requirement for warfarin. They also reported many variants of unknown significance. A family history analysis showed a history of vascular disease and one case of early sudden death.
The analysis focused on identifying variants associated with Mendelian disease, novel variants, variants known to affect drug response, and SNPs previously associated with complex disease. The researchers mined a number of different databases, including disease-specific mutation databases and the Pharmacogenomics Knowledge Base hosted at Stanford, which contains information on 2,500 variants, 650 of which affect drug response.
The researchers identified 63 clinically relevant previously described pharmacogenomic variants and six novel SNPs in genes important for drug response.
They found two rare variants in genes associated with coronary artery disease and osteoarthritis; five previously described rare variants in genes associated with rare diseases; two variants of unknown importance found in disease-associated genes, including one associated with familial hypertrophic cardiomyopathy and one associated with arrhythmia; and four novel variants that could potentially be associated with rare diseases, including cystic fibrosis, parathyroid tumors, an iron disorder, and arrhythmia.
They used algorithms to predict the effects of disease-associated variants and whether specific variants were likely benign or deleterious. Of those variants, some were predicted to be benign, some damaging, and some somewhere in between.
Aside from the sheer time and resources that must be invested to interpret whole-genome sequence data, another issue is interpreting the data correctly, said Madhuri Hegde, senior director of the genetics laboratory at Emory University.
Hegde, who was not involved with the Quake annotation paper, is developing algorithms for interpreting the function of novel sequence variants for clinical use.
"You need to be cautious when talking about this technology in the real world," she said, noting that algorithms that predict whether a variant will be benign or damaging are frequently incorrect.
Also, it is difficult to know whether an observed variant is a real change or not. "They found two to three novel changes that are associated with cardiomyopathy. But no one else in the family was tested … In putting together algorithms for how we can interpret variants, we need to study additional family members."
For example, she said her group has been working on a test for X-linked mental retardation that involves sequencing 92 genes associated with the disorder. "In our validation stage, we found variants that we were not able to interpret. And we only sequenced 92 genes," Hegde said. "A 'variants of unknown significance report' doesn't mean anything. It just creates more confusion" for the patient.
Hegde said that more sequence data, such as information from the 1000 Genomes Project as well as population-specific genomes, will help in determining which changes are real.
Stéphane Bancel, the CEO of diagnostics company BioMérieux, agreed that interpretation of sequence data would be a major hurdle. He said there are three key factors that will be necessary to bring sequencing to the clinic: automated sample prep, the sequencing platform itself, and the data analysis. And the data analysis is "the piece that I'm worried about," he said. "I believe this is the biggest challenge."
In fact, Bancel said the complexity of the analysis was the main driver behind the recently formed partnership between BioMérieux and Knome, which provides sequencing interpretation services (IS 4/27/2010). The software will not only have to be accurate, said Bancel, but it will have to be simple to use in order to bring it to the clinical setting.
Another challenge in analyzing the data, said Stanford's Ashley, is in determining disease risk when a patient has multiple variants associated with the same disease, some of which enhance risk and some of which lower risk. "There isn't really a way of putting together SNPs that allows a reasonable estimate of risk," he said. In the study, the authors ranked SNPs by which ones had the greatest effect and attempted to give a likelihood risk ratio for the associated disease, but, noted Ashley, the combination of SNPs could produce a different effect than each SNP individually.
Bancel agreed: "There are still too many interactions between genes that we don’t understand yet."
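As a rough illustration of the likelihood-ratio approach Ashley describes, one common way to combine several SNPs is to convert a pre-test probability of disease into odds, multiply by each SNP's likelihood ratio, and convert the result back into a probability. The sketch below assumes the SNPs act independently, which is the very simplification Ashley and Bancel caution against. The function names and the numbers are hypothetical and are not taken from the study.

```python
from typing import Iterable


def probability_to_odds(probability: float) -> float:
    """Convert a probability strictly between 0 and 1 into odds."""
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return probability / (1.0 - probability)


def odds_to_probability(odds: float) -> float:
    """Convert non-negative odds back into a probability."""
    return odds / (1.0 + odds)


def post_test_probability(pre_test_probability: float,
                          likelihood_ratios: Iterable[float]) -> float:
    """Combine per-SNP likelihood ratios into a single risk estimate.

    Assumption (hypothetical, for illustration only): every SNP is treated
    as independent, so the likelihood ratios simply multiply. Interactions
    between variants are ignored.
    """
    odds = probability_to_odds(pre_test_probability)
    for ratio in likelihood_ratios:
        if ratio <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        # A ratio above 1 raises the odds of disease; below 1 lowers them.
        odds *= ratio
    return odds_to_probability(odds)


if __name__ == "__main__":
    # Made-up values: a 10% baseline risk, two risk-raising SNPs
    # and one protective SNP.
    estimate = post_test_probability(0.10, [1.2, 0.9, 1.3])
    print(f"Estimated post-test probability: {estimate:.3f}")
```

Because the ratios multiply, a protective variant can partly cancel a risk-raising one, but nothing in this arithmetic captures the case where two variants together behave differently than either does alone.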
Despite the challenges, Bancel said he remains optimistic. Other researchers thought that whole-genome sequence data would have its first clinical applications in determining drug response rather than disease risk.
"I think drug prescribing will take advantage of whole-genome sequencing within five years," said Stanford's Altman. "I think disease risk estimation and reporting is farther away for many reasons, and may be 10 years away from routine use."
He added that improved databases would help facilitate these applications. He said the PharmGKB database, a project for which he is the principal investigator, is attempting to "store all known relationships between human genetic variation and drug response."
Hegde added that creating more phenotype-driven gene panels will help find genes and variants that have clear associations with disease. However, she said, there are also genes that influence disease susceptibility but don't have a direct association, as well as many variants with no clinical significance at all, and variants and mutations that researchers have yet to discover.
"Before we say, sequencing costs have come down, [let's] sequence everyone, there has to be a realistic approach to doing this," she said.