By Julia Karow
An improved version of the human reference genome that better represents the major alleles and includes ethnic-specific genetic variation will be crucial for the correct clinical interpretation of human genome sequence data, according to researchers at Stanford University who recently used such an "augmented reference genome" to interpret the genomes of former Solexa CEO John West, his wife, and his two children.
Based on this work, the Stanford team also found that sequencing families rather than individuals helps to weed out sequencing errors and to obtain haplotype-phased information, improving genetic risk predictions.
In a paper published last week in PLoS Genetics, the scientists, led by Euan Ashley, director of the Stanford Center for Inherited Cardiovascular Disease, used the augmented reference genome and improved analysis methods to interpret the genomes of the West family, enabling them to track down the genetic causes of thrombophilia that runs in the family. The Wests had their genomes sequenced in 2010 through Illumina's individual genome sequencing services (IS 4/20/2010).
The problem with the current reference sequence maintained by the National Center for Biotechnology Information, which has been used to call variants in virtually all next-gen human resequencing projects, is that it derives from a handful of anonymous donors of European origin from Buffalo, NY. "It doesn't really capture the whole gamut of genetic variation that's present in the human population, and it certainly doesn't capture any ethnic-specific human genetic variation," said Frederick Dewey, a fellow in the division of cardiovascular medicine at Stanford and the first author of the paper.
Also, the donors, like everyone, carried some disease risk variants, and if a newly sequenced genome happened to be identical to the reference at those positions, the risk alleles would not be called as variants.
To make better references for three HapMap populations (European, African, and East Asian), the researchers drew on data from the 1000 Genomes Project. At every position where the NCBI reference carried a population's minor allele, about 1.6 million positions for each population, they substituted the major allele to generate "synthetic major allele reference sequences."
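The substitution step can be illustrated with a minimal Python sketch. It is not the Stanford pipeline: the function name, the toy sequence, and the allele frequencies are all hypothetical, and a real reference would be read from FASTA and population frequency files rather than held in a string.

```python
def build_major_allele_reference(reference, allele_freqs):
    """Return a copy of `reference` with each listed position set to the major allele.

    reference    -- string of bases (toy stand-in for a chromosome)
    allele_freqs -- dict: 0-based position -> {base: population frequency}
    """
    bases = list(reference)
    for pos, freqs in allele_freqs.items():
        # The major allele is simply the most frequent base in the population.
        major = max(freqs, key=freqs.get)
        if bases[pos] != major:
            bases[pos] = major
    return "".join(bases)


# Hypothetical data for illustration only.
reference = "ACGTACGT"
allele_freqs = {
    2: {"G": 0.30, "A": 0.70},  # reference carries the minor allele here
    5: {"C": 0.90, "T": 0.10},  # reference already carries the major allele
}

synthetic = build_major_allele_reference(reference, allele_freqs)
print(synthetic)  # ACATACGT
```

Only position 2 changes, because that is the only site where the toy reference holds the less common base.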
Importantly, they found that more than 4,000 variant loci contained in a database of genotype-disease associations were represented by the minor allele in the NCBI reference. Using the synthetic major allele reference to analyze the family's genomes improved the accuracy with which variants in medically important loci were called, said Dewey, and helped them identify several risk variants for inherited thrombophilia in the family.
As more population variation data becomes available, they plan to create additional synthetic major allele reference sequences for other ethnicities. Already, they have incorporated the existing ones into their variant identification pipeline, which they are using for both whole-genome and targeted sequencing projects to analyze families with inherited cardiomyopathies. "Its main use will be in variant identification, initially for rare and private mutations, in which it is important to have these variants show up in the file appropriately," said Dewey.
Helping Prevent Inherited Disease
One of the reasons the West family was interested in having their genomes sequenced is that John West has a history of pulmonary embolisms, and once developed a blood clot even though he was on anticoagulation medicine. The analysis showed that he has two variants that put him at risk for blood clot formation: one in the gene F5, which encodes coagulation factor V, and one in the MTHFR gene, which encodes methylenetetrahydrofolate reductase. He passed both of these on to his daughter, Anne, but not to his son.
Using the NCBI reference genome, the researchers would not have detected the F5 variant, because the reference carries the same variant, which occurs in 3 to 5 percent of the Caucasian population, Dewey explained.
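Why a reference that carries a risk allele hides it can be shown with a small Python sketch. The locus name, alleles, and genotype below are hypothetical and are not the West family's actual data. The caller is deliberately naive: it reports a site only when the sample differs from the reference.

```python
def call_variants(reference, genotypes):
    """Return loci where the sample carries at least one non-reference allele.

    reference -- dict: locus -> reference base
    genotypes -- dict: locus -> tuple of the sample's two alleles
    """
    return {
        locus: alleles
        for locus, alleles in genotypes.items()
        if any(allele != reference[locus] for allele in alleles)
    }


# Hypothetical locus where the risk allele "A" is the minor allele.
ncbi_like_reference = {"risk_locus": "A"}     # reference happens to carry the risk allele
major_allele_reference = {"risk_locus": "G"}  # synthetic reference carries the major allele

# Hypothetical sample that is homozygous for the risk allele.
sample = {"risk_locus": ("A", "A")}

missed = call_variants(ncbi_like_reference, sample)
found = call_variants(major_allele_reference, sample)
print(missed)  # {}
print(found)   # {'risk_locus': ('A', 'A')}
```

Against the first reference the sample looks identical to the reference and nothing is reported. Against the major allele reference the same genotype shows up as a variant.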
The sequencing study revealed that Anne and her brother inherited another variant known to be associated with inherited thrombophilia from their mother, in the HABP2 gene, which encodes hyaluronan binding protein 2.
The results "inform Anne's medication choice in terms of mitigating any risk of developing blood clots and pulmonary embolisms," Dewey said, adding that she has changed some medications she took previously that could potentially put her at risk for blood clot formation.
Their father's medical history already indicated that the West children were at increased risk of developing thrombophilia, but their genome information provided greater precision.
"The reason that having the genomic information available is so transformative is, we can now say which of those risks are actually applicable to you," Ashley said. "Although these things generally run in the family, which ones did you actually get?"
Information like this could possibly help children avoid developing preventable diseases. "These are young kids, they are generally not taking medications, but … they have 40 or 50 years ahead of them during which they can be armed with this information to avoid a problem," Ashley said.
Using family sequence information, rather than the genome of a single individual, and tracing how variants were inherited from one generation to the next, helped the researchers identify those variants that were likely sequencing errors, enabling them to reduce the error rate by more than 90 percent.
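One simple way to use inheritance for error detection is a Mendelian consistency check, sketched below in Python. The article does not describe the team's actual filtering method, so this is only an illustration under assumed inputs: unphased genotypes for two parents and their children at hypothetical positions. A flagged site could also be a true de novo mutation, so a flag marks a candidate error, not a confirmed one.

```python
from itertools import product


def is_mendelian_consistent(child, mother, father):
    """True if the child's genotype can be formed from one maternal and one paternal allele."""
    return any(
        sorted((m, f)) == sorted(child)
        for m, f in product(mother, father)
    )


def flag_inconsistent_sites(family):
    """Return positions where any child's genotype violates Mendelian inheritance.

    family -- dict: position -> {"mother": (a, b), "father": (a, b), "children": [(a, b), ...]}
    """
    flagged = []
    for pos, g in sorted(family.items()):
        if not all(
            is_mendelian_consistent(child, g["mother"], g["father"])
            for child in g["children"]
        ):
            flagged.append(pos)
    return flagged


# Hypothetical genotype calls for a four-person family.
family = {
    100: {"mother": ("A", "A"), "father": ("A", "G"),
          "children": [("A", "G"), ("A", "A")]},
    200: {"mother": ("C", "C"), "father": ("C", "C"),
          "children": [("C", "T"), ("C", "C")]},  # "T" appears in neither parent
}

flagged = flag_inconsistent_sites(family)
print(flagged)  # [200]
```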
In addition, it allowed them to assess genome-wide compound heterozygosity, and how it relates to medical risk. Scientists from the University of Washington had previously identified long-range haplotypes from family data, Dewey said, "but we were able to apply that to our variant annotation pipeline for predicting medical risks for the family, as well as response to drug therapy."
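Phasing matters for compound heterozygosity because two variants in one gene only disable both copies when they sit on different parental haplotypes. The Python sketch below assumes phased calls are already available as (gene, position, haplotype) records. The gene names and positions are hypothetical, and the logic is a simplified illustration, not the annotation pipeline described in the paper.

```python
from collections import defaultdict


def compound_het_genes(phased_variants):
    """Return genes with variants at different positions on both parental haplotypes.

    phased_variants -- iterable of (gene, position, haplotype),
                       where haplotype is "maternal" or "paternal"
    """
    by_gene = defaultdict(lambda: {"maternal": set(), "paternal": set()})
    for gene, pos, hap in phased_variants:
        by_gene[gene][hap].add(pos)

    result = []
    for gene, haps in by_gene.items():
        # Require one variant per copy at distinct positions; the same position
        # on both copies would be a homozygous variant instead.
        if any(m != p for m in haps["maternal"] for p in haps["paternal"]):
            result.append(gene)
    return sorted(result)


# Hypothetical phased calls.
calls = [
    ("GENE_A", 100, "maternal"),
    ("GENE_A", 250, "paternal"),  # different copies: compound heterozygote
    ("GENE_B", 40, "maternal"),
    ("GENE_B", 90, "maternal"),   # same copy: the other copy is intact
]

genes = compound_het_genes(calls)
print(genes)  # ['GENE_A']
```

Without phase information the two genes would look the same, since each simply carries two heterozygous variants.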
The researchers used long-range phased haplotypes to provide HLA types for each family member and calculated their risk for 28 common diseases. Based on that, all four are at high risk for psoriasis, and both parents but not their children are predisposed to obesity.
Also, both mother and daughter carry an ultra-rapid metabolism allele for the CYP2C19 gene, which encodes a key metabolizer of the antiplatelet drug clopidogrel, putting them at a higher bleeding risk when using the drug.
The Stanford researchers are now applying genome sequencing and their analysis methods to clinical research. At the Stanford Center for Inherited Cardiovascular Disease, for example, they use exome sequencing and whole-genome sequencing to provide molecular diagnoses to families with hypertrophic cardiomyopathy in cases where a standard genetic test that screens 17 genes comes back negative. "It's not routine, but it's happening more and more," Ashley said.
They are also in the midst of a pilot phase for a clinical trial that will test whether genetic information about cardiovascular disease risk will make patients change their lifestyle or improve test results such as cholesterol levels. While the pilot phase is using genotyping chips, the goal is to move to whole-genome sequencing for the full-scale study, Ashley said.
In addition, the Stanford team is collaborating with the University of Florida to put genetic information in electronic medical records. Doctors already use information about drug-drug interactions when they prescribe medication, he said, "but I think it's not too much of a stretch to think that in the not-too-distant future, there will also be a drug-genome table, where you put in your genome, [and] the computer looks up a drug-genome interaction and says, 'That drug for this particular patient is not the best one.'"
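The "drug-genome table" idea amounts to a keyed lookup, which the following Python sketch illustrates. The table structure, function name, and wording of the note are assumptions for illustration. The single entry reflects only the clopidogrel and CYP2C19 association mentioned in this article and is not clinical guidance.

```python
# Hypothetical table: (drug, gene, phenotype) -> note shown to the prescriber.
DRUG_GENOME_TABLE = {
    ("clopidogrel", "CYP2C19", "ultra-rapid metabolizer"):
        "higher bleeding risk reported; review before prescribing",
}


def check_prescription(drug, patient_profile):
    """Return warnings for `drug` given a patient's gene -> phenotype profile."""
    warnings = []
    for gene, phenotype in sorted(patient_profile.items()):
        note = DRUG_GENOME_TABLE.get((drug.lower(), gene, phenotype))
        if note:
            warnings.append(f"{drug} / {gene} ({phenotype}): {note}")
    return warnings


# Hypothetical patient profile derived from a sequenced genome.
profile = {"CYP2C19": "ultra-rapid metabolizer"}

for warning in check_prescription("clopidogrel", profile):
    print(warning)
```

A drug with no matching entry returns an empty list, which mirrors how a drug-drug interaction check stays silent when nothing is flagged.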
Stanford's "workhorse system" for sequencing is currently Illumina's HiSeq, he said, though he and his team also work with Complete Genomics.
One of the challenges is keeping up with the increased demand for clinical genome sequencing in an academic setting. That demand prompted several Stanford professors, including Ashley, Russ Altman, Atul Butte, and Mike Snyder, to form a company, Personalis, that is using some of the methods described in the West study to analyze sequencing data. John West, another co-founder, is the CEO of the firm.
Have topics you'd like to see covered in Clinical Sequencing News? Contact the editor at jkarow [at] genomeweb [.] com.