By Julia Karow
Since its inception five years ago, the National Human Genome Research Institute's ClinSeq study has sequenced hundreds of exomes and a few whole genomes of individuals and has returned results that might help participants maintain or improve their health.
Along the way, the pilot study, which is testing the feasibility of using large-scale medical sequencing to find and return genetic variants of clinical significance to individual participants, has dealt with a number of challenges, including analyzing and interpreting large amounts of sequence data in the absence of a specific clinical phenotype, misannotations in databases of genetic diseases, returning unexpected results, dealing with variants of uncertain significance, and the time and cost of educating study subjects about their results.
Earlier this month during a webcast seminar sponsored by Illumina, Les Biesecker, ClinSeq's principal investigator and chief of the genetic disease research branch at NHGRI, and Flavia Malheiro Facio, lead associate investigator for ClinSeq and genetic counselor at NHGRI, spoke about some of these challenges, as well as what motivated some of the early participants to sign up.
As of March, ClinSeq had enrolled and performed baseline phenotyping on more than 900 patients and had completed exome sequencing for more than 500, generating 12 gigabases of coding sequence in the process. As a pilot project, the study has also sequenced two whole genomes, amounting to 6 gigabases of haploid sequence.
Though most participants are labeled "healthy volunteers," at least a subset of them have enrolled because of a personal or a family history of disease, according to Facio. In a survey of the first 322 participants, roughly half said they signed up in order to help research, and the other half because they are seeking information about their personal health.
The decision to start with exome sequencing was mainly driven by economic reasons, Biesecker explained, and the researchers are fully aware that disease-causing variants may reside in non-coding portions of the genome. Generating whole genomes currently costs about six times as much as exome sequencing, and "given that we have a lot to learn about exomes, we would rather go broad on the patients and interrogate exomes to start with, until the costs come down further, and then we can switch eventually to whole genomes," he said.
At the beginning of the study, the researchers decided to look at just one phenotype — atherosclerosis — in order to "get the patients in the door," Biesecker explained, but they have since expanded ClinSeq to other projects, some driven by phenotype, others by genotype.
When they enrolled, all participants consented to have their entire genomes sequenced, a process that included a one-hour session with a genetic counselor. The study favored enrollment of individuals who are generally open to receiving genetic results.
Sequencing data is initially generated in a research environment. Each time the ClinSeq researchers find a potentially clinically significant result, they call the participant and, in a conversation that lasts up to an hour, describe the result in general terms. They note, for example, whether it could affect the participant directly or only his or her descendants, or talk about the general nature of the disorder the variant is linked to.
Only when the proband agrees to receive the result is it validated by Sanger sequencing in a CLIA-certified lab. It is then returned in person in a session with a genetic counselor that lasts about two hours. As with other medical tests, the lag between the initial phone call and obtaining the specific result can be distressing to participants, Biesecker noted, but ClinSeq is considering ways to minimize this anxiety.
The study has already provided patients with insights they would not have gained otherwise. For example, the ClinSeq researchers decided to search the exomes of approximately 250 participants for variants in 37 known cancer susceptibility genes. The results would be useful to the study subjects, they reckoned, because those with a mutation in any of those genes could be monitored closely or take other measures to reduce their risk of developing cancer.
Overall, they found more than 14,000 non-synonymous variants in the participants' exomes, and about 250 in the 37 cancer syndrome genes. After filtering these for frequency and reviewing databases and the literature manually, they were left with 155 variants, several of which they deemed clinically significant.
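The article does not spell out the filtering criteria, but the general shape of such a first pass can be sketched. The following is a minimal illustration, not the ClinSeq pipeline: the `Variant` record, the three-gene panel (a stand-in for the 37 genes, which are not listed here), and the 1 percent frequency cutoff are all assumptions made for the example. Manual database and literature review would still follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    effect: str              # e.g. "missense", "frameshift", "synonymous"
    allele_frequency: float  # population frequency, between 0.0 and 1.0


# Hypothetical placeholders: the article names neither the full gene list
# nor the frequency cutoff the researchers used.
CANCER_PANEL = {"BRCA1", "BRCA2", "TP53"}
MAX_ALLELE_FREQUENCY = 0.01
NON_SYNONYMOUS = {"missense", "nonsense", "frameshift"}


def candidates_for_review(variants):
    """Keep rare, non-synonymous variants in panel genes for manual review."""
    return [
        v for v in variants
        if v.gene in CANCER_PANEL
        and v.effect in NON_SYNONYMOUS
        and v.allele_frequency <= MAX_ALLELE_FREQUENCY
    ]
```

A filter like this only narrows the list; deciding which of the surviving variants are clinically significant is the manual step the researchers describe.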
For example, in one male participant, they found a frameshift mutation in the BRCA2 gene that had been described multiple times in the literature and is associated with a high risk of breast and ovarian cancer. Males with this mutation have not only an elevated risk of breast cancer, but also of prostate cancer and ocular melanoma. Interestingly, that proband did not come from a high-risk family, so he did not suspect the result, which is also of relevance to several of his nieces, who might carry the same mutation.
The study also found a number of variants of uncertain significance in the participants, for which it is currently impossible to determine whether they are associated with an elevated cancer risk. "In the absence of a family history, it's difficult to justify returning those to individual subjects," Biesecker said.
Because the probands were not selected from families with a high risk of cancer, the results give an unbiased view of how often cancer-predisposing mutations occur in the general population. But at the same time, this can make it more difficult for participants. "It's very surprising and unsettling for a patient to be sort of taken off the street, if you will, and have an assessment made of this susceptibility," Biesecker said. "The utility of such a finding is likely to be variable in these subjects because of their lack of experience with the trait, so we all will need to develop experience in how we ascertain and manage patients by this new approach."
ClinSeq has also turned up unexpected results in other probands. One example is a middle-aged male participant with an extremely high coronary calcium score — a proxy for atherosclerosis — and a family history of early-onset coronary atherosclerosis, with more than 10 affected family members who have no elevated blood lipid levels.
To determine what causes this phenotype, the ClinSeq researchers decided to sequence his entire genome. Based on read counts across his genome, they found that he has a deletion on chromosome 17 that includes the PMP22 gene, which is mutated in hereditary neuropathy with liability to pressure palsies, or HNPP. The participant chose to learn about this result and told the researchers later that he had had symptoms of the disease for years. After he informed his family, several family members were subsequently tested and turned out to have the same deletion, though most of them had not been diagnosed with HNPP.
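How read counts can point to a deletion may be easier to see in a toy example. The sketch below is an assumption-laden illustration, not the method the ClinSeq team used: it compares per-window read counts in a sample against a reference profile, normalizes both by their medians, and flags runs of consecutive windows where the ratio is low. A heterozygous deletion would be expected to give a ratio near 0.5. The function name, the 0.75 threshold, and the three-window minimum are all hypothetical.

```python
from statistics import median


def find_deletions(sample_counts, reference_counts, max_ratio=0.75, min_windows=3):
    """Return (start, end) window ranges, end exclusive, with low read depth."""
    # Dividing by the median cancels out differences in overall sequencing depth.
    sample_median = median(sample_counts)
    reference_median = median(reference_counts)
    if sample_median <= 0 or reference_median <= 0:
        raise ValueError("median read count must be positive")

    ratios = []
    for s, r in zip(sample_counts, reference_counts):
        if r > 0:
            ratios.append((s / sample_median) / (r / reference_median))
        else:
            ratios.append(1.0)  # no reference coverage: treat as uninformative

    calls = []
    run_start = None
    # The infinite sentinel closes a run that reaches the last window.
    for i, ratio in enumerate(ratios + [float("inf")]):
        if ratio <= max_ratio:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_windows:
                calls.append((run_start, i))
            run_start = None
    return calls


# Toy data: windows 5 through 8 have half the expected depth.
sample = [100] * 5 + [50] * 4 + [100] * 5
reference = [100] * 14
print(find_deletions(sample, reference))  # prints [(5, 9)]
```

Real copy-number callers also correct for factors such as GC content and mappability, which this sketch ignores.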
"While this may seem straightforward in its application from the perspective of a genomicist, clinically, this is a pretty radical thing to have done," Biesecker said, because the participant was diagnosed with a genetic disease without reporting a history of his symptoms first. The ClinSeq researchers have not yet been able to identify the cause of his calcified coronary arteries, however.
The example of this proband also illustrated how difficult it is to firmly link variants to diseases, in part because variants are misannotated in existing databases. Besides the deletion in his genome, the researchers also found 64 candidate disease-causing variants that were in the Human Gene Mutation Database. However, a close review of the primary literature underlying these annotations (a process that is "very time-consuming but absolutely necessary," according to Biesecker) suggested that 43 of these do not cause genetic disease but were misannotated. One mutation on the X chromosome, for example, had been reported to cause X-linked retinitis pigmentosa, a disease with an average age of onset of nine years, even though the participant had "perfect visual acuity."
"One has to be extremely careful when sifting through genomic results to avoid making this error," Biesecker said.
For another 17 variants, the human reference genome contained the disease-causing mutation and the proband was wild-type; for the remaining variants, the proband was heterozygous, meaning he is a disease carrier.
A 'Radical' Approach
Overall, Biesecker said, ClinSeq has found that genome sequencing is highly likely to generate medically relevant results, including cancer susceptibility, carrier status, and other traits. "When you interrogate genomes in subjects, you can be certain you are going to find clinically relevant variants, and you are probably going to find more than one of them in every patient you analyze," he said.
Analyzing genomes without a prior clinical indication is "radical," he said, because "it disrupts the paradigm of the clinical evaluation as currently practiced. We will certainly need new analytic approaches, and the databases have to be improved in order to allow us to interpret variants in a high-throughput fashion."
Right now, he explained, patients first see a clinician when they have already developed symptoms or a phenotype, allowing their doctor to search for causes. "But if you have whole-genome sequence data, you don't have to do it this way. You can change the order of your approach by thinking about interrogating patients first genomically," and then phenotyping them based on the genomic results. This approach could uncover diseases, "even disorders that you did not even know existed when you started," which, he said, "could allow new discoveries to be made that were not possible under the old paradigm."
In practice, the researchers need to explore which variants they return and which they don't, and how to communicate those results most effectively. "We feel strongly that returning nothing is difficult to justify," Biesecker said. "There are clearly medically important results that are included in these datasets that would change how people are managed, and it could save lives. We also feel strongly that returning everything is absurd; the scale of the data is overwhelming, and … returning results of hundreds of variants would be impossible.
"We have to distill these datasets down to some reasonable set of variants, and return the appropriate amount of data to each patient."
The scale of the data, he said, is "very challenging," especially for clinicians used to dealing with single-gene tests. While the cost of generating data by next-gen sequencing is falling fast, interpretation costs are coming down more slowly, and the field of biomedical informatics is going to be "very important to all of us as we solve these problems."
For example, it took a team of five researchers between 150 and 300 hours to analyze the 37 cancer predisposition genes in 258 ClinSeq subjects, and "there is no way that scales to all genes and huge numbers of participants," he said.
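A back-of-the-envelope calculation shows why. The sketch below takes the figures quoted above and extrapolates them linearly to a whole exome. Two things in it are assumptions rather than anything Biesecker said: the round figure of 20,000 protein-coding genes, and the idea that effort grows in proportion to the number of genes.

```python
PANEL_GENES = 37
SUBJECTS = 258
HOURS_LOW, HOURS_HIGH = 150, 300

# Assumption: roughly 20,000 protein-coding genes in the human genome.
ALL_GENES = 20_000


def scaled_hours(panel_hours, target_genes=ALL_GENES, panel_genes=PANEL_GENES):
    """Linearly extrapolate analysis hours from the panel to more genes."""
    return panel_hours * target_genes / panel_genes


low = scaled_hours(HOURS_LOW)
high = scaled_hours(HOURS_HIGH)
print(f"All genes, {SUBJECTS} subjects: {low:,.0f} to {high:,.0f} hours")
```

Under those assumptions, the same 258 subjects would take on the order of 80,000 to 160,000 hours of analysis, before adding any more participants.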
Better bioinformatics tools will help, but Biesecker cautioned that he is currently "not able to see a substitute for having a knowledgeable geneticist" who can go back to the primary literature to decide whether a particular variant is pathogenic or benign. "It takes judgment and experience to sift through those results and decide what we think. How we can automate that, I think, is a huge challenge."
Also, the project's current practice of returning results one variant at a time in person is "obviously not scalable," he said, and ClinSeq is thinking about testing "some alternative modes of communication" that would increase the throughput of delivering data to participants.
According to the NHGRI website, ClinSeq will enlist about 1,000 participants in total. Most of the current participants are Caucasian, highly educated, and have high incomes, but according to Facio, ClinSeq will soon start recruiting participants from an underrepresented minority group.
Have topics you'd like to see covered by Clinical Sequencing News? Contact the editor at jkarow [at] genomeweb [.] com.