By Monica Heger
Name: Karl Voelkerding
Position: Associate Professor of Pathology, University of Utah and
Medical Director for Advanced Technology, ARUP Laboratories
Experience and Education: Faculty in pathology, University of Utah, 2002 – present
Faculty in the department of pathology, University of Wisconsin, 1990 – 2000
Resident in clinical pathology, Rutgers University, 1988
MD, University of Cincinnati, 1983
Next-generation sequencing is increasingly being considered for diagnostic purposes, and several commercial firms have begun to offer genetic tests based on high-throughput sequencing, such as Medomics' test for mitochondrial disease and Correlagen Diagnostics' test for cardiac disease.
Yet despite these advances, the technology faces many challenges before it finds widespread use in the clinic.
Karl Voelkerding, a clinician who has been evaluating next-generation sequencing technologies for use in diagnostics, earlier this month spoke at the Cambridge Healthtech Institute XGen Congress about the promises and challenges of utilizing the technology in a clinical setting.
Voelkerding has been developing a sequencing-based genetic test for hypertrophic cardiomyopathy — a genetic condition characterized by an abnormal thickening of the heart muscles. It occurs in one out of 500 people and is the most common cause of heart-related sudden death in people under 30 years of age.
In Sequence caught up with Voelkerding last week to talk about his work. The following is an edited transcript of the conversation.
Why is hypertrophic cardiomyopathy particularly amenable to using sequencing for diagnosis or to determine treatment?
Hypertrophic cardiomyopathy represents an example of how mutations in a number of different genes lead to a shared clinical presentation. And the question is, how do you search for those mutations when [there are multiple] genes that need to be examined to do a comprehensive diagnostic?
With hypertrophic cardiomyopathy, there are 10 or more genes that we need to examine for a comprehensive diagnostic. So the question becomes how to approach that technically, and that's where the advent of next-generation sequencing looks promising from the standpoint of the lower sequencing costs per base and the ability to put on the sequencing instrument a sequencing library that's comprised of 10 or so genes in a single sequencing lane or single sequencing plate.
So the idea is to have a multi-gene sequencing panel for hypertrophic cardiomyopathy that will be used as a tool to assist in determining in patients with a hypertrophic heart whether or not that is due to pathological mutations in the genes that are in the gene panel. That would provide the genetic basis of the patient's condition and allow the individual to understand more about why this happened to them.
Also, the identification of a specific mutation in a patient allows you to use it as a molecular marker to analyze for that specific mutation in at-risk family members, like children and siblings.
So it is both a tool to establish a genetic basis of hypertrophic cardiomyopathy and to use in counseling with respect to potential screening for the mutation in other at-risk family members. That's the value of it.
Hypertrophic cardiomyopathy represents just one of many clinical disorders where the genetic basis is increasingly being revealed and discovered, and is characterized by multiple genes that can have mutations throughout the genes, that lead to an overlapping clinical presentation. We're moving into [diseases] where the number of genes we want to examine is considerably higher than the 10 to 15 genes we want to examine for hypertrophic cardiomyopathy, like X-linked mental retardation. This is mental retardation that is primarily presenting in young boys, and it's passed between one generation and another so it has a pattern of inheritance, and the genetic lesions are present primarily on the X chromosome. And the challenge there is that the number of genes on the X chromosome that have been associated with mental retardation is between 80 and 90.
When you reach the point where you need to interrogate or sequence on the order of 80 genes, you really need a high-throughput technology, and next-generation sequencing looks very attractive particularly for a disorder like that.
You've been testing both the 454 and the Illumina sequencing platforms for their diagnostic abilities in hypertrophic cardiomyopathy. Can you explain the tests you've done with those platforms and how they compare?
First, I would say that we had a substantial amount of concordance or agreement between the two platforms, which is a positive. Whenever we found a nucleotide variant that was in agreement between the two platforms, we were able to confirm that those variants were indeed real by using Sanger sequencing, which is the gold standard. That's point one. Point two is that there were variants identified by both platforms that were not in agreement between the two platforms.
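[Editor's sketch: the cross-platform comparison described above amounts to a set comparison of variant calls. The following Python is a minimal illustration only, not the lab's actual pipeline. The function name, the gene labels, and every call in the example lists are hypothetical placeholders.]

```python
def compare_calls(platform_a, platform_b):
    """Split variant calls into concordant and platform-specific sets."""
    a, b = set(platform_a), set(platform_b)
    return {
        "concordant": a & b,  # called by both platforms
        "only_a": a - b,      # called only by the first platform
        "only_b": b - a,      # called only by the second platform
    }


# Hypothetical calls, each written as (gene, position, reference base, variant base).
illumina_calls = [("GENE_A", 1208, "G", "A"), ("GENE_B", 772, "C", "T"), ("GENE_A", 2155, "C", "T")]
roche_454_calls = [("GENE_A", 1208, "G", "A"), ("GENE_B", 772, "C", "T"), ("GENE_C", 304, "A", "G")]

result = compare_calls(illumina_calls, roche_454_calls)
print(len(result["concordant"]), "concordant calls")
print("discordant:", sorted(result["only_a"] | result["only_b"]))
```

In a workflow like the one described, the concordant set would go on to Sanger confirmation, and each discordant call would be traced back to a platform-specific cause.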
The variants that were in disagreement that were generated on the Illumina platform were due to the fact that we utilized very short read lengths of 36 bases in length. And the short read lengths resulted in cross alignment between two closely related genes in our gene panel. That issue is very addressable by using either longer read lengths or using paired-end sequencing. So although we had two of those discordant [variants] in our Illumina data, we have a path forward to reduce or eliminate those.
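[Editor's sketch: the cross-alignment problem can be shown with toy sequences. This assumes two made-up "genes" that share an identical 40-base stretch and differ only in their flanks, and it uses exact substring matching as a stand-in for a real aligner. All sequences and names are hypothetical.]

```python
# Toy references: identical core, different flanking bases (hypothetical data).
shared_core = "ACGTGCTAGC" * 4  # 40 bases identical in both genes
references = {
    "gene_a": "TTGACA" + shared_core + "GGATCA",
    "gene_b": "CCATGT" + shared_core + "ATCGTG",
}


def genes_matching(read, refs):
    """Return the names of references that contain the read exactly."""
    return sorted(name for name, seq in refs.items() if read in seq)


# A 36-base read drawn from inside the shared core of gene_a.
short_read = references["gene_a"][6:42]
# A 50-base read from gene_a that also covers its unique flanks.
long_read = references["gene_a"][0:50]

print(len(short_read), genes_matching(short_read, references))
print(len(long_read), genes_matching(long_read, references))
```

The short read is compatible with both references, so its origin is ambiguous. The longer read reaches sequence unique to one gene and places unambiguously, which is the reasoning behind moving to longer or paired-end reads.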
Now, there were also discordant results with the Roche 454 technology. They were all due to a specific type of sequencing error that is well documented in the Roche 454 technology. That sequencing error occurs in stretches of DNA where the nucleotides are all the same, or so-termed homopolymer regions. In our study, we utilized a version of the 454 chemistry that is not as refined as the chemistry and technology is today. We may obtain better results with the newer chemistry and newer configuration platform and software analysis. That being said, I think we would improve our results, but I don't know if it would eliminate all the sequencing errors in the homopolymer regions.
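[Editor's sketch: one way to flag calls that fall in homopolymer regions is to scan the reference for runs of a single repeated base. The Python below is a minimal illustration. The function name, the example sequence, and the minimum run length of four bases are assumptions chosen for the example, not values from the study.]

```python
from itertools import groupby


def homopolymer_runs(sequence, min_length=4):
    """Return (start, base, length) for each run of one repeated base."""
    runs = []
    position = 0
    for base, group in groupby(sequence):
        length = len(list(group))
        if length >= min_length:
            runs.append((position, base, length))
        position += length
    return runs


# Hypothetical sequence with a run of six T's and a run of four A's.
example = "ACGTTTTTTGCAAAAC"
runs = homopolymer_runs(example)
print(runs)
```

A discordant call whose position lies inside one of these runs would be a candidate for the type of error described above.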
Going forward, what sequencing strategies will you use?
The areas that we're really focused on this next year are developing [the] best and most cost-effective approaches to enrich for the genes of interest that we want to sequence for our diagnostic panels. We are in the process of evaluating both our own internal use of PCR as the amplification or enrichment method, and we are also interested in three technologies on the market: RainDance, Fluidigm, and Olink Genomics.
The other thing we're doing is continuing to evaluate different sequencing platforms. We'll continue our work with Illumina and Roche 454 technology, but we're also expanding our evaluation to other technologies that are currently on the market or coming onto the market. We are interested in evaluating [the sequencers being developed by] Life Technologies, Pacific Biosciences, and Ion Torrent.
What capture and enrichment did you use for your previous studies?
Our primary enrichment technology has been PCR-based, specifically long-range PCR.
And we've done some enrichment and sequencing work where the enrichment was performed on a solid surface array technology provided by Febit. But at this point we are not continuing to evaluate Febit, primarily because the array capture technologies are more characterized by co-capture of closely related gene sequences, which are then co-sequenced and pose more of a challenge from a bioinformatics standpoint.
But I'll put a qualifier on that — I think for clinical research and basic discovery research the array capture technologies are very powerful, and indeed we are using them for gene discovery projects. For research purposes, we will continue to utilize the array technologies, but their characteristic where they co-capture closely related gene sequences and unrelated genes poses a bioinformatics challenge, which creates an extra layer of complexity for consideration of their use in the diagnostic lab. That's why we're continuing to evaluate other enrichment technologies that are based on specific amplification such as the RainDance platform, and ones that have the ability to do multiple PCR reactions in parallel, such as the Fluidigm platform.
Can you talk about the importance of sequencing accuracy in diagnostics, and specifically with regards to hypertrophic cardiomyopathy?
The highest mutation frequency [in hypertrophic cardiomyopathy] occurs in a gene called Myh7, which encodes a myosin heavy chain protein. And on the same chromosome, adjacent to the Myh7 gene, is a gene that's very closely related, called Myh6. That gene probably arose as a gene duplication of Myh7 and then has undergone some evolutionary divergence. However, it's very closely related at the sequence level. In many areas it's 100 percent identical over significant stretches of nucleotides.
But, that Myh6 gene has rarely been implicated in hypertrophic cardiomyopathy. So you want to be very certain that your diagnostic accuracy for the Myh7 gene is as correct as can be because it is the most frequently mutated gene in hypertrophic cardiomyopathy.
The presence of Myh6 can confound the diagnostics if one does an array capture approach, where the Myh6 sequences are co-captured. Or, if you use long-range PCR to amplify both, and they are sequenced on the same sequencing library in the same pool of genes, you have to perform your sequencing with longer reads and/or with paired-end reads, to make certain that during your alignment, you're specifically looking at Myh7 and not Myh6 sequences.
One of the great challenges with the hypertrophic cardiomyopathy locus is that the gene with the highest mutation frequency has this adjacent highly related gene that can interfere with the analysis. And there are many other examples in the human genome where you'll have a gene of high diagnostic interest and there will be very closely related genes that may be adjacent or on other chromosomes. So there's this genetic phenomenon where you'll have a gene that will be functional and intact, and then there will be a copy of it that is non-functional either next door or somewhere else in the genome, referred to as a pseudogene, and the pseudogenes can cause quite a bit of diagnostic challenge when using next-generation sequencing approaches.
Aside from these pseudogenes, what are the major challenges in using next-generation sequencing for diagnostics?
I think the major challenges are that the technology is technically complex, both at the diagnostic bench and at the bioinformatics levels. With each year what we've been observing, though, is that the commercial developers of next-generation platforms have been striving to reduce [the] complexity of the entire process — both in reducing the amount of hands-on technical work that needs to be done prior to sequencing, and in trying to streamline their own data analysis pipelines.
In conjunction with that, you have a variety of other commercial groups that have been trying to improve the front-end processes and data-analysis processes. So now we have commercial third-party companies for data analysis that are greatly facilitating data analysis. And also, groups that are trying to improve and reduce the costs of the up-front sample preparative processes and the steps before sequencing.
So there's … the challenge, but the types of improvements that are happening are making the technologies more amenable to translation into the clinical diagnostic arena.
Another challenge is that platforms continue to change and new platforms are coming into the commercial space, so trying to choose a platform or platforms to use in the clinical laboratory is challenging because of the rapid evolution of this technology.
To introduce a test at the clinical level it has to undergo a validation process. But when the technology changes, when there's perhaps an entire new upgrade to an instrument or an entire new software pipeline, you have to do a re-validation to ensure that you're still obtaining the types of results that you were initially obtaining. So, it's difficult for a clinical lab to choose and validate a technology.
What was the sequencing cost of the study you did for hypertrophic cardiomyopathy and what does the cost need to be for it to be used in a clinical setting?
For a single sample on our cardiomyopathy panel, the total cost, including enrichment by long-range PCR, making the sequencing library, and the actual sequencing, was approximately $1,500 on the Illumina GA. On the Roche 454 technology, for a single sample the cost was in the neighborhood of $5,000. That being said, I think the cost for the Roche 454 would be lower with today's pricing. They have done some downward adjustment of their reagent pricing.
Realistically, over the next couple of years, I would like to see those costs come down to the $500 range for total material costs. To achieve that we need good specific enrichment methods coupled with reductions in sequencing reagent costs. Because on top of that $500, one would need to add labor, which will be reduced as the technologies and sample preparative processes become more streamlined and/or automated. And the bioinformatics analyses are becoming more straightforward each year. Both our experience and that of others speak to that.
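[Editor's sketch: the gap between the quoted per-sample costs and the $500 target can be worked out directly. The dollar figures below are the approximate ones quoted in the interview. Treating the quoted totals as directly comparable to the material-cost target is a simplifying assumption, and the function name is illustrative.]

```python
def fold_reduction(current_cost, target_cost):
    """How many times lower the target cost is than the current cost."""
    return current_cost / target_cost


# Approximate per-sample costs quoted in the interview, in US dollars.
costs_per_sample = {"Illumina GA": 1500, "Roche 454": 5000}
target_material_cost = 500

reductions = {
    platform: fold_reduction(cost, target_material_cost)
    for platform, cost in costs_per_sample.items()
}

for platform, factor in reductions.items():
    print(f"{platform}: about {factor:.0f}-fold reduction needed")
```

On these figures, reaching the target would take roughly a threefold reduction from the Illumina GA cost and a tenfold reduction from the Roche 454 cost, before labor is added.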
What are the next steps of your research?
As I indicated earlier, the key things are that we'll continue to evaluate enrichment technologies and sequencing platforms. We want to expand out of a development phase and into a validation phase, which is predicated on choosing a platform or platforms for our diagnostics applications.
We are also going to develop diagnostic gene panels. And, we've already moved ahead and begun to do capture and sequencing of the entire human exome for research purposes, but I'm actually not at liberty to discuss that at any greater detail. But we'll be using that for clinical research and gene discovery.
Are you leaning toward any particular sequencing platform for diagnostics?
We continue to devote most of our efforts to the Illumina GA. Our reasons are that, in addition to the quality of the data that the platform provides, the sample prep and library prep are the most straightforward. And the cost of doing development work has been significantly lower on the Illumina GA, as you can appreciate from the numbers I quoted you earlier.
It also has a higher-throughput capacity that allows for greater and deeper coverage per sample per lane, and also the capability to allow us to do work on whole-exome sequencing for clinical research purposes. So, as a platform, it has turned out to be versatile for us and our different project areas from multi-gene panels to exome sequencing.
What are your thoughts on the future of using next-generation sequencing in a clinical setting? And do you have predictions for when we'll start seeing this being used in a widespread clinical setting?
It's the very early stages, but there are already at least two commercial groups employing next-generation sequencing for targeted re-sequencing of multiple genes and gene panels. And we are in active development with a goal of translation to clinical application and validation. So I think over the next one to two years you will see increasing growth in this area and it will reside initially in diagnostic reference laboratories, which perform high-complexity genetic testing.
As the technology costs continue to decline and the platform options diversify, which will increase competition in the field and bring pricing down, we should see greater dissemination and more laboratories beginning to use next-generation sequencing.
One of the important challenges is the fact that it's a steep learning curve to enter into the world of next-generation sequencing due to its technical complexity and the amount of bioinformatics expertise that's needed to have the data be translatable into accurate sequencing calls. So I think we are still in the early stages but we are starting to see an active transition and a lot of interest throughout the genetic diagnostic community.