Q&A: Jonathan Berg on 'Binning' Variants Found During Clinical Whole-Genome Sequencing


Name: Jonathan Berg
Title: Assistant professor, Department of Genetics, University of North Carolina at Chapel Hill;
Associate Director, Carolina Center for Genome Sciences
Experience and Education:
Postdoctoral researcher, Baylor College of Medicine, 2005-2009
Residency in medical genetics, Baylor College of Medicine, 2003-2007
MD, University of North Carolina, Chapel Hill, 2003
PhD in neuroscience, University of North Carolina, Chapel Hill, 2001
BS in biology, Emory University, 1994

As whole-genome sequencing is becoming more affordable and various research groups are reporting success in using the technology to diagnose genetic diseases, scientists are starting to think about how to implement this tool more widely in the clinic.

In a recent commentary in Genetics in Medicine, Jonathan Berg, an assistant professor of genetics and a clinical geneticist at the University of North Carolina at Chapel Hill, and his co-authors (UNC Chapel Hill colleague James Evans and Muin Khoury of the Office of Public Health Genomics at the Centers for Disease Control and Prevention) outlined a binning scheme for classifying variants found incidentally during clinical whole-genome sequencing, in order to decide which ones to report back to the patient. Last week, Clinical Sequencing News spoke with Berg about this idea and how it could be implemented. Below is an edited transcript of the conversation.

When do you expect whole-genome sequencing to become an important tool in healthcare?

The simple answer is it's already happening. There have been some papers published on patients who have had a whole-genome or whole-exome sequence done because of a clinical indication, and we have obtained a whole-genome sequence for one patient of ours. So the reality is it's being done, though it's certainly not commonplace at this point, mostly because of the logistical aspects of analyzing the data. But the cost of genome sequencing coming down so rapidly is going to really force the issue. Third-party payors are going to start recognizing that they can get a whole lot more bang for their buck in terms of being able to do multiplex genetic testing at much lower cost than would be available through a standard Sanger sequencing panel approach.

In your paper, you distinguish between the diagnostic use of whole-genome sequencing to uncover the cause of a genetic disease and its use as a public health tool in currently asymptomatic individuals. Can you elaborate on that?

I'm a clinical geneticist, I see patients in the clinic, and we often order genetic tests for our patients. We are painfully aware of how often we suspect a condition, order a genetic test, and it comes back negative, essentially leaving us with, 'Perhaps we just need to order another genetic test.' You can spend a lot of time and effort doing that when, if there were a test like whole-exome sequencing available clinically, you would essentially be looking at all genes simultaneously. So I think [this would be useful] for people where there is a Mendelian condition, perhaps most importantly for Mendelian conditions where there is genetic heterogeneity, things like long QT syndrome or cardiomyopathies, where there are these huge panels of genes that can be tested.

Even hereditary cancer susceptibility, where we have a few genes that are clinically available, and there are certainly a large number of minor players, which probably have significant effects within a family if there is a mutation but don't account for very much of the overall hereditary cancer susceptibility, but they just aren't really testable clinically. Even in patients with symptoms like epilepsy — children or infants with epilepsy — there are numerous genes that could account for that. Deafness is a great example. So there are all of these different clinical examples where you could imagine that a single test could essentially allow you to identify mutations in the known genes [for that disease]. And we consider the diagnostic evaluation to really be limited to the genes that we know to be associated with that phenotype that the patient is presenting with.

Looking forward to the time when whole-exome sequencing or whole-genome sequencing is just a routine test, what we think it could be potentially useful for in a public health context would be identifying the rare individuals with Mendelian conditions that we can do something about, and prevent morbidity and mortality. The example that we gave in the article was Lynch syndrome, which I think is fairly well documented that if you can identify an asymptomatic person with Lynch syndrome, which we currently do based on their family history — we identify these people because they are in a family of people with Lynch syndrome and we can determine that they are carrying the familial mutation — that it is reducing morbidity and mortality. Breast cancer susceptibility — again, I am talking about the Mendelian forms of this disease, the high-penetrance mutations — we have clear evidence of the ability to improve outcomes. Those are the cases, though they might account for only 1 in 1,000 people, or perhaps, if you include things like long QT syndrome and … other genetic conditions, possibly 1 in 100 people who have one of these relatively highly penetrant mutations.

That seems like a high number.

That's really a ballpark number. We will find out a lot more about these types of genetic conditions. I think likely one of the things we are going to find out is that what we think are highly penetrant conditions, because we have been studying them in families where they have proven to be highly penetrant, when you start doing this type of testing in unaffected people without a family history, we may find genetic changes that you would have suspected to be highly disease causing, but the penetrance is not as high as we thought it was. We are going to be refining this as we go. In a way, you don't really know that until you have done the experiment, as it were.

One application of whole-genome sequencing you did not mention in your paper is tumor sequencing. Why is that?

Our commentary was entirely focused on germline testing because that is really our area of focus. But we have colleagues in the cancer world who are learning a lot about tumor genomics and somatic mutations. One could make a very strong argument that some form of tumor sequencing, whether it's targeted sequencing for known mutations or whether it's complete genome sequencing, is going to be coming online as standard of care. It will need to be studied, and I think there is a good infrastructure for studying these things in oncology, and I think that it will be demonstrated that it's actually beneficial to do this. But we did not address that because it's not within the scope of our article.

In your article, you propose a binning scheme for determining which genomic variants should be reported to patients and which should not. Can you outline this?

At the outset, I want to make it clear that the binning we have proposed is really just a suggested outline. We envision this as a way to structure the analysis of incidental findings in the genomic sequence data. This essentially occurs after the diagnostic evaluation has been completed, so after we have looked at all of the genes that are associated with a patient's phenotype, and we have either found a disease-causing mutation or we haven't found a disease-causing mutation. We would then say [that] you [should] look at everything else in the context of this binning structure.

And the reason for it is that there is just too much data for a single person to look at and try to analyze in any kind of efficient way. You need to take, at least we feel, this a priori structured approach that divides the genome up in these bins. The binning idea came from Jim Evans, my colleague and co-author on this paper, and the bins are really defined by the clinical utility of the information. If the finding of abnormalities in this gene is clinically actionable, it would be bin 1. If the variants are clinically valid, and clinically useful, but not something that you necessarily have to act upon, they would be in bin 2. If they are things that we absolutely have no idea how to interpret, they would be bin 3.

Within bin 1, these are going to be the rare Mendelian disorders, where we have clear guidelines for what to do in people who have those conditions. The examples we gave in the paper are things like hereditary breast cancer genes, or hereditary Lynch syndrome genes. Other examples would be the genes associated with hereditary aneurysms. If you find someone has a clearly deleterious mutation in one of those genes, you would want to be screening them with echocardiograms, and that's a really clear guideline. It would be things where if we find it, we have to recommend that we do something about it.

How many genes are in this category as of today?

It's a good question. The answer is no one really knows at the end of the day what genes ought to be in that category. I think it's going to be a process that our entire field is going to have to grapple with and decide on and generate some consensus about. If this binning scheme turns out to be successful and people think we ought to develop it further, it's going to take national and international panels of experts to sit down and grapple with it and say, 'What side of the line does this gene belong on?' There are going to be certain genes where it's quite easy to see what bin they should belong to. There are going to be other genes that are right there on the line between the bins, and some experts are going to say it ought to be in bin 1, and some experts will say it ought to be in bin 2. And those are going to be the really interesting questions, those areas where there is conflict in terms of what people think ought to be done with those genes. Our little estimates of how many genes would be in those bins are really just thoughts and not really based on a careful analysis yet.

Can you describe the content of bin 2?

These are things that we have some clinical validity and scientific evidence behind but that don't necessarily evoke an actionable response clinically. We separated these into three categories. They are essentially based on the risk of knowledge, the risk of learning something in those bins. For example, the bin 2A category are pharmacogenomic markers and GWAS-type risk SNPs, the types of things you would find in many of the direct-to-consumer profiling studies; they clearly are not causing a lot of anxiety for people to learn about them, and they don’t necessarily have a lot of clinical utility yet because we really don't know how to combine risks into any sensible risk calculation yet. Even if you did believe that a combination of SNPs increased your risk for diabetes two-fold, there are no clear guidelines that we would want to do anything different than what we already recommend, in terms of living a healthy lifestyle and other recommendations. Those types of things we view as having personal utility, but we don't feel like they would be clinically actionable yet. There is room to move between these categories, clearly.

Bin 2B are other Mendelian disorders for which we don't necessarily have a treatment but that are not terribly horrifying diseases and might be useful to know about. You might consider carrier status for recessive conditions in that category, that people who are of reproductive age might be interested in. We put APOE4 allele status in that category, too, partly because it's a bit more predictive than many of the GWAS risk alleles, and it's a little bit more risky knowledge to have in terms of a person's possibility of life insurance discrimination or long-term care insurance discrimination. So it's a little bit more risky information that we would think as genetics professionals that people ought to think carefully about before deciding to learn that information, for example.

And then the third category within bin 2, bin 2C, are what we consider to be the very, very small number of conditions where learning that you are going to be affected with that condition could be utterly devastating. The examples would be things like Huntington's disease, the prion disorders, and early forms of dementia and neurodegenerative conditions — conditions where we really don't have anything to offer in terms of therapy, prevention, or delaying onset. There is clear evidence from the clinical genetics literature that people who have a family history of those conditions often choose not to learn if they are at risk. And so if people would make a rational decision not to learn that information, then we think that it ought to be in that sort of protected category, where in order to gain access to that information, you really ought to get specialized counseling about what that information might mean to you.

In the gradation of risk, we also think there ought to be a gradation of how those results ought to be communicated to patients. In the future, when you see every patient going to their primary care physician and getting their whole-exome sequence done, we would envision that in order to reveal the results of certain categories of information, there ought to be a genetics professional involved in the delivery. And likewise, in the pre-test counseling about what this test means, patients ought to be given information about the types of potential information that they could receive. They need to know upfront about what those categories of information might be. So we kind of view the binning as a way to communicate with patients about what types of information there is in the genome and allow the patients, then, to select things in a categorical manner. Someone could say, 'I would really like to know my bin 2A information, including the pharmacogenomics results, because I think it might be useful in case I ever get put on a drug [for which this information is relevant], but I'm really not eager to know my bin 2C information,' for example. Essentially, that process then becomes driven by the patient's interest and delivered to them in a manner that is appropriate for that type of information.
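The bin structure described so far can be pictured as a simple lookup table. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the names `Bin`, `EXAMPLE_BINS`, and `bin_for` are hypothetical, the category labels are paraphrased from the examples given in the interview, and treating anything unlisted as bin 3 is an assumption made for illustration.

```python
from enum import Enum


class Bin(Enum):
    """Bins as paraphrased from the interview (labels are illustrative)."""
    BIN_1 = "clinically actionable"
    BIN_2A = "clinically valid, low risk of knowledge"
    BIN_2B = "clinically valid, somewhat riskier knowledge"
    BIN_2C = "clinically valid, potentially devastating knowledge"
    BIN_3 = "no known way to interpret"


# Hypothetical table built only from the examples mentioned in the interview.
# A real assignment would be decided by expert panels and revised over time.
EXAMPLE_BINS = {
    "hereditary breast cancer (high-penetrance)": Bin.BIN_1,
    "Lynch syndrome": Bin.BIN_1,
    "hereditary aneurysm": Bin.BIN_1,
    "pharmacogenomic marker": Bin.BIN_2A,
    "GWAS-type risk SNP": Bin.BIN_2A,
    "recessive carrier status": Bin.BIN_2B,
    "APOE4 allele status": Bin.BIN_2B,
    "Huntington's disease": Bin.BIN_2C,
    "prion disorder": Bin.BIN_2C,
}


def bin_for(category: str) -> Bin:
    """Look up a finding's bin; unlisted categories fall into bin 3 (assumption)."""
    return EXAMPLE_BINS.get(category, Bin.BIN_3)


if __name__ == "__main__":
    for name in ("Lynch syndrome", "APOE4 allele status", "novel variant"):
        print(name, "->", bin_for(name).name)
```

Keeping the assignments in a table separate from the lookup logic mirrors the point that the contents of the bins are expected to be debated and revised, while the bin definitions themselves stay fixed.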

But bin 1 information you would always report, regardless of the patient's wishes, right?

That's right. If you get a whole-exome sequence, and you happen to have one of those variants, you need to be told about it because we can do something to help you. Bin 1 findings are ones that people would not get a chance to opt out of knowing about. It would be sort of equivalent to getting a brain MRI done for something and then learning that there is a mass there. It's not information that you wanted to have but, clinically speaking, it was necessary for your doctor to tell you about it.

I think it's a really important conceptual point that the genes that are in the bins are not static. Things that are currently in bin 1 are likely to stay in bin 1. But things that are in bin 2, as we get proven treatments for them, or proven preventive measures for them, would migrate into bin 1. So even though we can't do anything about your risk for Alzheimer's if you have APOE4 alleles, in the future, if there were medications to delay the onset of symptoms, then clearly, that would become bin 1 information. If we are not able to do anything about the progression of these neurodegenerative conditions in bin 2C, but in the future, we have the ability to offset those symptoms, or prevent or delay the onset, then those things would go into bin 1. It's going to be a continually refined process, and again, something that panels of experts are going to have to grapple with, and probably even representatives from the general population will help with that.
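The reporting rules described above (bin 1 is always reported, bin 2 categories are reported according to the patient's stated preferences, and bin 2C calls for specialized counseling) can be sketched as a small decision function. This is a hedged illustration only: `should_report` and `requires_specialized_counseling` are hypothetical names, and withholding bin 3 findings is an assumption based on the high bar for reporting that Berg describes, not a rule stated in the interview.

```python
from enum import Enum


class Bin(Enum):
    BIN_1 = "1"
    BIN_2A = "2A"
    BIN_2B = "2B"
    BIN_2C = "2C"
    BIN_3 = "3"


def should_report(variant_bin: Bin, patient_opt_ins: set) -> bool:
    """Decide whether a finding is reported back to the patient."""
    if variant_bin is Bin.BIN_1:
        # Clinically actionable: reported regardless of patient preference.
        return True
    if variant_bin is Bin.BIN_3:
        # Assumption: uninterpretable variants are not reported.
        return False
    # Bin 2A/2B/2C: reported only if the patient selected that category.
    return variant_bin in patient_opt_ins


def requires_specialized_counseling(variant_bin: Bin) -> bool:
    """Bin 2C is the protected category that calls for specialized counseling."""
    return variant_bin is Bin.BIN_2C


if __name__ == "__main__":
    # A patient who wants pharmacogenomic-type results but not bin 2C results.
    opt_ins = {Bin.BIN_2A}
    for b in Bin:
        print(b.name, should_report(b, opt_ins))
```

Because bin membership is passed in rather than hard-coded, a condition that migrates from bin 2 to bin 1 as treatments emerge would change how it is reported without any change to this logic.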

It would also mean that patients would review their genomes again and again?

That's right. We would envision that the analysis that you have done at time zero is going to tell you what we know as of that time. That information might change in a year, and you would want to have a mechanism for the laboratory staff to be able to reanalyze and to be compensated for reanalyzing people's data, which is not really present. So [we need to think about] how are we going to reimburse people for that cognitive service, and for the cognitive services of the genetics professionals, the clinical geneticists, and genetic counselors who are going to be needed to review this information with the patients. Those are going to be important things that we will have to deal with at a societal level.

What are the greatest challenges you see for making whole-genome sequencing become generally adopted as a clinical tool?

One of the really important things is going to be the standardization of the laboratory aspects of it, and getting a full understanding of the limitations of whole-genome sequencing. We have a pretty good sense at this point about the sensitivity of Sanger sequencing, and we will really need to get good estimates of the sensitivity of whole-genome or whole-exome sequencing. Understanding what types of mutations can be detected reliably and what types of mutations cannot be detected reliably is going to be an important facet of this. It's kind of boring, from a scientific standpoint, but really important for the validation of this as a clinical test. Think, for example, about triplet repeat disorders. Can we even reliably measure the length of a repeat in a whole-genome sequence? I don't think we know the answer to that yet. So understanding the technical limitations is going to be important from the laboratory standpoint.

Another major thing that we have to grapple with is essentially what our binning scheme is intended to address, which is, the bioinformatics and interpretive part of understanding what the significance of these genomic variants is. And being able to do this in a streamlined fashion, so it doesn't take, for example — as has been published — six to eight hours on multiple clinic visits to be able to do consenting for this process. I think it's correct, and reassuring, that people who are doing whole-genome and whole-exome sequencing clinically are expending that type of effort to make sure that patients understand, as we are just starting to roll this out, but eventually, we want it to be a much more streamlined process. Otherwise, you just can't imagine having enough time in the world to do whole-genome sequencing for all patients that walk through the door.

I think the clinical workforce is not prepared for genome-scale testing on a widespread level. The current clinical genetics workforce could probably have coped with doing, essentially, whole-exome or whole-genome sequencing as a replacement for the current tests that we would order for our patients who are referred to us for genetic disorders. I think that other specialists, like neurologists and cardiologists, would happily transition to whole exome, too. But I think that the problem is that our medical workforce in general is really not adequately trained to do the pre-test and post-test counseling for something that is genome scale, to understand the limitations and the risks of the test and so forth. We either need to increase the support of clinical genetics training as a residency, so that we have a workforce of physicians that is capable, and large enough, to handle this load, or we need to massively increase the number of genetic counselors, or both, preferably, so that the clinical workforce is there to meet the demand that will be coming down the line. I'm not the first person to say that there is an inadequate number of people who are adequately trained in genetics.

Wouldn't that take a long time?

Anything takes time. But when there was a lack of anesthesiologists, and it became a crisis, that problem got fixed in a hurry. It's not like there isn't precedent for rapidly ramping up training for specialists in certain areas — it certainly can be done. It would take commitment on the part of the medical establishment to make that happen. But if it's important, and if genomics is an important part of clinical care, and if we want to do it right, then I think that that is a necessary step.

Another thing you mention in your article is the need to develop better curated databases.

There have been a lot of recent papers talking about different ways to accumulate what is kind of fragmented data right now. You have all of these labs that have been doing really good work in genetic testing for specific genes, and they have got a wealth of data in their locus-specific clinical lab databases. You have things like the Human Gene Mutation Database, whose curators have expended a lot of effort in curating these things, although with some publicized errors as well. The data in a database is only ever going to be as good as the people putting it there. That's going to be an important part of this, developing the databases of known mutations and even, perhaps more importantly, known benign variants.

It's going to be a tough process, because on the one hand, everybody wants to get this freely accessible database that they won't have to contribute to or pay for. And then you have these labs who have been doing testing saying, 'Wait a second, this is really valuable information that we have been accumulating over the last however many years; how are we going to get recognized for or compensated for sharing this information?' I don't know if there is a really easy solution to it. Again, it's the type of thing where the willpower of the medical system is going to have to decide that in order to do this right, we need to have a centralized database, we need to come up with some standards for what gets put in it, who is allowed to put things in it, who is allowed to use it, under what circumstances, and how we are going to fund it. I think it's going to be really important to come up with a model that works for everyone, not just to say, 'Let's have this centralized database and let it be out of the control of people who are really using it.' That's a big challenge.

To implement this binning scheme, you propose what you call 'a broad coalition of experts' to convene. Is anything like this already happening?

There is not a concrete group that is assembled yet. We are certainly in the process of moving forward with that. Our group actually submitted a grant for a [National Human Genome Research Institute] clinical sequencing center, and as part of our grant, we proposed to organize an international binning committee to be able to start to address that question.

One of the specific aims of our grant is to put together experts and task them with deciding on the binning scheme and study that process from almost an anthropological point of view, on how the experts agreed and what the areas of consensus and the areas of conflict were. If we get funded, we could potentially be part of that organization.

One of our co-authors on this article, Muin Khoury, is from the CDC, and has a track record in the area of taking a look at what genetic tests ought to be used and under what circumstances, and he is also interested in potentially convening something around the infrastructure that he has already built up to look at whole-exome and whole-genome sequencing clinically, and how you would implement it.

Is there anything you would like to add?

Another point I would like to make is that with the huge number of variants that will be detected in any person's whole-genome sequence (or whole-exome sequence, for that matter), the vast majority of those variants will have absolutely no clinical significance whatsoever. For this reason, we think it will be critical to set a very high bar for the reporting of variants in order to avoid overwhelming physicians and patients with information that is essentially irrelevant to their medical care, and so that those few variants that are actually useful do not end up getting overlooked in the mass of data. This should be a guiding principle as we move forward with clinical applications of genome sequencing.
