Q&A: Lynn Bry on Why a Clinically Annotated Database of Variants is Needed for Diagnostic Sequencing


Name: Lynn Bry
Title: Associate director, Partners HealthCare Center for Personalized Genetic Medicine;
Board-certified pathologist, Brigham and Women's Hospital;
Associate director of clinical laboratories, Brigham and Women's Hospital;
Assistant professor, Harvard Medical School
Education: MD and PhD, Washington University Medical School;
Fellow of the College of American Pathologists

Last month, the US Food and Drug Administration hosted a one-day workshop to discuss next-generation sequencing in clinical diagnostic applications. While the morning session focused primarily on how specific platforms could be assessed in terms of error rate, accuracy, and coverage (CSN 6/29/2011), the afternoon session was devoted to bioinformatics.

While a number of issues need to be considered on the bioinformatics side for analyzing next-gen sequencing data for clinical purposes, one particular issue that emerged was the need for a clinically annotated database of variants that incorporates a confidence level for calling a specific variant pathogenic, as well as the evidence used to make that call — whether it is from clinical trials, in vitro models, or other sources.

Lynn Bry, associate director at the Partners HealthCare Center for Personalized Genetic Medicine, was a panel member at the meeting, and spoke about the need for such a database.

At Partners Laboratory for Molecular Medicine, she said, researchers have developed their own curated database of variants, which they use in their CLIA laboratory for sequencing-based diagnostic tests for cardiomyopathy and deafness.

Recently, Bry spoke to Clinical Sequencing News, elaborating on the need for such a database.

At last month's FDA meeting, you participated in the panel discussion about bioinformatics and spoke about the need for a clinically annotated database of variants. Can you elaborate what you meant by that, and explain what the problem is with current databases?

We're drawing upon the primary literature to try and figure out which SNPs or variants are significant, and when you go into the literature, things aren't always graded clearly. If you go into [the National Center for Biotechnology Information website] and search for gene variants, there are no good coding standards to say, 'This variant reported was actually found to be normal, or was found to be pathogenic, or was found to be likely pathogenic, in this particular population.' So the databases we're using to mine the raw information could be better structured to support clinical activities.

Updating and improving the validation of these variants often takes a lot of time and effort, because you may use everything from in vitro systems through animal models, and there are different sources of information you can draw on to validate that this is likely a pathogenic, as opposed to a benign, variant.

For instance, if you have a pedigree and you see that a lot of family members who have this variant get this disease, that increases your suspicion that it's clinically relevant. Also, if you have additional research-based data from in vitro or animal models, that can help. What's best is to have clinical trials where you assess outcomes based on the variant, but we don't always have those. So there are multiple sources of information from which you can draw to assess the clinical utility of a variant, but it takes a lot of time and effort because the raw materials we're working with haven't been optimized for clinical purposes.
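The database described here can be pictured as records pairing each variant with a graded classification and the evidence behind that call. The following Python sketch is purely illustrative; the gene names, category labels, and field names are invented for this example and do not reflect the actual Partners schema or any real variant classification:

```python
from dataclasses import dataclass, field
from enum import Enum

class Classification(Enum):
    """Graded confidence levels, ordered from benign to pathogenic."""
    BENIGN = 1
    LIKELY_BENIGN = 2
    UNCERTAIN = 3
    LIKELY_PATHOGENIC = 4
    PATHOGENIC = 5

class Evidence(Enum):
    """Sources of supporting information mentioned in the interview."""
    IN_VITRO = "in vitro model"
    ANIMAL_MODEL = "animal model"
    PEDIGREE = "family pedigree segregation"
    CLINICAL_TRIAL = "clinical trial outcome"

@dataclass
class VariantRecord:
    gene: str
    hgvs: str                       # variant description, e.g. "c.100A>G"
    classification: Classification
    evidence: list[Evidence] = field(default_factory=list)
    population: str = "unspecified"

def at_least(records, minimum):
    """Return records classified at or above a given confidence level --
    the kind of query the interview notes dbSNP cannot answer directly."""
    return [r for r in records if r.classification.value >= minimum.value]

# Made-up entries for demonstration only.
db = [
    VariantRecord("GENE_A", "c.100A>G", Classification.PATHOGENIC,
                  [Evidence.PEDIGREE, Evidence.IN_VITRO]),
    VariantRecord("GENE_B", "c.200T>C", Classification.UNCERTAIN,
                  [Evidence.IN_VITRO]),
]
hits = at_least(db, Classification.LIKELY_PATHOGENIC)
```

The point of the sketch is the query: because classification and evidence are structured fields rather than free text in the literature, "show me all variants validated to this particular degree" becomes a one-line filter.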

What are you doing in your lab? And what would an ideal database look like?

When we do next-gen sequencing for clinical variants for diseases such as cardiomyopathy or deafness, the sequence is taken and compared against a curated database of variants. That's to help make calls of variants that are of significance, ones that are impacting or likely causing the patient's disease and clinical manifestations.

We're fortunate that we have a team here run by Sandy Aronson who has developed applications to support [our curated database]. Heidi Rehm, the director of the Laboratory for Molecular Medicine, has been deeply involved in getting these things into productive clinical use.

But it's incredibly manual to mine the information, curate it, and validate it. When you go to the raw sources of information, whether it's dbSNP, or NCBI, or the raw literature, you have to sift through a lot to figure out what's going on. It's not like you can go to dbSNP and say, 'Show me all these variants that have been validated to this particular degree or with this particular information.' You have to go in and do that background work yourself to make the assessment.

This is very early days, so I think these things will come along, but the technology and requirements are so new that the preexisting infrastructure — it's great that it's there and that we do actually have something to work with — clearly hasn't been optimized for things we know we're going to need to do.


What does this suggest about some of the sequencing-based tests being developed? Are they using faulty data?

Well, that's sort of the question. What are they using to make the calls? When a lab reports this variant as significant, it would really help to know what information they're using to make that call.

To me, a call based solely on in vitro data or cell lines could be very suspect, versus a call being made on [data from] five family pedigrees where we see the same mutation in this particular phenotype.

We often don't know what databases are being used to make the calls.

It would also help to know what the labs are doing technically for the sequencing because there are components, depending on how you align or assemble the genome, that can introduce potential mistakes that might be called as variants. So, again, this is very early days, and these things will be worked out, but these are the sorts of issues we're working on.

To what degree should the FDA or another regulatory agency be involved in developing — or developing regulatory guidance for — a clinically annotated database?

I have to say it was a good meeting. The FDA seemed to be saying, 'We'd like the community to come forward with standards or approaches and then to work with the FDA,' which I think is the right approach.

I think somebody, whether it's the FDA or another entity, jumping in and saying, 'We're going to regulate now,' as we're still trying to figure out what we're doing and how to develop this, could hurt more than it would necessarily help.

But you could certainly see involvement [from the FDA or other regulatory agencies] in validating the platforms, the reagents for the technical components, as well as providing standards and guidelines for how you should do the bioinformatics — the interpretations and the reporting.

For instance, we have an application that we developed called GeneInsight. It's used here and at a number of other centers. This is [classified as an exempt class I medical device]. We knew it was going to be important, so we wanted to put it through the rigors of being able to stand up to FDA review.

What is GeneInsight?

It's an application that includes our curated database for interpretations, and it also facilitates reporting, both for the report the lab puts out and for interfacing or forwarding that information into a hospital information system or an electronic health record.

Have you compared your curated database to dbSNP or other databases and do you get very different results from using this database?

We're actually drawing from sources such as dbSNP, but often it's the analysis, what we do with that information, that makes it clinical-grade. We draw from dbSNP, we draw from NCBI, we draw from the primary published literature, and, based on analyses by qualified medical geneticists and other folks, that's what populates our clinical database for calling variants.

One hope is by having, in this case, a number of institutions on the same platform, we can start to share information. It doesn't have to be in our specific application. I certainly welcome an open standard that would be available to anyone or any CLIA laboratory doing next-gen sequencing to contribute clinical-grade content.

If you go back 10 years, we faced many of these similar issues with HIV genotyping. It's a virus, so a much simpler genome, although the mutability of the virus introduces some complications with how you interpret it. What really helped nail down the diagnostic testing for HIV genotyping was to have a publicly available clinical-grade database with mutation information and how you'd use that to call resistance to particular drugs.

I feel that having something comparable at this point for the human genome would facilitate utilization of next-gen sequencing whole-genome analysis for clinical purposes.

What are the other major challenges you see in bringing next-gen sequencing into a clinical setting?

Some of it is the bioinformatics pipelines. If you look at sequencing centers, what we do in research is not what we're going to do clinically. In research, you may tweak things, change things for a particular scientific question, but for clinical testing we have to be consistent.


It is as important to know what we are getting as what we are not getting. What could we be missing? What are our error rates? You can have a little bit of wiggle with research questions, but with clinical testing you can't do that. You have to be consistent from assay to assay and you have to have a very consistent bioinformatics pipeline where you know what it is doing and what it might be missing.
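One lightweight way to enforce the consistency described here is to fingerprint the pipeline configuration itself and store that fingerprint alongside every reported result, so any change to a tool or parameter is immediately detectable. This is a minimal sketch under assumed tool names, versions, and parameters, not any lab's actual system:

```python
import hashlib
import json

# Hypothetical pipeline configuration; tool names, versions, and
# thresholds are placeholders for illustration only.
pipeline_manifest = {
    "aligner": {"name": "bwa", "version": "0.7.17"},
    "variant_caller": {"name": "gatk", "version": "4.2.0"},
    "reference_genome": "GRCh38",
    "min_coverage": 30,
}

def manifest_fingerprint(manifest: dict) -> str:
    """Stable hash of the pipeline configuration. Storing this with
    every clinical result makes it possible to prove that two assays
    were run with an identical pipeline, or to flag when they weren't."""
    canonical = json.dumps(manifest, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

fp = manifest_fingerprint(pipeline_manifest)
```

Because the manifest is serialized with sorted keys before hashing, the fingerprint depends only on the configuration's content; changing any tool version or threshold yields a different value.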

Invariably, each sequencing center is developing its own pipeline simply because we can't go to a vendor and say, 'Give us an off-the-shelf product that we can use for clinical testing.' This is a little bit to be expected, but sites spend a lot of time, effort, and money to develop these pipelines for clinical use and they have to maintain them.

Every time something changes in the sequencing platform, or with the chemistry, or with the tool we may be using in our bioinformatics pipeline, it's incredibly time intensive and expensive to maintain it. In fact, I predict that next-gen sequencing — the platform and reagent cost — will probably be less than $1,000 a genome soon. But what's going to keep it expensive is the cost of our bioinformatics pipeline and how we maintain our data.

I think the biggest improvements in next-gen sequencing and using it for particular clinical use [will] be improvements in having a standardized bioinformatics pipeline, having available vendor-supported tools, and having close integration between the platforms, the chemistry, and the bioinformatics regarding the data we're getting out the other end.

You said most labs are developing their own bioinformatics for clinical use. Why haven't vendors started developing these tools?

It is still pretty new. We're taking off-the-shelf, open source tools, often developed by other genome centers whether it's the Broad, Craig Venter, Wash U, or elsewhere.

When you look at the in vitro diagnostic companies, whether it's Roche or it's Beckman Coulter, or some of the others, they don't deal with next-gen sequencing in a clinical situation. Roche does a little bit with their 454 platform.

There's certainly an opportunity for these vendors, or even a company like Illumina, to develop clinical-grade tools. I think there's also an opportunity for the [laboratory information system] vendors, whether it's Sunquest, or Cerner, or Soft, to have modules in a clinical lab information system that may not do the bioinformatics but can certainly facilitate the interpretation and the reporting.

What were your thoughts on the FDA meeting, and did you think it seemed promising for moving forward?

My take was [that] people had a lot of different expectations for the meeting. As I learned more about it, it seemed the FDA just wanted to get a sense of what was going on. If you were looking at it as the FDA is poised to make a decision, I'm not certain we arrived at any decision. But if you look at it from the standpoint [that] the FDA just wanted to see what was going on, I thought it was good for that.

In the bioinformatics session, I think you could see there is still very much an academic research focus. People are doing a lot of work to understand the mechanics of how things work and to get tools out for research applications. This is not at all surprising, and it's great to have so many people working on these complicated questions. In the next couple of years I hope we have the same enthusiasm and intellectual capital applied to get [tools] developed into robust clinical pipelines because that's really going to be a major driver for increased clinical use and adoption.

A couple people mentioned at the meeting that without clear FDA guidelines or regulation, current sequencing-based laboratory-developed tests could pose a risk to patients. Is that something you agree with?

I think there's a whole host of issues around how FDA is going to treat LDTs. I sometimes jokingly call [next-gen sequencing] the mother of all LDTs, because it's such a new thing, it's complicated, and we simply don't have the vendor support to have FDA-approved kits and platforms at this time. I think they will come along, but just at this point in history, we don't have it. We need to move toward it.

I'd say one of the things FDA could do, and people at FDA have actually suggested this, is to ensure that [good manufacturing practice] processes are followed by the vendors in terms of the production of their platforms and their clinical kits and reagents, or reagents that are used for clinical purposes. That's not the same thing as CLIA or an FDA 510(k) application but GMP processes are a very good start.

I think also for folks doing software development, GMP is a good thing to follow. For individual clinical centers this can easily double the development team they will need, but I think it also provides impetus to software vendors to see this as an opportunity where they could get involved and work toward developing pipelines. [For example], you [could] get a license to the pipeline and maybe pay some sort of subscription to get a clinical-grade database. That provides the business case for industry to get involved in ways that would be constructive.

Is there anything else you'd like to add?

It's early days. There are some things FDA could do now that I think would help and not slow or hinder further progress. As things move along, we may have FDA-approved platforms, FDA-approved reagents to do the sequencing, and FDA-approved applications to help with the bioinformatics pipeline.


Have topics you'd like to see covered by Clinical Sequencing News? Contact the editor at mheger [at] genomeweb [.] com.
