Drug and Diagnostic Industries Warm to Sharing Genomic Data, With Some Caveats and Challenges


NEW YORK (GenomeWeb) – When AstraZeneca announced earlier this year that it would analyze the genomes of 2 million patients to help inform its drug discovery research, it stressed that it would share the insights generated from sequencing these samples and would seek intellectual property protection only on drug compounds, not on any variant data or drug targets that result from the effort.

This commitment to openness underscores a sea change taking place within the life science industry as it balances the competitive value of proprietary data against the realization that the complexity of the genome — and the exponential rate at which new genomic data is being generated — requires an analytical effort beyond what any one company or project can provide.  

"Today's world is quite different from the past. The pace of discovery is amazing," Carl Barrett, VP of translational science within AstraZeneca's Oncology Innovative Medicine division, told GenomeWeb. "We'll probably do more [whole-genome and exome sequencing] this year than we've done in all of history, time, and science."

In light of this, executives at drug and diagnostic companies today generally agree that the field can gain a more complete understanding of how genetics impact health faster by sharing information on variants. But because greater openness must be balanced against profit-driven interests, industry players diverge on exactly when and how this data should be shared.

In March of this year, genomics experts and pharma executives convened in Washington, DC, to discuss how data from large genetic studies could lead to more successes in drug development. From 2003 to 2011, only one in 10 drugs made it through development to regulatory approval. Some at the meeting wondered whether greater precompetitive sharing of data on the basic biology of drug targets could improve these odds.

"We're tripping over each other and making the same mistakes, doing the same experiments, reproducing failure," said Lon Cardon, senior VP of alternative discovery and development at GlaxoSmithKline. "I think it's more efficient from a societal and patient perspective to think of that part of the game as something we could share."

GSK considers the drug targets and their biological annotation to be precompetitive information. Two years ago, GSK invested $1.7 million to launch a center at the Wellcome Trust Genome Campus near Cambridge where scientists from industry and academia could apply genetics and bioinformatics approaches to better define disease biology and validate drug targets. This year, after Biogen decided to contribute to these precompetitive efforts, "the phone is starting to ring and others are starting to join," said Cardon. He acknowledged, though, that not everyone in the drug industry is thinking this way.

Kari Stefansson, the head of Decode Genetics, a company focused on using genomic insights to inform drug development, said at the meeting that it would be commercially devastating to make drug targets precompetitive. "That's where we should be competing. That's where it's going to be exciting. That's where we're going to throw elbows," he said.

This kicked off a spirited debate on where competition and innovation truly lie in drug development, and on the types of information pharma would benefit from sharing precompetitively. Some at the meeting thought that the association between a genetic variant and a phenotype, which can be critical to elucidating the disease biology a drug might target, should be shared, since it is knowledge that the drugmaker distills from patients and can never fully own. There was a rush to patent genes 20 years ago, another expert pointed out, but it turned out not to be worth it.

Stefansson explained at the meeting that he wasn't advocating patenting genes. "All of this comes into the public domain in the end, and we publish everything we do," he said of the genetic discoveries made by Decode, which are fueled by data from 500,000 study participants. "But there is this little time between when we discover it and until it's published, when it's privileged information." Without this competitive edge in the discovery phase, he wondered, wouldn't pharma companies be little more than sales organizations?

Two months after that meeting, a research team led by Stefansson reported in the New England Journal of Medicine the discovery of two rare loss-of-function variants in the ASGR1 gene associated with lower levels of non-high-density lipoprotein cholesterol and lower risk for heart disease. Amgen, which purchased Decode in 2012, is reportedly working on drugs based on this research.

Stefansson told GenomeWeb that Decode always publishes its genetic discoveries so that not only Amgen but others, too, can apply the knowledge to prioritize and validate drug targets. But he stood by his view that in drug development, competition starts with target discovery. "In the discovery of the target, and in the definition of the target, lies a lot of the magic," Stefansson said. "I'm absolutely convinced that's true." Once the discovery has been made, though, and the scientists get their credit, "and people begin to apply the discovery or the knowledge of the data to facilitate good healthcare, then I'm strongly in support of sharing of data," he said.

Researchers have traditionally made their genetic discoveries public through peer-reviewed journals. But the falling cost and growing power of next-generation sequencing technologies mean that more and more people are being genetically tested, and this testing is revealing never-before-seen variants, which in turn blurs the line between discovery and clinical application.

"Often times when you find a variant you think is unique, you wait a short time and someone else will publish on the same variant," AstraZeneca's Barrett said. "So, it's very difficult to compete against the world in this space. Everyone is doing this and any individual group is going to be the minority."

Given the pace of genetic discoveries, moreover, journals are struggling to adjudicate via peer review whether variants are truly associated with disease and whether they are publishing replicable data. One solution that's gaining support among academics and industry players is sharing variant classifications through a centralized database, such as ClinVar. More than 500 labs and researchers have submitted data to ClinVar since the NIH launched this publicly accessible archive of genotype and phenotype data three years ago. Payors, the FDA, and peer-reviewed journals are also encouraging researchers and labs to contribute to ClinVar (see earlier story in series).

Of course, many haven't, perhaps because this type of data transparency requires a shift in thinking about competition. Is it best to keep information on variants within proprietary databases as a way to edge out competition in the short term? Or would making this information public prove more profitable in the long term? Regardless of their position, drug and diagnostic companies are betting that genomic information will be important to their ability to advance healthcare products, and they are making significant investments to improve their ability to classify variants.

Collaborating classifications

While there is disagreement among drug developers about exactly where to draw the line between precompetitive and competitive data, a cadre of genetic testing firms believe the life sciences field is better off sharing variant data rather than hoarding it.

"Variant classification should not be a way to compete with one another," said Jill Dolinsky, senior manager of clinical research at Ambry Genetics. "It's critical to patient care. It's critical to providing the best results possible."

Firms like Ambry are opening up their data coffers in order to pool knowledge on variants and improve the quality of classifications across the field. As of May, Ambry had made around 16,000 variant submissions to ClinVar.

The database has its detractors, and one of the main criticisms has been that 17 percent of variants submitted to ClinVar by multiple labs have conflicting interpretations. This means that labs sometimes classify the same variant differently, and in certain cases patients receive wrong results. But the goal of ClinVar, according to its backers, is to shed light on these discrepancies, bring transparency to the lab industry, and improve patient care.

Because of ClinVar, experts from competing labs are even putting their heads together to improve the state of the science. For example, Ambry, GeneDx, the Laboratory of Molecular Medicine at Partners HealthCare Personalized Medicine, and the University of Chicago evaluated more than 6,000 variants that at least two of the labs had submitted to ClinVar and found that their classifications differed for 724 of them. The four labs worked for several months on 232 variants and reached consensus on 86 percent.

"We're all trying to work toward having a better system in order to classify variants and make it better for the patient," said Lisa Vincent, a molecular geneticist at GeneDx, which has submitted more than 23,000 clinically interpreted variants as of June. "We can't move forward and figure out a better system if we can't figure out the mistakes we made in the past … Our goal is to make variant classification better and make that available to the public."

Genetic testing labs have traditionally competed on the accuracy of their tests and the quality of their services, so the robustness of a lab's variant classification process has been a key differentiator. Labs that are sharing through ClinVar told GenomeWeb, however, that competition in the space is shifting toward factors like turnaround time for results, client services, and pricing.

And competition is fierce. Healthcare technology firm NextGxDx estimates there are 60,000 genetic testing products on the market, with an average of 10 new tests launched daily. Beyond competing in this exceedingly crowded market, genetic testing labs are also shouldering the responsibility of establishing the utility of medical genetics, a field still in its early days, noted Scott Topper, head of clinical genomics at Invitae. "It's a really short-sighted view to say that the information we have is this special thing that we need to protect from one another," he said.

Invitae aims to "disclose everything we know about genetics," added Topper. The firm has submitted nearly 8,500 clinically interpreted variants to ClinVar and is planning to deposit more soon.

When it comes to sharing variant data through ClinVar, one holdout among genetic testing labs is Myriad Genetics. Though it has submitted to an open-access breast cancer mutation database in the past, the company has said it now prefers to advance knowledge on genomic variants through peer-reviewed publications, both to protect patient privacy and because repositories like ClinVar contain inaccurate classifications.

Some of Myriad's competitors have criticized its stance and its proprietary database containing information on 46,000 genetic variants. But Johnathan Lancaster, chief medical officer at Myriad Genetic Laboratories, believes the firm has been unfairly vilified. It is standard operating procedure, he noted, for researchers at major universities and cancer centers to keep a lid on data before it is published or ahead of intellectual property protection.

Drug firms also don't give away their discoveries, Lancaster argued. “If you approached GlaxoSmithKline, or Pfizer, or Amgen, all of whom invest hundreds of millions of dollars in discovery to invent new drugs and new therapeutic approaches, and if you mandated tomorrow that every time you make a scientific discovery you must immediately put it into Wikipedia, what would they do? They would stop investing a single cent in research and development,” he said.

Publicly traded firms are obligated to make a profit and yield returns for investors, he said. “This is the American way and this is the way we work in the Western hemisphere, which is to say we reward our investment and effort and commercial gain in the anticipation it will stimulate innovation,” he said.

Lancaster estimated that Myriad has invested "hundreds of millions of dollars" in research, has put significant resources into building its database and variant classification tools, and has 30 PhDs dedicated to the task. “If you force a for-profit entity to give away any intellectual property or discovery the moment it identifies it, they will stop the discovery process,” he said.

In the 1990s, Myriad won the race to clone and sequence the BRCA1 gene, mutations in which are associated with heightened risk for breast and ovarian cancer. Myriad's subsequent patents on BRCA1 and BRCA2 enabled it to be the only lab testing for these genes for nearly 20 years, until the US Supreme Court in 2013 invalidated several of its patent claims.

Too much of the discussion on variant data sharing has been dominated by controversy over these two genes, reflected Robert Nussbaum, a principal investigator for ClinVar, who has maintained an unpaid leadership role there since becoming Invitae's chief medical officer last year. In Nussbaum's view, sharing information in public databases is critical given that the genetic testing field has to concern itself with accurately classifying variants in 20,000 genes.

"If you just look 10 years out and imagine how much we will know then and how little we know now," Topper reflected, "it's almost a trivial thing to be bickering about."

Investing in classification

Genetic testing labs that want to address this challenge by submitting to ClinVar, not once but regularly, say it takes considerable time and resources to prepare their variant data for public scrutiny. Ambry, for example, submits data to the database every six months, and before each submission a team of scientists updates variant classifications based on the latest information and makes sure they are consistent with the reference sequence and use the correct nomenclature.

"There isn't a button that you can click at this point that will automatically send that data over," said Tina Pesaran, who manages Ambry's variant assessment team. The firm is submitting all its variants from cancer genetic testing and exome sequencing, and is readying to deposit data from cardiovascular-related testing. 

But the work isn't done after submitting variants. ClinVar issues a monthly report that lays out the discrepancies between submitters, which the experts at these labs sift through to determine which variants need review. Labs often have unpublished internal data, Pesaran noted, but ClinVar's discrepancy reports encourage commercial entities that are otherwise competitors to discuss their varying interpretations. "I reach out to various entities when I see a discrepancy and they'll do the same to us," she said. "So, we'll look at the data overall and see if we can come to a consensus. Ideally, that's where we want to be."
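The basic comparison behind a discrepancy report can be illustrated with a short sketch. It is not ClinVar's actual pipeline or file format: the record layout, variant IDs, and lab names below are hypothetical, and only the classification labels follow the standard ACMG/AMP five-tier terms. The sketch groups submissions by variant, flags variants that two or more labs classified differently, and computes the share of multi-lab variants in conflict, the kind of figure behind the 17 percent statistic critics cite.

```python
from collections import defaultdict

# Hypothetical submission record: (variant_id, submitting_lab, classification).
# IDs and lab names are made up; labels follow the ACMG/AMP five-tier terms.
Submission = tuple[str, str, str]


def group_by_variant(submissions: list[Submission]) -> dict[str, dict[str, str]]:
    """Map each variant to {lab: classification}, keeping one call per lab."""
    by_variant: dict[str, dict[str, str]] = defaultdict(dict)
    for variant, lab, classification in submissions:
        by_variant[variant][lab] = classification
    return dict(by_variant)


def find_discrepancies(submissions: list[Submission]) -> dict[str, dict[str, str]]:
    """Return variants submitted by two or more labs whose labels differ."""
    return {
        variant: calls
        for variant, calls in group_by_variant(submissions).items()
        if len(calls) >= 2 and len(set(calls.values())) > 1
    }


def conflict_rate(submissions: list[Submission]) -> float:
    """Fraction of multi-lab variants that have conflicting classifications."""
    multi_lab = [c for c in group_by_variant(submissions).values() if len(c) >= 2]
    if not multi_lab:
        return 0.0
    conflicting = sum(1 for c in multi_lab if len(set(c.values())) > 1)
    return conflicting / len(multi_lab)


if __name__ == "__main__":
    demo = [
        ("VAR-1", "Lab A", "Pathogenic"),
        ("VAR-1", "Lab B", "Likely pathogenic"),
        ("VAR-2", "Lab A", "Benign"),
        ("VAR-2", "Lab C", "Benign"),
        ("VAR-3", "Lab B", "Uncertain significance"),
        ("VAR-4", "Lab A", "Uncertain significance"),
        ("VAR-4", "Lab C", "Likely benign"),
    ]
    # Each flagged variant is what a lab reviewer would then discuss with peers.
    for variant, calls in find_discrepancies(demo).items():
        print(variant, calls)
    print(f"conflict rate: {conflict_rate(demo):.2f}")
```

On the demo data, VAR-1 and VAR-4 are flagged and the conflict rate is 0.67 (two of the three variants submitted by more than one lab). This strict version treats any label difference, including pathogenic versus likely pathogenic, as a conflict; a real review process might weigh such differences less heavily than a pathogenic-versus-benign disagreement.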

Consensus isn't possible for every variant reviewed in collaboration with another lab. In the project GeneDx and Ambry are conducting with the two academic labs, for example, the participants still couldn't agree on 33 variants despite their efforts. "Sometimes we have differences in opinion based on how to interpret the literature. Some people may have more confidence in a functional study than another group," said GeneDx's Vincent, but the process of discussing classifications allows labs to see each other's reasoning.

Moreover, as the field moves to next-generation sequencing, which can assess multiple genes at once and uncover numerous rare variants, labs are bolstering their classification capabilities by building interdisciplinary teams and adopting new technologies in the lab. Ambry earlier this year launched a 65,000-square-foot "Super Lab" that will allow it to increase testing capacity ninefold and produce results faster. The lab will also perform functional assays to investigate the biological role of rare variants.

Invitae, meanwhile, is aggressively expanding its testing content and offering a tiered pricing scheme based on whether its tests are out-of-network, in-network, or for patients paying out of pocket. According to Nussbaum, the company wants to be a one-stop provider of genetic testing. In its first-quarter earnings call, the company said it is testing for more than 1,000 genes implicated in a variety of diseases.

Scaling up this way also increases variant classification work, and Invitae employs more than 20 PhD scientists, 15 licensed genetic counselors, and a team dedicated to understanding the clinical significance of the data generated by its tests. Simultaneously, the company is pushing to reduce its cost of goods sold to $500 per sample by year-end by investing in software and bioinformatics capabilities, which make it easier to classify and report variants, as well as submit to ClinVar. There is a financial cost to sharing variant data, Topper acknowledged, but "this is just one of the costs of working responsibly in this space."

GeneDx, for its part, has 94 doctoral-level staff, 36 of whom are board-certified geneticists, who work daily on classifying and reporting variants. The lab is implementing internal systems to quickly flag variants that need reassessment based on ClinVar's discrepancy reports. It has also launched educational and outreach efforts to inform healthcare providers that GeneDx is putting variant classifications in the public domain and that classifications might change based on evolving evidence.

"We have seen a rise in individuals requesting a reclassification or a justification for classifications in ClinVar. So, that obviously also is a big resource pull for us," Vincent added. "But we encourage it. We want them to be proactive."

One reason genetic testing labs might balk at submitting to ClinVar is that added public scrutiny might reveal classification errors, which could increase liability. In an ongoing lawsuit against Quest Diagnostics and its subsidiary Athena Diagnostics, Amy Williams has alleged that after testing her son for variants in the SCN1A gene, Athena misclassified the detected variant as having unclear links to a rare epileptic condition, when published literature at the time suggested it was pathogenic. Williams' son died in 2008 at the age of two after a seizure.

Some industry observers fear that depending on how this case shakes out, it may open the door for similar lawsuits against genetic testing companies alleging inaccurate variant classifications. Nussbaum argued, however, that the risk of liability is even more reason for labs to be open about their process. "The way one deals with liability is to be transparent," he said. "If there is an error that's been made, be up front about it. Make sure people understand what happened, and why it happened."

Athena has submitted data on nearly 300 variants to ClinVar (although not on the specific variant at issue in the lawsuit). Quest, meanwhile, is spearheading BRCA Share, a database of BRCA1/2 variants that's free for researchers, doctors, and patients to access, but for which commercial labs have to pay a fee. Currently, Laboratory Corporation of America is the only other commercial lab submitting data to BRCA Share.

Drawing on data from Quest and LabCorp, this database now includes 6,200 unique BRCA variants, and Quest's collaborators at France's National Institute of Health and Medical Research (Inserm) have moved almost 400 variants with previously unclear links to cancer to more definite classifications. In light of recently launched large, multi-stakeholder research projects, such as the National Cancer Moonshot Initiative, "BRCA Share provides compelling proof that private and public entities can successfully collaborate to advance genetic science and healthcare services for patients," Quest spokesperson Wendy Bost said.

In a recent interview with GenomeWeb, however, Charles Strom, VP of genetics and genomics at Quest, criticized errors currently in ClinVar and said it was challenging to upload data into the resource. Even so, he said Quest was working on submitting data to ClinVar and hoped to take part in ClinGen, another government-funded project to create a central resource of clinically relevant genes and variants that can be used for precision medicine.

A challenge for the post-HGP world

While contributors to ClinVar are mostly clinical labs rather than pharmaceutical firms, AstraZeneca is looking to contribute variant data from studies of its ovarian cancer drug Lynparza (olaparib) to a public database such as ClinVar or to global data-sharing efforts like the BRCA Challenge. "It's a bit complicated because it's not just germline data, but also tumor data,” Barrett said. “We've been talking with the National Cancer Institute as to the best place to house that information."

The drug developer used Myriad's BRACAnalysis CDx to test patients in the Lynparza studies for BRCA1 and BRCA2 gene mutations. The FDA in 2014 approved the therapy along with BRACAnalysis as a companion diagnostic that could identify which patients are likely to respond to treatment.

When collaborating with other companies, "we insist that we have access to the data to share," AstraZeneca's Barrett said. "We gain more than we lose by sharing that data openly."

This principle extends to AstraZeneca's massive sequencing project, within which the genomic data researchers generate will be precompetitive knowledge. In that effort, 500,000 samples will come from AstraZeneca's clinical trials, 500,000 from the public domain, and 1 million from the database of Human Longevity (HLI), a genomics-based biotechnology firm. Although HLI will perform the sequencing, Barrett said that AstraZeneca will further study the genotype-phenotype correlations in the context of its drugs and publish that information.

HLI, founded by genomics pioneer Craig Venter, didn't respond to requests for comment about its genetic data-sharing plans. The company aspires to build the largest catalog of whole genome, phenotype, and clinical data in the world, and Venter recently said that HLI has collected this information on 26,000 samples and is adding one genome every 15 minutes to its database.

Notably, one of Venter's other start-ups, Celera, was central to early debates about public access to human genome sequence data. Celera raced the publicly funded project to sequence the human genome and sold subscriptions to drug firms and university researchers for access to the genomic information in its database. The strategy didn't catch on, and a few years after the completion of the Human Genome Project in 2003, Celera abandoned its subscription model and deposited its sequence data into the publicly accessible GenBank.

Since then, it has become clear that the key challenge for the post-HGP world, making sense of the information contained in our genes, is too big for one company, one database, or one project. It will require the participation of academia, government, and industry, draw on data in private and public repositories, and depend on the engagement of individual citizens, healthy and sick.

Recognizing this, President Barack Obama last year launched the Precision Medicine Initiative, aiming to advance personalized medicine research by leveraging existing capabilities in genomics, informatics, and information technology. In February, the White House held a summit where dozens of groups in academia and industry pledged commitments to the initiative.

At the summit, cancer genomic testing firm Foundation Medicine announced it was releasing data on more than 1,200 pediatric tumors across 51 cancer subtypes for research use. Although Foundation is sharing the data through its own website, this is the first time it is allowing external researchers access to information from its internal knowledgebase, called FoundationCORE, which contains comprehensive genomic profiles on approximately 80,000 patients.

The company said it will release more data from FoundationCORE to researchers over time, but the database is also a source of revenue. Foundation President Steven Kafka said the firm's genomic data-sharing efforts coexist with its collaborations with pharma companies using FoundationCORE to inform their drug development strategies, target discovery work, and efforts to identify patients for clinical trials. 

Genentech, one drugmaker using FoundationCORE (and whose parent company Roche recently took a majority stake in Foundation), has exceedingly detailed data-sharing policies. In general, the company posts clinical trial summaries on public websites and gives access to anonymized, patient-level information from its studies to qualified researchers who submit a proposal and sign a data-sharing agreement. Researchers interested in studies that generate genetic data must ask specific questions and Genentech will provide selected information in response, a spokesperson said. Genentech's policy on sharing genetic information currently does not include depositing variant data to public repositories like ClinVar, the spokesperson confirmed.

And yet genetic data is of growing importance at the company. Genentech last year said it would whole-genome sequence 3,000 Parkinson's patients who are customers of consumer genomics firm 23andMe in the hopes of identifying molecular markers that could inform its drug R&D. 23andMe's database of more than 1.2 million genotyped customers has been a draw for a number of drug developers, but spokesperson Andy Kill told GenomeWeb that these pharma collaborations have contractual stipulations that restrict public disclosures, and the research within its own therapeutics group is proprietary.

Moreover, when it comes to ClinVar, 23andMe believes it doesn't have anything new to contribute, since its genotyping service uses a custom SNP array that detects known pathogenic variants. When 23andMe's research identifies novel genotype-phenotype associations, the firm publishes on them in peer-reviewed journals, purchases public access rights to these papers, and discusses the findings on its blog. If 23andMe decides to commercially launch a next-generation sequencing panel in the future, Kill said the company will evaluate variant classification and publication at that time.

The policies and views about data sharing at companies like 23andMe and Foundation are of particular interest because these firms are not just selling genetic tests; they are building molecular information businesses fueled by troves of genomic and clinical data. These businesses may succeed where Celera didn't a decade ago, because researchers have since benefited from publicly available data sets through projects like The Cancer Genome Atlas, and their work has helped make genomic information more meaningful in a healthcare context.

More and more pharma companies are starting to take note. In a Nature Genetics paper published last year, researchers from GSK estimated that selecting drug targets supported by genomic evidence could double the success rate of treatments under development.

"This is the best information we have actually, so why wouldn't you use this if you're going into a drug discovery campaign?" posited GSK's Cardon at the meeting in Washington, DC, a few months ago. He noted that the biological rationale for only 15 percent of new targets in clinical development is currently supported by genomic information. But the cost of bringing a drug to market has ballooned to $2.6 billion, and if drugmakers used genomics to validate all targets, he estimated it could reduce costs by 25 percent.

But as genomics becomes more important to pharma, interpreting this information is getting more complex. For example, Foundation's pharma collaborations often involve "highly proprietary analytical algorithms" to analyze the information in its database. As a result, public variant databases should exist alongside privately held resources, Foundation's Kafka said.

"Doing this well, creating this data, is really difficult," he said. "It's a great example of why the Precision Medicine Initiative … [is] so focused on private/public partnerships as a way to bring different data sources together, because there is no one definitive source."

This is the second article in a series exploring efforts to improve the quality of variant interpretations in genomic testing.