YouGene Aims to Incentivize Wider Sharing of Proprietary Genome Data


NEW YORK (GenomeWeb) – To open up access to high-value proprietary datasets owned by companies, academic institutions, and other entities, YouGene has developed infrastructure and a business model that give these data holders a forum to share their information more broadly and profit from its use.

Specifically, the company has developed YouGene Connect, a platform to which entities can submit their most valuable proprietary datasets and then charge royalties or subscription fees for third parties to access and use the information.

"We enable collaboration between corporate entities to improve their business results," YouGene CEO Roger Hahn told GenomeWeb. "Our framework allows different labs to independently work on parts of the database which contains the best and most up-to-date data but at the same time provid[es] an incentive for them to operate and contribute [to] that framework."

More importantly, it recognizes that businesses make significant investments in obtaining, cleaning, storing, protecting, and insuring their datasets, according to Hahn. He is a registered patent attorney who spent the last 17 years helping businesses protect their intellectual property. According to at least one industry estimate that he pointed to, businesses are expected to spend as much as $187 billion on data analytics alone by 2019. They are willing to spend that much because they believe that there are competitive advantages to be obtained from mining that data or because they believe that better understanding the data will allow them to create some kind of value.

In other words, businesses have long recognized that "data is money and it has to be treated as such," Hahn said. Within the genomics community, however, most data-sharing initiatives are largely philanthropic in nature or tied to government funding policies. Last year, the American Association for Cancer Research launched the Genomics, Evidence, Neoplasia Information Exchange, which seeks to create a registry of somatic sequencing results and limited clinical information linked to these results. Earlier this year, Ambry Genetics shared aggregate allele-frequency data gleaned from the exomes of more than 10,000 cancer patients in AmbryShare, its internal database of deidentified information from individuals who have or had hereditary breast and ovarian cancer and received testing from the company.

However, after having spent significant resources to generate their internal datasets — Ambry spent around $20 million to launch AmbryShare, for example — there is no real sustainable incentive for data holders to contribute their highest-value datasets to a common pool and to continue to do so in future.

"Companies do want to be good corporate citizens ... but at the end of the day they are profit-seeking or profit-maximizing entities," Hahn said. These are laboratories that need to see a return on their investment and companies that have to answer to their shareholders. As a result, whatever acts of corporate responsibility they engage in must align with their core purposes, values, and interests.

YouGene recently submitted comments related to the need for incentives in a document that detailed its support of recent draft guidance from the US Food and Drug Administration on using public human genetic variant databases to support clinical validity in NGS-based in vitro diagnostics. In its comments, the company asked that a separate confidential pathway be made for evaluating the safety and efficacy of private databases so that they can be used as sources of "valid scientific evidence" for the development of NGS-based tests.

"If we are talking about something that goes beyond just optics and getting good press, something that ... will allow us to really get to the goal of precision medicine, we have to be honest in the approaches that we are taking and really confront the core issues that an accountant will look at, or a shareholder, or an investor will look at," Hahn said. "There needs to be an incentive that drives the behavior that we're looking for which is contribution and sharing and improving the overall knowledge in genetic databases." Otherwise, a lot of potentially valuable information will remain locked up behind corporate and institutional firewalls resulting in "a fragmented, suboptimal picture of human genetics."

YouGene attempts to solve this problem by providing a platform that incorporates "the best design features of the US patent system such as publication, priority, and examination," Hahn said. It provides a secure platform for organizations to share their high-value datasets at no cost along with a mechanism that ensures that they reap benefits from third-party use of their data. "It meets the dual goal of having [a] unified framework where you have the best and most up-to-date data and then also most importantly solves the incentive problem," he said.

Specifically, as a submitter's data is incorporated into a pipeline and used commercially, the submitter receives royalties or subscription income as compensation. YouGene uses a proprietary algorithm, which operates on a percentage rather than an absolute-dollar basis, to establish royalty rates for data submitters, Hahn told GenomeWeb. Essentially, "submitters earn revenue for the data submitted from royalty rates depending on how often the data is used and how the data is incorporated with data submitted by other contributors," he explained. The company also has put in place a proprietary micro-attribution framework to protect data submissions and allow submitters to retain control over their data.

The company accepts various types of genetic datasets. "There is no minimum quality of data that can be submitted but there is a minimum quality score that will be accepted and allowed on our subscription network for commercial clinical use," Hahn said. The so-called YouGene curation score depends on factors such as the strength of the assertion, type of evidence, and supporting data. Submissions that have a higher score will earn higher royalties than the same data used in the same way with a lower score. This should help encourage submitters to contribute their best datasets as there are significant rewards for contributing higher quality data, Hahn said. Users do not need to subscribe to the database in order to submit data but they do need to purchase subscriptions for themselves in order to view and use data from other submitters. Meanwhile, non-commercial research use of the Connect data is free.
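YouGene's royalty algorithm is proprietary and its details were not disclosed, so the following is only a rough sketch of the general idea described above: shares are computed as percentages rather than dollar amounts, they grow with usage and with the curation score, and data below a minimum score is excluded from commercial use. The 0-to-1 score scale, the threshold value, the usage-times-score weighting, and all names in the code are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    submitter: str
    curation_score: float  # hypothetical 0-1 scale; the real scale is not public
    uses: int              # how often the data was used in a billing period


# Assumed cutoff for commercial clinical use; the actual value is not public.
MIN_COMMERCIAL_SCORE = 0.5


def royalty_shares(submissions, min_score=MIN_COMMERCIAL_SCORE):
    """Return each submitter's fraction of a royalty pool.

    Illustrative weighting only: weight = uses * curation_score,
    so higher-scoring data earns more than the same data used the
    same way with a lower score.
    """
    weights = {}
    for s in submissions:
        if s.curation_score < min_score:
            continue  # accepted as a submission, but not eligible commercially
        weights[s.submitter] = weights.get(s.submitter, 0.0) + s.uses * s.curation_score
    total = sum(weights.values())
    if total == 0:
        return {}
    return {name: w / total for name, w in weights.items()}


example = [
    Submission("lab_a", curation_score=0.9, uses=10),
    Submission("lab_b", curation_score=0.6, uses=10),
    Submission("lab_c", curation_score=0.3, uses=100),  # below the assumed cutoff
]
shares = royalty_shares(example)
```

In this toy example, lab_a and lab_b are used equally often, but lab_a receives the larger share because of its higher score, while lab_c receives nothing despite heavy use because its score falls below the assumed cutoff.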

Some types of data submissions have perpetual licenses; however, the royalty rates will decline over time as new evidence comes to light and is added to the data. As a result, it makes sense for data contributors to submit datasets quickly in order to maximize their royalties for as long as possible, Hahn said. Other data submissions have more formal, patent-like IP protection, with licenses that expire 15 years after the first commercial use.

For their part, customers who want access to information are charged an annual subscription fee — they do not have to contribute data in order to use data. Furthermore, clients that use the information to develop new genetic tests are expected to share test revenues with the data contributors. For example, earlier this year, YouGene signed an agreement with Columbia Technology Ventures (CTV), the technology transfer office of Columbia University, that allowed it to bring a portfolio of non-invasive gene-based tests and underlying data for kidney disease into YouGene Connect. In the case of CTV, as database users build genetic tests on CTV's underlying data, CTV will be entitled to royalties from the test, Hahn explained.

Hahn expects that clinical labs, clinicians, pharma, universities, and hospitals will most likely purchase subscriptions to YouGene Connect — "basically, anyone who is interested in accessing a compendium of highly curated genetic findings and data for either clinical or research use," he said. Clinicians, for example, can use the information in the database to supplement and cross-check the information contained in their in-house databases while insurers have access to validated biomarkers that they can use for reimbursement purposes. Meanwhile, lab developers can use the Connect data to develop more effective tests while researchers might look to the database to keep abreast of current research and avoid replication. Finally, pharmaceutical companies might be interested in using the data as part of efforts to develop companion diagnostics, he said.

In addition to YouGene Connect, the company's product portfolio also includes YouGene Curate, through which it offers supporting variant classification services to clinical labs that need help classifying variants for next-generation sequencing-based tests. It provides a way for labs to check the validity of the variant classifications that their scientists make, Hahn explained.

For its curation, YouGene uses a proprietary semi-automated pipeline that applies American College of Medical Genetics and Genomics guidelines to classify variants based on criteria such as co-occurrence, segregation analysis, population frequency, amino acid conservation, and SIFT and PolyPhen scores, among other information. The company deposits these variants in a proprietary database that interested customers can access for a subscription fee or royalty. The exact pricing depends on factors such as the type of test, how many variants the customer wants to look at, and where in the test process customers access the database. The database is interoperable with laboratory information management systems and electronic medical records, and reports can be delivered quickly with automated systems.

The company also offers YouGene Bank, a direct-to-consumer resource that provides storage resources for individuals who have had their genomes or exomes sequenced. This platform, Hahn explained, is designed to give consumers control over their data and let them decide who can access it, and to free patients' data from being tied to a single health system. "It is consumer-initiated, physician-mediated," he noted, so patients request a script from their physicians to have their genomes sequenced and the data loaded into YouGene Bank. The company will only offer the service to those who have written orders from their physicians. "We are working with FDA and doing what we need to do to make sure that it meets CAP/CLIA requirements and so forth," he said.