NEW YORK (GenomeWeb) – Members of an international team led by investigators in Switzerland, Germany, and the US are calling for increased access to genomics data, including data that has ostensibly been made public but remains difficult or slow to access.
"It's important to not delay the user analysis that could lead to meaningful and impactful work and, at the same time, to provide credit to people for whatever they do: generating data, using data, analyzing, interpreting, or whatever they have done," co-author John Ioannidis, disease prevention chair at Stanford University, said in an interview.
For a policy forum paper published online today in Science, Ioannidis and his colleagues explored some of the existing impediments to genomic data sharing, both before and after publication. But they also outlined strategies for incentivizing the type of data openness that they believe will spur more extensive and meaningful analyses.
"Despite some notable progress in data sharing policies and practices, restrictions are still often placed on the open and unconditional use of various genomic data after they have received official approval for release to the public domain or to public databases," the authors wrote.
Data sharing is hindered by a range of factors, from concerns about validating unpublished data before its release to competition for publications, they explained.
Rather than relying on first publication alone as a means of gaining value from data generation, the team suggested that strategies should be in place to ensure that data producers get ample credit for their work and the analyses the data gets used for, either directly or indirectly.
"The abolition of the first publication privilege may seem unfair in individual cases," first author Rudolf Amann, a researcher at the Max Planck Institute for Marine Microbiology, said in a statement. "However, it is essential for the further development of the life sciences that freely available sequence data can be used immediately by all scientists for their analyses and publications."
For his part, Ioannidis suggested that the propensity for genomic data sharing tends to exist on a continuum, with some fields resisting openness and others, including newer realms such as microbiome research, often embracing it. And, he noted, the technological hurdles to transferring or storing such data will inevitably be overcome in fields that embrace data sharing, allowing more investigators to participate.
"Technological challenges do exist, but they can be overcome. And the easiest way to overcome them is when there is consensus in the field that [data sharing] is something that is worth doing," Ioannidis said, noting that trying to deal with such technical challenges on a case-by-case basis "is probably going to require far more effort compared to a situation where the field has agreed [on] a standard practice of sharing everything."
Genomics investigators attempted to formalize such openness in the 2003 Fort Lauderdale Agreement, which called for "free and unrestricted" genome sequence data use. Even so, authors of the new policy piece suggested that the other stipulations within the agreement have encouraged investigators to hold data until, or even after, it is published, inadvertently deterring the release of data in a format that is publicly available.
"There have been efforts in the past … trying to optimize how these data can be widely shared and used. But there's still a lot of tension about when exactly, and how exactly, these data would be usable," Ioannidis explained. "The major tension has been between data generators and data users, who are not necessarily the same people. Increasingly we see a situation where data generators may be sitting on the data in the hope that they can get publications out of them."
The advent of large-scale sequencing studies, integrated datasets, and advanced analytical tools has added still more dimensions to the data sharing issue, the authors noted, meaning "outsiders" might "have better analytical capabilities and/or overarching protocols for analyzing more comprehensive sets of data, pre- and post-publication."
Indeed, the paper argued that making data available without restrictions or special permissions increases its value, not only for teams analyzing it, but also for data producers and funding agencies, by allowing the data to reach their full potential.
"The intention of the funding agencies who require pre-publication data sharing has always been to encourage the use of such data by the entire community and to encourage open competition to accelerate discovery and maximize the benefit for members of society who are paying for data generation," the authors wrote.
Despite some of the hurdles addressed in the paper, Ioannidis noted that data sharing and openness are on the rise in genomics and beyond. In a study appearing in PLOS Biology in November, for example, he and co-authors from the Yale School of Public Health and SciTech Strategies in New Mexico found that more than 18 percent of 104 randomly selected biomedical papers published between 2015 and 2017 involved data sharing, up dramatically from the rate found in a similar analysis of papers published from 2000 to 2014.
"There's clearly progress, and I think genomics is leading the way," he said, noting that "many, many other disciplines are realizing this is a good idea."
"Every field where we do see a transformation towards a more open culture is a gain," Ioannidis continued, demonstrating that "that this can be done, and it can be done effectively, in a way that facilitates progress in science."