NEW YORK (GenomeWeb) – Members of the Sustainability Working Group of the National Institutes of Health Big Data to Knowledge (BD2K) initiative are still accepting responses to a request for information that asked the community to suggest a series of metrics that could be used to measure the value and impact of biomedical data repositories.
According to the RFI, which will accept responses from the community through Oct. 17, these metrics will be used "to assess the value and impact of biomedical digital data repositories that may prove a basis for technical and science policy actions required to support the long-term sustainability of repositories." This includes qualitative and quantitative metrics that describe, for example, frequency of access and number of downloads as well as publications from data, data citations, and data utilization in research studies.
Such metrics will provide a mechanism for the NIH both to "objectively and consistently" measure the value of the data repositories and the datasets within those repositories, and to quantify the value of individual datasets and data items within a database, Juli Klemm, head of the National Cancer Institute's cancer biology and genomics section and a representative of the working group, told GenomeWeb. She said that the RFI grew out of discussions at the NIH and within the broader biomedical community about how to deal with the continuing growth of biomedical data and at the same time sustain public access to these datasets over the long term.
With fewer research dollars available, in recent years a number of major resources have sought alternative sources of funding in order to stay afloat. For example, developers of the Arabidopsis Information Resource launched Phoenix Bioinformatics, an independent non-profit organization to explore alternative funding mechanisms for the resource after its funding dried up. More recently, the developers of the Online Mendelian Inheritance in Man repository began asking for contributions from the community to generate the revenue they need to keep the resource going long term.
"The volume of data is growing exponentially but our budgets are relatively flat, so [there is] a clear challenge with the need to develop policies and funding models to appropriately maintain access to the data that these researchers need," Klemm said. She pointed to a perspective piece published last November in Nature by researchers from the NIH that summarizes the challenges of current and projected costs of managing biomedical data. In that article, the authors note that funders have historically only been interested in how the data resources that they support are used and by whom but not necessarily the details of which individual items and types of data are used and why.
Some preliminary studies show that researchers typically use only small subsets of the data frequently. However, the exact subset of data that researchers use changes over time, and researchers may not access the portions of data that they need until after they download datasets from the repository. These factors make it difficult to quantify data usage patterns, the authors wrote.
Such information is crucial to understanding data usage patterns, to figuring out how best to target annotation and curation efforts, and to deciding which repositories should receive the most attention and financial support, they wrote. Currently, the 50 largest NIH-funded data resources have a collective annual budget of $110 million, which represents "the tip of iceberg for future needs," according to the authors. "When we have a better understanding of data usage, we can develop business models that consider supply and demand, and develop sustainable practices." To that end, they call on funders to develop new metrics for assessing the usage and value of data and to encourage the resources that they fund to adopt them.
"Our hope is to hear from a wide variety of stakeholders who are in different domains of biomedical science ... and we hope through their responses to understand those communities' recommendations for how repositories in their domain should be assessed," Klemm said. "Probably an important part of the RFI is informing that strategy, whether that could be a consistent set of metrics or whether they need to be customized given the scientific domain. We hope the RFI will help us understand how to approach that."
The group is particularly interested in hearing from database developers, providers, and curators. "We certainly expect and want to hear from them because ultimately we would be working with them to apply and report on these metrics and to incorporate the measurement of these items into their maintenance operations," Klemm said.
The working group is also keeping an open mind about how the adopted metrics might be applied: for example, whether the community would adopt a standard set of metrics applied across all databases or whether different biomedical domains would each have their own set of metrics, Klemm said.