NEW YORK (GenomeWeb) – Seeking to keep pace with the rapid accumulation of biomedical data and to help bioinformaticians work more efficiently, the National Institutes of Health today issued a five-year strategic plan for data science. The agency also said it would hire its first chief data strategist.
"[The] NIH must weave its existing data science efforts into the larger data ecosystem and fully intends to take advantage of current and emerging data management and technological expertise, computational platforms, and tools available from the commercial sector through a variety of innovative public-private partnerships," the agency said.
Genomics research is a major driver of the plan, the NIH noted, adding, "By 2025, the total amount of genomics data alone is expected to equal or exceed totals from the three other major producers of large amounts of data: astronomy, YouTube, and Twitter."
The agency also said that the growth of biomedical databases has created massive inefficiencies, citing a 2016 CrowdFlower survey that found data scientists spent 80 percent of their work time collecting and organizing data rather than analyzing it.
"The generation of most biomedical data is highly distributed and is accomplished mainly by individual scientists or relatively small groups of researchers," the NIH plan stated. "Moreover, data also exist in a wide variety of formats, which complicates the ability of researchers to find and use biomedical research data generated by others and creates the need for extensive data 'cleaning.'"
In a bid to address these challenges, the new plan has five broad goals: to support an efficient biomedical research data infrastructure; to promote the modernization of the data ecosystem; to promote advanced data management, analytics, and visualization technologies; to support workforce development; and to create policies that ensure the sustainability of the other steps.
While some preliminary work is already underway, the NIH expects to begin implementation of the strategic plan over the next 12 months. The agency said it would "continue to seek community input" throughout the implementation and indicated that the plan likely will require "frequent course corrections" as technology and data needs evolve.