The life sciences field needs better, more flexible, upgradeable, and scientist-friendly big data management tools, John Boyle, soon to be senior director of bioinformatics at Kymab, writes in Nature.
Boyle says that many data management projects focused on helping life sciences investigators manage and analyze large data sets have failed because these systems are hard to create, harder to use, and take a one-size-fits-all approach.
These systems should be able to flex to the scientists’ needs, rather than the scientists bending their projects to fit the data management format, he says.
Boyle sees NCI’s caBIG data integration project as a perfect example of a big data effort that turned into a costly flop.
“[CaBIG] had admirable goals and seemed workable in theory, but in the end it was too complicated to use. Crucially, caBIG relied on standardized data formats, which called for standardized experiments. Its one-size-fits-all approach fit nearly nobody,” he explains.
An ideal system would handle storage, provide “common and secure access methods,” and allow for linking, annotation, and querying to retrieve information. It also would be able to work with data held in different locations, such as on remote servers, on laptops, in databases, and on different machines, and in a range of formats, including spreadsheets and other file types.
Currently, that system does not exist, and most academic organizations have been developing their own models and systems, which makes it hard to connect them for collaborations.
“The situation is as unworkable as if every lab in the country had decided to devise its own (poor) document-editing software,” Boyle says.
The solution, as he sees it, is that life sciences data management systems probably should be developed by the life sciences community.
One place to start, he says, is by taking three lessons from the failures of the past.
Data are dynamic: they are going to change, and they come in a range of formats, so a useful system has to be flexible and updatable.
New systems should offer benefits to investigators and also be painless for them, meaning easy to use, he urges. Boyle also says such systems should be developed with a focus on the need to find workable solutions to the problems, “and not by a desire to make the problem fit the latest fashionable technology.”