Matthew Dublin is a senior writer at Genome Technology.
Here's one for the contrarian folder. According to Wharton School of Business professor Peter Fader, the increasing emphasis on "big data" is turning into a data-hoarding fetish that will ultimately end in a wild goose chase, with researchers learning far less from all of their data than they hope to.
In a Q&A published in Technology Review, Fader compares those who have put all of their faith in big data to technical stock analysts: the folks who attempt to predict future stock prices based on past prices. The trouble with that approach, he says, is that it typically doesn't work out very well, because the analysts' mathematical models do not take into account the myriad reasons why a stock's price may have changed over time.
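To see why curve-fitting price history can mislead, consider a toy demonstration; the random-walk "prices" and the polynomial model below are invented for this sketch and are not drawn from Fader's Q&A.

```python
# Toy illustration of the technical-analyst trap Fader describes. The
# random-walk "prices" and the degree-8 polynomial are invented for this sketch.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)
t = np.arange(250)                              # roughly one year of trading days
prices = 100 + np.cumsum(rng.normal(size=250))  # a memoryless random walk

# A flexible model fit to history tracks the past closely...
poly = Polynomial.fit(t, prices, deg=8)
in_sample_rmse = np.sqrt(np.mean((poly(t) - prices) ** 2))
print("in-sample RMSE:", in_sample_rmse)

# ...but extrapolating even 30 days ahead produces wild forecasts, because
# the "pattern" the fit found was noise: a random walk has no memory to exploit.
future_t = np.arange(250, 280)
forecast = poly(future_t)
print("30-day forecast range:", forecast.min(), "to", forecast.max())
```

The in-sample fit looks impressive precisely because the model has absorbed noise; the out-of-sample forecast swings wildly because there was never a pattern to find.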
What worries Fader is that data scientists are now doing exactly the same thing: feeding lots and lots of data into a formula in the hope that some pattern will fit.
While Fader maintains he is no big data Luddite, he finds fault even with the potential of machine learning and of new platforms engineered to take advantage of huge datasets, such as Hadoop, a distributed data-processing framework with a bright future in the bioinformatics community.
"I make sure my PhD students learn all these emerging technologies, because they are all very important for certain kinds of tasks. Machine learning is very good at classification—putting things in buckets. … The problem is that there are many decisions that aren't as easily 'bucketized'; for instance, questions about 'when' as opposed to 'which.' Machine learning can break down pretty dramatically in those tasks," Fader says. "It's important to have a much broader skill set than just machine learning and database management, but many 'big data' people don't know what they don't know."