Will the rise of self-service BI tools lead to the demise of the data scientist?

The demise of the data scientist; traditional data warehousing makes a comeback; and the self-quantified self: The Data Mill reports.

Is this the beginning of the end for the vaunted data scientist? That's the clever pitch from Tableau Software, one of a handful of business intelligence vendors pushing the envelope on self-service BI tools.

Because the software's user interface offers drag-and-drop features, even users without a strong math background can build visualizations and interrogate the data, the vendor promises. Tableau isn't the only one with a strong self-service play. More and more vendors are offering packaged analytical applications that hide the complexity of back-end analytics behind easy-to-use features, which raises the question: Are data scientists just experiencing their 15 minutes of fame?

Dan Sommer, a Gartner analyst, doesn't think so. At the Gartner BI and Analytics Summit, he argued that while access to self-service tools may put analytics into the hands of just about every employee, it won't eliminate the role of the data scientist altogether. "You don't give a Ferrari to someone who just got their driver's license," Sommer said.

Or, perhaps more to the point, you don't give a bunch of raw materials to just anybody and say, "Build a Ferrari." That's not a job for the average mechanic; nor is sniffing out never-before-seen relationships from a company's diverse data sources a job for data dabblers.

There are just too many hard-to-detect data traps, and even data smarties like Carmen Reinhart, professor of international finance at Harvard Kennedy School, and Kenneth Rogoff, professor of public policy and economics at Harvard University, fall victim to them. They co-authored Growth in a Time of Debt, a study of the relationship between government debt and economic growth. The paper argues that as countries take on significant debt, their economic growth slows.

When Thomas Herndon, a University of Massachusetts Amherst graduate student in economics, tried to duplicate the findings, however, he "basically found the biggest spreadsheet error in the history of mankind," Sommer said. Turns out, some of the conclusions in the popular paper were based on incomplete data sets. Although the paper's basic finding didn't change with more complete information, Herndon found the conclusion wasn't nearly as black and white.
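To make that kind of data trap concrete, here is a minimal sketch of one way a spreadsheet can quietly end up working from incomplete data: an average taken over a cell range that stops a few rows short. The growth figures and the `average_over_range` helper are invented for illustration; they are not taken from the paper or from Herndon's replication.

```python
from statistics import mean

# Hypothetical growth figures (percent), one per spreadsheet row.
# These numbers are made up for illustration only.
growth_by_row = [2.6, -0.3, 1.9, 3.1, 2.4, 0.7, 2.2, 1.0]


def average_over_range(values, first_row, last_row):
    """Average rows first_row..last_row (1-indexed, inclusive),
    the way a spreadsheet AVERAGE over a cell range would."""
    return mean(values[first_row - 1:last_row])


def rows_left_out(values, first_row, last_row):
    """Count the rows the chosen range silently skips."""
    return len(values) - (last_row - first_row + 1)


full = average_over_range(growth_by_row, 1, len(growth_by_row))
truncated = average_over_range(growth_by_row, 1, 5)  # range ends too early
skipped = rows_left_out(growth_by_row, 1, 5)

print(f"all rows: {full:.2f}  truncated range: {truncated:.2f}  "
      f"rows skipped: {skipped}")
```

Nothing in the truncated calculation looks wrong on its face, which is the point: a simple coverage check like `rows_left_out` is the sort of habit a data skeptic brings to the table.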

Businesses, it would seem, will not only need to keep their data scientists -- as O'Reilly Media's Mike Loukides has long argued -- they'll also need to encourage data skepticism.

Traditional data warehousing vendors back in style

One surprising takeaway from the Gartner Magic Quadrant on data warehousing and database management systems? Traditional data warehousing "came back with a vengeance in terms of demand," said Mark Beyer, Gartner analyst, at the BI Summit.

Most traditional vendors seen as "leaders" in this space, including IBM, Teradata and SAP (with its HTAP, or hybrid transaction/analytical processing, approach), are building logical data warehouse roadmaps. A term coined by Beyer, the logical data warehouse is a relatively new approach to data management that veers away from the central repository. Instead, data stays wherever it fits best -- be it a traditional data warehouse, an analytical database or the Hadoop Distributed File System -- and virtual layers provide views into the data.
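The logical data warehouse idea can be sketched in a few lines of code. The toy example below assumes two in-memory lists standing in for a data warehouse and a Hadoop store; the `LogicalView` class, the source names and the sample rows are all hypothetical and do not reflect any vendor's product or API.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class LogicalView:
    """A toy virtual layer: it holds no data, only how to reach each source."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, fetch: Callable[[], Iterable[Row]]) -> None:
        # Record a fetch function; the data itself stays in its source.
        self._sources[name] = fetch

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Pull matching rows from every source at query time and
        # tag each row with the source it came from.
        results: List[Row] = []
        for name, fetch in self._sources.items():
            for row in fetch():
                if predicate(row):
                    results.append({**row, "source": name})
        return results


# Hypothetical stand-ins for systems that would normally live elsewhere.
warehouse_rows = [{"customer": "acme", "revenue": 1200}]
hadoop_rows = [
    {"customer": "acme", "clicks": 87},
    {"customer": "globex", "clicks": 12},
]

view = LogicalView()
view.register("warehouse", lambda: warehouse_rows)
view.register("hadoop", lambda: hadoop_rows)

acme = view.query(lambda row: row["customer"] == "acme")
print(acme)
```

The design choice worth noticing is that `LogicalView` never copies rows into a central store; it only knows how to ask each source, which is the "virtual layer" in miniature.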

The traditional vendors are "getting into a title fight and coming after each other," Beyer said. But they also need to watch their backs: Cloud provider Amazon Web Services (with its Redshift data warehouse service), Hadoop distributor Cloudera and NoSQL database provider MarkLogic found their way into the quadrant this year. They didn't debut as "leaders," but they didn't give a weak performance, either.

One man embraces the quantified self to determine his (data) worth

Welcome to the self-quantified self. Federico Zannier of Brooklyn, N.Y., data-mined himself and then launched a Kickstarter campaign to hawk his personal data in a project he called "a bit(e) of me."

"I violated my own privacy," he said in his campaign video. Why? U.S. advertisers, known to buy and sell customer data, raked in $30 billion in revenue in 2012, Zannier explained. "In 2012, I personally made zero dollars. … Is my personal data worthless to me?"

To find out, Zannier put a price on his data. For a mere $2, anyone could buy a day's worth of Zannier's self-quantified self -- bundled into a single folder. The data included websites he visited that day, photos of his face looking at his computer taken every 30 seconds, screenshots of the pages he was looking at, his GPS location, the positions of his mouse and a list of applications he used.

He attracted 213 backers and raised $2,733. Not exactly a goldmine, but more than five times his goal of $500. And he promised to use the funds "to finish a browser extension and an iPhone app that allows you to do the same."

Welcome to The Data Mill, a weekly column devoted to all things data. Heard something newsy (or gossipy)? Email me or find me on Twitter at @TT_Nicole.

This was first published in April 2014
