
Will the rise of self-service BI tools lead to the demise of the data scientist?

The demise of the data scientist; traditional data warehousing makes a comeback; and the self-quantified self: The Data Mill reports.

Is this the beginning of the end for the vaunted data scientist? That's the clever pitch from Tableau Software, one of a handful of business intelligence vendors pushing the envelope on self-service BI tools.


Because the software's user interface offers drag-and-drop features, even users without a strong math background can build visualizations and interrogate the data, the vendor promises. Tableau isn't the only one with a strong self-service play. More and more vendors are offering packaged analytical applications that mask the complexity of back-end analytics with easy-to-use features, which raises the question: Are data scientists just experiencing their 15 minutes of fame?

Dan Sommer, a Gartner analyst, doesn't think so. At the Gartner BI and Analytics Summit, he argued that while access to self-service tools may put analytics into the hands of just about every employee, it won't eliminate the role of the data scientist altogether. "You don't give a Ferrari to someone who just got their driver's license," Sommer said.

Or, perhaps more to the point, you don't give a bunch of raw materials to just anybody and say, "Build a Ferrari." That's not a job for the average mechanic; nor is sniffing out never-before-seen relationships from a company's diverse data sources a job for data dabblers.

There are just too many hard-to-detect data traps, and even data smarties like Carmen Reinhart, professor of international finance at Harvard Kennedy School, and Kenneth Rogoff, professor of public policy and economics at Harvard University, fall victim to them. They co-authored Growth in a Time of Debt, a study of the relationship between government debt and economic growth. The paper argues that as countries take on significant debt, their economic growth slows.

When Thomas Herndon, a University of Massachusetts Amherst graduate student in economics, tried to duplicate the findings, however, he "basically found the biggest spreadsheet error in the history of mankind," Sommer said. Turns out, some of the conclusions in the popular paper were based on incomplete data sets. Although the paper's basic finding didn't change with more complete information, Herndon found the conclusion wasn't nearly as black and white.

Businesses, it would seem, will not only need to keep their data scientists -- as O'Reilly Media's Mike Loukides has long argued -- they'll also need to encourage data skepticism.
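One modest form of data skepticism is to confirm that a summary statistic covers every row it is supposed to cover. The sketch below is illustrative only: the figures are hypothetical and are not taken from the Reinhart-Rogoff paper, and the helper name `mean_with_coverage` is an assumption made for this example. It compares an average computed over a selection that stops short with one computed over the full data set, and reports which entries were left out.

```python
from statistics import mean

# Hypothetical growth figures (percent), for illustration only.
# These are NOT the numbers from the Reinhart-Rogoff paper.
growth_by_country = {
    "A": 2.5,
    "B": -0.3,
    "C": 1.8,
    "D": 3.1,
    "E": 0.9,
}


def mean_with_coverage(values, selected_keys):
    """Return the mean of the selected entries and the keys that were left out."""
    selected = [values[key] for key in selected_keys]
    missing = sorted(set(values) - set(selected_keys))
    return mean(selected), missing


# A selection that silently stops short, as a mis-dragged spreadsheet range might.
partial_mean, skipped = mean_with_coverage(growth_by_country, ["A", "B", "C"])

# The same calculation over every entry.
full_mean, none_skipped = mean_with_coverage(growth_by_country, list(growth_by_country))

print(f"partial mean: {partial_mean:.2f}, skipped: {skipped}")
print(f"full mean: {full_mean:.2f}, skipped: {none_skipped}")
```

Reporting the skipped entries next to the result makes an incomplete selection visible instead of leaving it buried in a formula.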

Traditional data warehousing vendors back in style

One surprising takeaway from the Gartner Magic Quadrant on data warehousing and database management systems? Traditional data warehousing "came back with a vengeance in terms of demand," said Mark Beyer, Gartner analyst, at the BI Summit.

Most traditional vendors seen as "leaders" in this space, including IBM, Teradata and SAP, with its HTAP or hybrid transaction/analytical processing, are building logical data warehouse roadmaps. A term coined by Beyer, the logical data warehouse is a relatively new approach to data management that veers away from the central repository. Instead, data lives wherever it fits best -- be it in a traditional data warehouse, an analytical database or the Hadoop Distributed File System -- and virtual layers provide views into the data.

The traditional vendors are "getting into a title fight and coming after each other," Beyer said. But they also need to watch their backs: Cloud provider Amazon Redshift, Hadoop distributor Cloudera and NoSQL database provider MarkLogic found their way into the quadrant this year. They didn't debut as "leaders," but they didn't give a weak performance, either.

One man embraces the quantified self to determine his (data) worth

Welcome to the self-quantified self. Federico Zannier of Brooklyn, N.Y., data-mined himself and then launched a Kickstarter campaign to hawk his personal data in a project he called "a bit(e) of me."

"I violated my own privacy," he said in his campaign video. Why? U.S. advertisers, known to buy and sell customer data, raked in $30 billion in revenue in 2012, Zannier explained. "In 2012, I personally made zero dollars. … Is my personal data worthless to me?"

To find out, Zannier put a price on his data. For a mere $2, anyone could buy a day's worth of Zannier's self-quantified self -- bundled into a single folder. The data included websites he visited that day, photos of his face looking at his computer taken every 30 seconds, screenshots of the pages he was looking at, his GPS location, the positions of his mouse and a list of applications he used.

He attracted 213 backers and raised $2,733. Not exactly a goldmine, but more than five times his goal of $500. And he promised to use the funds "to finish a browser extension and an iPhone app that allows you to do the same."

Welcome to The Data Mill, a weekly column devoted to all things data. Heard something newsy (or gossipy)? Email me or find me on Twitter at @TT_Nicole.


Join the conversation



Have BI tools evolved to a point where a data scientist is not required to interpret the results?
I am a volunteer judge for the Science Fair. The kids have some pretty interesting projects and access to equipment we never had when I was a kid. But they have no idea what to do with the data. They print pretty colored bar charts and take an average or two, but have no idea what a chi-square or Pearson's r value means. They could calculate these basic stats with the same spreadsheet that drew that pretty picture! They could download an open source stat routine, too. But without the knowledge, the tool is a rock or a paperweight.
Validation and verification of results will still be needed. Just because a self-service tool provides results does not mean those results can go unverified. That takes humans with knowledge of the subject area and a good understanding of the data environments.
The data scientist is intuitively conditioned to discern nuggets and patterns of data only data veterans can see.
Too many business users have a very narrow, "siloed" view.
Companies like Procter and Gamble have been doing data science for decades - it was called statistics. The faddish term may go out of fashion, but the need for big brains to recognize patterns and test assumptions will always be fundamental to business. The next challenge will be to better operationalize the results of data science and apply machine learning to improve work streams and decisions in real time.

Question for you: What is required to better operationalize the data science results -- faster connections, better analytics? Also, do you agree that the "big brain" statisticians also need to bring deep domain expertise? I've met a few brilliant mathematicians who don't know enough about the data to pick up on errors.
I think “operationalization” (mouthful) starts with algorithms that determine the next best action. The more data you have, the simpler and more accurate that model can be, so yes, all of the above – more data, models that learn, and faster connections to score transactions in real time. I would also agree that domain expertise or, in lieu of that, close alignment with the business is a prerequisite. My org's team of data scientists specializes by discipline. I don’t know if “big brains” also need to know how to structure, cleanse, and move data. I have read that today 80% of a data scientist's time is spent on these kinds of activities vs. higher-value analysis. That seems like an area where technology can lend improvement.
Thank you! 

Would be great to connect to hear more details about your team, if you're inclined to talk to the press: Or feel free to contact our excellent big data reporter and the author of the story that prompted the question, 
Absolutely! We do have opinions ;-)
Will ping you under separate cover. Thanks and Happy weekend!
Self-service BI != data expert, no matter how we slice it. We don’t know what we don’t know, and that’s the danger, Nicole.

Anyone who actually believes that a self-service tool, or a few semesters of college, or a few months of research on the Internet makes them even remotely qualified in anything (especially BI) is, well....
Not sure if you received my first comment, so I will repeat it. The extinction of data scientists is not feasible. Why? It takes time to become one and to develop the keen aptitude for seeing the data nuggets, patterns and valuable BI insights that a trained brain is conditioned to see and that computers and artificial intelligence cannot see without human intervention. It's kind of like doctors: with advanced equipment they can excel, but the equipment still relies on the doctor to discern the results.