Big data will no doubt get red-carpet treatment at the O'Reilly Strata Conference: Making Data Work in Santa Clara, Calif., this week. But "small data" will also have its moment in the spotlight when Felienne Hermans, assistant professor at Delft University of Technology in the Netherlands, takes the stage to talk about the unruly relationship between an enterprise and its spreadsheets.
"Most companies out there, they are still thinking about mini data and the small problems they have," Hermans said during a conference webinar preview. "And most of those problems happen in spreadsheets."
The continued allure of spreadsheets is that they're easy to use and easy to share. According to research findings, 95% of U.S. businesses use them for financial reporting. But those same features also make spreadsheets difficult to control. As CIOs are well aware, the integrity of the data plunked into a spreadsheet is almost impossible to maintain: Documentation is often lacking, and errors are easily introduced and then passed on. Since 50% of spreadsheets are used to make decisions, Hermans said, businesses put themselves at risk in all kinds of ways.
The spreadsheet horror stories collected by the European Spreadsheet Risks Interest Group offer ample proof. This one's ugly: The London Olympics overbooked a stadium event by 10,000 tickets due to a spreadsheet error. Want uglier? A copy-and-paste error once cost the Alberta, Canada-based energy company TransAlta Corp. $24 million.
To Hermans, the spreadsheet struggle is not unlike what software developers dealt with before oversight tools were built into the programming workflow. Today, developers can debug, test and even analyze their code as they work on it. Hermans wanted to extend that same kind of support to Excel users and developed tools and techniques to do just that. Visualization tools, for example, can turn a spreadsheet into a flow chart that shows where data originates and, based on the thickness of an arrow between two sources, how much the spreadsheet relies on a particular data source.
"Seeing" whether a spreadsheet has become overly complex or leans heavily on one data source can reveal points of weakness, essentially helping a user assess the health of the spreadsheet, Hermans said.
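The dependency analysis behind such diagrams can be sketched in a few lines. The snippet below is a minimal illustration, not Hermans' actual tooling: a handful of hypothetical cell formulas (hard-coded here; a real tool would extract them from an .xlsx file) are parsed for cell references, and the reference counts stand in for the arrow thickness in a dataflow diagram.

```python
import re
from collections import Counter

# Hypothetical miniature spreadsheet: cell -> formula or literal value.
cells = {
    "A1": "100",
    "A2": "200",
    "B1": "=A1*2",
    "B2": "=A1+A2",
    "C1": "=B1+B2+A1",
}

CELL_REF = re.compile(r"[A-Z]+[0-9]+")

def dependencies(cells):
    """Map each formula cell to the cells it reads from."""
    return {cell: CELL_REF.findall(formula)
            for cell, formula in cells.items()
            if formula.startswith("=")}

def reliance(cells):
    """Count how often each cell is referenced -- a proxy for the
    'arrow thickness' in a dataflow diagram: a heavily referenced
    cell is a single point of failure."""
    counts = Counter()
    for refs in dependencies(cells).values():
        counts.update(refs)
    return counts

print(reliance(cells).most_common())
# A1 is referenced three times, so an error there propagates widely.
```

Even this toy version surfaces the point Hermans makes: a cell that many formulas depend on deserves the most scrutiny, because a single mistake there spreads through the whole sheet.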
Data scientist Monica Rogati, another Strata Conference presenter, has a bedtime story for you. Vermonters are the best-rested in the U.S., racking up an average of seven hours and two minutes of sleep. The least-rested state? Hawaii, where residents get by on six hours and 20 minutes of sleep per night.
Rogati is vice president of data at San Francisco-based Jawbone, the audio and wearable tech company, which is making a name for itself with its activity-tracking UP wristband. For the past six months, Rogati has been digging into the data generated by the fitness tracker. Her message? The greater the data, the richer the story.
Of course, these stories aren't based on a perfect sample of U.S. citizens: They measure only people wearing UP wristbands. While that might give researchers pause, it shouldn't discredit the information altogether, according to Rogati. When she sifted through the data to find out who gets more sleep -- men or women -- her conclusions matched those of more traditional research studies.
"Turns out women get 20 more minutes of sleep on average," she said. Researchers have known this for a while based on studies they conducted with a few hundred participants -- probably not a perfect sample either.
And there actually isn't enough traditional research data to take the story much further, Rogati said. That's not the case with Jawbone data. "When you have hundreds of thousands of people contributing sleep data, you can see patterns you weren't able to see before," she said. The volume of data can help data scientists and researchers explore and uncover how, for example, location, age or a person's body mass index might play a role in sleep patterns.
The power of the 1%
You've probably seen the General Electric commercials about the company's "brilliant" machines, which combine hardware with analytical software to crunch tons of real-time data. Its data-prolific GEnx jet engine, for example, produces 5,000 data points per second for analysis; that amounts to half a terabyte of data per flight. And while those numbers are ginormous, General Electric is also banking on the power of a very tiny number -- the number one, said Steven Gustafson, R&D manager at the Knowledge Discovery Lab for Schenectady, N.Y.-based GE Global Research.
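A quick back-of-the-envelope check shows how those two figures relate. The flight length is not given in the article, so the sketch below assumes a hypothetical 10-hour flight:

```python
# Rough sanity check of the GEnx figures from the article.
points_per_second = 5_000
flight_seconds = 10 * 3600          # assumed 10-hour flight (not from the article)
data_per_flight_bytes = 0.5e12      # half a terabyte per flight

points_per_flight = points_per_second * flight_seconds
bytes_per_point = data_per_flight_bytes / points_per_flight

print(f"{points_per_flight:,} data points per flight")
print(f"~{bytes_per_point / 1024:.1f} KiB per data point")
```

Under that assumption, a single flight yields 180 million data points, and each "point" carries a few kilobytes of payload -- a reminder that sensor readings are richer than a single number.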
"Often, people [in industry] think the era of doing productivity gains is over," Gustafson, a Strata Conference presenter, said during the webinar. A company applies Lean Six Sigma process methodologies and believes it's wrung the productivity gains dry. "But I don't think that's entirely true. We can really change things by going from a 'break it, fix it' model to doing a much better job of predicting and preventing outages."
The big challenge for businesses is figuring out how to use data, software and analytics to turn historical knowledge and information on day-to-day operations into different kinds of maintenance strategies. And those strategies will have to include ways to leverage sensor data and connected devices, he said.
But even a 1% payoff will be worth it: A 1% fuel-efficiency savings across a fleet of power gas turbines over 15 years could pave the way for a whopping $66 billion in savings, according to Gustafson.