The movie Moneyball turned statisticians into superheroes, but the baseball industry, for the most part, still operates on crude analytics. That’s according to Hyoun Park, president and chief research officer for the Boston-based Blue Hill Research Inc.
“The biggest problem fundamentally in baseball is that you get a lot of consumption of analytics,” said Park, a featured speaker at Predictive Analytics World in Boston last week. “And what you get is data crunching without context.”
Take scouting reports, one of three key types of data used in baseball. That’s where experts try to predict how a 16-year-old will perform in 15 years. They use simple metrics to size up the skills of potential players (and potential multimillion-dollar investments), sometimes even ranking their ability on a scale of 1 to 10, Park said.
“Not only is that not granular enough to understand what is really happening, you don’t have enough information to create a good predictive model,” he said.
Another baseball data source is the scorecard, a log- or event-based analysis of the game itself. Baseball, Park said, can be boiled down to two real actions: scoring runs or making outs. For the team at bat, the goal is to score as many runs as possible before making three outs. Analyzing scorecard data can provide insight into what’s going to occur based on what’s already happened, and in some cases that means casting new light on old thinking.
“One of the most popular strategies in baseball is the sacrifice: the idea that if you have someone on first base, you can move the ball along, get this person to second at the cost of an out,” Park said.
But the data crunching shows something different. It may seem as though a base runner’s chance of making it home increases as he goes from first to second, but sacrificing an out to get him there actually reduces his chances of scoring a run. “It’s a very interesting [example],” Park said. “We don’t often think of predictive analytics by looking at the positive and negative correlations together at the same time.”
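Scorecard analysis of this kind comes down to counting: for each base-out situation in an event log, how often did the batting team go on to score? The Python sketch below is a minimal, hypothetical illustration, not Park’s model. The `scoring_probability` and `sacrifice_effect` functions, the `(bases, outs, scored)` record format and the toy records are all invented for this example; the numbers only exercise the arithmetic and say nothing about real baseball.

```python
from collections import defaultdict

# Hypothetical record format: (bases, outs, scored)
#   bases  - occupied bases, e.g. "1B" = runner on first only
#   outs   - outs recorded at that moment (0-2)
#   scored - whether at least one run scored before the inning ended
Record = tuple[str, int, bool]


def scoring_probability(log: list[Record]) -> dict[tuple[str, int], float]:
    """Empirical probability of scoring at least one run from each base-out state."""
    seen: dict[tuple[str, int], int] = defaultdict(int)
    scored: dict[tuple[str, int], int] = defaultdict(int)
    for bases, outs, run_scored in log:
        state = (bases, outs)
        seen[state] += 1
        if run_scored:
            scored[state] += 1
    return {state: scored[state] / seen[state] for state in seen}


def sacrifice_effect(probs: dict[tuple[str, int], float],
                     before: tuple[str, int],
                     after: tuple[str, int]) -> float:
    """Change in scoring probability from trading an out to advance the runner."""
    return probs[after] - probs[before]


# Toy records invented purely to exercise the code; they are NOT real outcomes.
TOY_LOG: list[Record] = [
    ("1B", 0, True), ("1B", 0, False), ("1B", 0, True), ("1B", 0, False),
    ("2B", 1, True), ("2B", 1, False), ("2B", 1, False), ("2B", 1, False),
]

if __name__ == "__main__":
    probs = scoring_probability(TOY_LOG)
    print(probs[("1B", 0)], probs[("2B", 1)])
    print(sacrifice_effect(probs, ("1B", 0), ("2B", 1)))
```

A negative `sacrifice_effect` means that, in the data supplied, moving the runner from first with no outs to second with one out lowered the chance of scoring. With real play-by-play logs, each state would need many recorded innings before the estimate meant much.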
The box score
Another metric relic is the box score, a dashboard that hasn’t changed for years. “It represents a very consistent data point for high-level baseball activity,” Park said.
The tough thing about data crunching the box score? It provides a good understanding of what’s happening in a particular game, but the view is too narrow for prediction. It doesn’t, for example, consider how external factors, such as weather, could impact a team or how variability impacts the data.
“You’ll find that a lot of teams try to make decisions based on predictive modeling for a single game,” Park said. “That doesn’t really work well because … a small sample size leads to a high potential of variability.”
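Park’s sample-size point can be made concrete with a deliberately simple model. Assume, hypothetically, that each game is an independent trial with a fixed win probability p. The observed win rate over n games then has a standard error of sqrt(p(1 - p)/n), so one game tells you very little while a full season narrows the estimate considerably. The sketch below computes that figure and cross-checks it with a seeded simulation; the function names and the 0.55 win probability are illustrative assumptions, not figures from Park or any real team.

```python
import math
import random
import statistics


def standard_error(p: float, n: int) -> float:
    """Standard error of an observed win rate over n independent games (binomial model)."""
    return math.sqrt(p * (1 - p) / n)


def simulated_spread(p: float, n: int, trials: int = 2000, seed: int = 42) -> float:
    """Population std. dev. of observed win rates across simulated n-game samples."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    rates = []
    for _ in range(trials):
        wins = sum(rng.random() < p for _ in range(n))  # each game won with probability p
        rates.append(wins / n)
    return statistics.pstdev(rates)


TRUE_P = 0.55  # hypothetical "true" win probability

if __name__ == "__main__":
    for n in (1, 10, 162):
        print(f"{n:>3} games: analytic SE = {standard_error(TRUE_P, n):.3f}, "
              f"simulated spread = {simulated_spread(TRUE_P, n):.3f}")
```

Under these assumptions the spread is roughly 0.50 for a single game versus roughly 0.04 over 162 games. The model also ignores the external factors, such as weather, that the box score leaves out, so real-world variability would only be larger.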
Why it matters to your business
It’s these kinds of lessons that can be applied to businesses, said Park, whose aim is to parlay these insights into better business analytics. The metrics many companies use to evaluate employees are on par with (or inferior to) baseball scouting reports. And the problem isn’t limited to employee evaluations: simplistic ranking systems also turn up in other areas of the business, including measures of customer satisfaction and even product development, Park said. To really add value to the business, he advises (re)building predictive models that use rich, granular data.
Sales departments provide another example. The prevailing mindset there is to push forward, sometimes without studying the data and looking at how current events might impact what happens in the future. That means sales teams can fall into the trap of seeing any progress as good progress while ignoring other risk factors (*ahem* sacrifice bunt) that may mean losing the sale altogether.
Finally, small sample sizes can lead to analyzing data in a vacuum. Businesses that ignore external and internal context end up with limited insight: the same data crunching without context that Park sees in baseball.