ORLANDO, Fla. -- Ask a dozen CIOs to define big data, and you'll likely get a dozen different responses. Gartner analyst Mark Beyer says that is because big data -- for all the hype -- is still not the norm for enterprise IT professionals.
"When something becomes familiar, it starts to feel normal," Beyer said during his talk at this year's Gartner Symposium/ITxpo. "Our job, as IT pros, is to make big data normal by 2020."
CIOs can help their enterprises inch toward normalcy by distinguishing big data facts from big data fiction. "Myths play to anxiety, not actual situations," he said.
Here are Beyer's eight big data myths:
1. Big data starts at 100 TB. Stop looking for a standard size for big data, because there isn't one. "Big data is what I'm doing with the data; it's not how big it is," Beyer said.
2. You have to replace infrastructure if you want to do big data. "If I decide to change my whole infrastructure because I have a new need, I'm risking everything I've done before," Beyer said. His rule of thumb? "You have to figure out if the sacrifice of the [infrastructure] maturity is worth the risk."
3. Eighty percent of all data is unstructured. This might be one of the most often-quoted big data stats around, and, according to Beyer, it's also inaccurate. "The biggest information assets in the world are machine data. Calling them unstructured because they're not relational is a lie. Machine data is structured data." The bulk of this machine data, by the way, tends to be repeated information confirming everything's fine. "That's what machine data usually says," he said.
4. Tools will replace data scientists. Rest assured, all of the money spent to attract, woo and win over a data scientist is not for naught, said Beyer. "Tools are engineering; engineering is the reuse of a discovered fact. Science is discovering new facts." Tools won't replace data scientists -- at least not until the tools can procreate and evolve.
5. More data fixes data quality issues. "More low-quality data yields more low-quality answers," Beyer said. CIOs should keep an eye on data quality. Take the temperamental geolocation data collected by cell phones, devices some people treat as stand-ins for the humans carrying them, he said. A cell phone, however, can be accidentally left at the office, or its GPS function can be switched off at any given time. "Cell phones are not people," Beyer said.
6. Real time is just faster. Operating in real time doesn't mean speeding up the data ingestion, cleansing and analysis processes currently in place, Beyer said. It's about "making sure the interval between data collection and decision is as short as possible," he said. Besides, most enterprise data isn't needed for real-time operations.
7. Data volume trumps domain knowledge. For those who think they can simply wash their hands of big data business processes, think again. That's because "a good data scientist must be stopped" from collecting data at some point, Beyer said. Without a business process in place, data scientists will keep going and going, well past the point where the data adds business value. Someone needs to help draw the line.
8. Data models are useless. The statement is a sweeping one. But, Beyer clarified, everything placed into a digital asset has a digital model. "We don't eliminate models because we have big data," he said.