Data science is one hot field these days, but what exactly is it? The following definition sheds light on this critical new field: "Data science is the study of where information comes from, what it represents and how it can be turned into a valuable resource in the creation of business and IT strategies."
Mining large amounts of structured and unstructured data to identify patterns can help an organization rein in costs, increase efficiencies, recognize new market opportunities and strengthen its competitive advantage. Some companies are hiring data scientists to help them turn raw data into information. To be effective, such individuals must possess emotional intelligence in addition to education and experience in data analytics.
Put differently, data scientists are not the same wine in a different bottle. Typically, they are amalgams of data modelers, statisticians, technologists and business analysts. They are able to harness the power of big data -- the vast amounts of unstructured information contained in blog posts, tweets, call detail records, podcasts, videos and the like.
Why am I hearing about data science now?
For many reasons. First, we have entered the era of big data, and big data and data science are cousins. It's safe to say that we wouldn't be hearing much about the latter were it not for the former.
Second, highly visible companies like Amazon, Apple, Facebook, Google, Netflix and Twitter have used big data very effectively and seen tremendous results in the process. For instance, Google Flu Trends showed that search queries could flag flu outbreaks sooner than the CDC's own reports.
Third, people like Michael Lewis (author of Moneyball) and statistician Nate Silver have made data cool. The 2012 U.S. presidential election saw the unprecedented use of big data and data scientists. Some have even said that data is the new oil.
Do data scientists use legacy enterprise tools and technologies?
In short, no. Big data consists mostly of unstructured data. Relational databases and data warehouses cannot store -- much less effectively analyze -- petabytes of unstructured data such as videos, blog posts and tweets. Relational databases and SQL statements just can't handle big data, period. They weren't built with the same level of fault tolerance and parallel processing as distributed file systems.
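To make the parallel-processing point concrete, here is a minimal sketch in Python of the map-and-reduce pattern that Hadoop popularized, run on a few made-up posts. Treat it as an illustration under stated assumptions, not as how any particular product works: the sample posts and the names `map_post`, `reduce_counts` and `count_words` are hypothetical, and a real cluster would spread the map step across many machines rather than a single loop.

```python
from collections import Counter
from functools import reduce

# Hypothetical sample of unstructured text (stand-ins for tweets or blog posts).
POSTS = [
    "Big data is the new oil",
    "Data science turns raw data into information",
    "Moneyball made data cool",
]


def map_post(post):
    """Map step: count the words in one post, independently of all others."""
    return Counter(post.lower().split())


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_words(posts):
    """Run the map step on every post, then fold the partial counts together."""
    # Each post is mapped on its own, so this step could be run in parallel.
    partial_counts = [map_post(post) for post in posts]
    return reduce(reduce_counts, partial_counts, Counter())


if __name__ == "__main__":
    totals = count_words(POSTS)
    print(totals.most_common(3))
```

Because no post depends on any other during the map step, the work can be split across as many machines as there are chunks of data, and losing one machine means redoing only its chunk.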
For data scientists, the tools of the trade include Hadoop, NewSQL and NoSQL databases, and columnar databases. These tools store, retrieve and analyze petabytes of the semi-structured and unstructured data that makes up the vast majority of big data.
What's more, many data scientists use a free software environment called R (part of the GNU project) for statistical computing and graphics. Collectively, new solutions like these allow data scientists to work their magic.
From a midsize organization perspective, big data tools were far too expensive five years ago. Today, data storage has become a commodity and midsize organizations are now using Hadoop, NoSQL databases and data contest sites like Kaggle Inc. to harness the power of big data.
What can data scientists do for my organization?
There's no dearth of ways in which data scientists can benefit organizations of all sizes. In other words, big data and data science are not the sole purviews of large enterprises. Healthcare organizations like Explorys [featured in Simon's fifth book, Too Big to Ignore: The Business Case for Big Data] are marrying structured and unstructured data, decreasing healthcare costs, providing superior care and saving lives. The opportunity to discover new and important customer insights is particularly compelling.
In a nutshell, data scientists can find needles in haystacks of big data.
About the author:
Phil Simon is a sought-after speaker and the author of five management books, most recently Too Big To Ignore: The Business Case for Big Data. A recognized technology expert, he advises companies on how to optimize their use of technology. His contributions have been featured on NBC, CNBC, Inc. magazine, BusinessWeek, The Huffington Post, The Globe and Mail, Fast Company, The New York Times, ReadWriteWeb, and many other sites. Write to him at firstname.lastname@example.org.