Big data exploration and analytics for CIOs: Oh, the places you'll go
Big data analytics is going to the school room. While retailers use analytics to entice buyers to purchase more products, and automobile insurers want to learn more about an individual's driving behavior, educators are also using data science to get closer to their customers.
Alfred Essa, vice president of analytics and R&D at McGraw-Hill Education in New York City and a panelist at the recent Useful Business Analytics Summit in Boston, outlined how he and his team are applying analytics to some of education's most persistent challenges. Here's a behind-the-scenes look at how Essa trains data scientists and how his data science team is working to build sustainable and scalable data products.
1. What is the language of data science?
Python and R, two open source programming languages, contend for the top spot on the syllabus, and Essa is a fan of both. Python, which Essa called "the programming language for scientific computing," was first released in 1991. One of its most talked-about features is its extensive standard library, which covers many routine tasks. He likes R for statistics. Even if you're not a data scientist yourself but manage a team of them, Essa suggested learning Python. "It's a very gentle slope -- a great language for beginners," he said.
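To give a sense of what "routine tasks" can mean in practice, here is a minimal sketch that uses only modules shipped with Python. The event log, its field names and the function name are invented for illustration; nothing here comes from Essa's team.

```python
import json
from collections import Counter
from datetime import datetime

# Hypothetical event log, invented for illustration only.
RAW_EVENTS = """[
  {"user": "u1", "at": "2014-06-02T09:15:00"},
  {"user": "u2", "at": "2014-06-02T17:40:00"},
  {"user": "u1", "at": "2014-06-03T08:05:00"}
]"""

def events_per_day(raw_json):
    """Parse JSON text and count events per calendar day (standard library only)."""
    events = json.loads(raw_json)
    days = (
        datetime.strptime(event["at"], "%Y-%m-%dT%H:%M:%S").date().isoformat()
        for event in events
    )
    return Counter(days)

counts = events_per_day(RAW_EVENTS)
print(counts)
```

Parsing, date handling and counting all come from the standard library here, which is part of why the language is often described as approachable for beginners.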
2. How should data scientists be trained?
Practice, practice, practice, Essa said, citing Ezra Pound's classic guide to writing poetry, ABC of Reading. "His idea was read lots of poetry to prepare," Essa said. "So we do that with data scientists." He gives data scientists all kinds of data sets (on education and beyond) and instructs them to "do some descriptive analytics and just tell me what questions you can answer." In some cases, data scientists work individually; in others, they operate as part of a team. "A very important part of doing data science is interactive data exploration," he said. Essa pointed to two tools that aid in that kind of exploration: IPython, a programming language-agnostic interactive shell, and IPython Notebook, a Web-based environment where users can combine code, text, mathematics and so on.
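As a rough illustration of the "descriptive analytics" exercise, the sketch below groups a small data set and reports basic summary statistics for each group. The records and the function name are hypothetical, and this is only one of many ways a data scientist might start exploring; in an IPython Notebook the same steps would typically be run cell by cell.

```python
import statistics
from collections import defaultdict

# Hypothetical (course, score) records, made up for illustration only.
RECORDS = [
    ("algebra", 72), ("algebra", 88), ("algebra", 80),
    ("biology", 95), ("biology", 61),
]

def describe_by_group(records):
    """Return count, min, max, mean and median of the values in each group."""
    groups = defaultdict(list)
    for name, value in records:
        groups[name].append(value)
    return {
        name: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
        }
        for name, values in groups.items()
    }

summary = describe_by_group(RECORDS)
for course, stats in summary.items():
    print(course, stats)
```

A summary like this tends to suggest the next questions to ask, for example why one course shows a much wider spread of scores than another.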
3. How should companies develop data products?
"I'm a big believer in prototypes," Essa said. He pointed to the work of Michael Schrage, a research fellow at MIT's Center for Digital Business, whose book Serious Play argues that prototyping is a mark of an "innovation culture." As CIOs dip their toes into data product development, Essa suggested they examine whether their organizations have a prototype culture.
4. What is the ideal composition of a team focused on turning out data products?
For McGraw-Hill, it's an interdisciplinary team composed of data scientists, engineers and data visualization experts. Data visualization experts are especially important for analytics success because they distill big data -- text, relational, social data -- into visual, easy-to-consume, compelling representations, Essa said. "Users don't need big data; they need insights." Data visualization experts are visual, creative people, and they can be tough to find, Essa said, "but they're worth their weight in gold."
5. How should companies build out their big data infrastructure?
Don't go the traditional IT route. "I've worked in the IT coal mines. I've excavated and built data warehouses," Essa said. "This is a long, long slog, and the complexity behind analytics is even greater." He suggested starting simple by first crafting a high-level architecture diagram. He and his team, for example, mapped out internal and external source systems that needed to be brought into an analytics store, he said. They decided to use application programming interfaces (APIs) to bring that data in. "So start with a simple diagram, iterate quickly and then learn from those iterations," he said. "It can be done."
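The article does not describe McGraw-Hill's actual architecture, so the following is only a toy sketch of the general idea: several source systems, each reached through an API, feeding a single analytics store. The source names, the record fields and the `facts` table are all assumptions, the fetch functions are stand-ins for real API calls, and an in-memory SQLite database plays the role of the store.

```python
import sqlite3

# Stand-ins for API clients. A real version would call each source system
# over the network; these hypothetical functions return canned records.
def fetch_gradebook():
    return [{"student": "a01", "metric": "quiz_score", "value": 72.0}]

def fetch_reading_platform():
    return [{"student": "a01", "metric": "pages_read", "value": 14.0}]

# Hypothetical registry mapping a source name to its fetch function.
SOURCES = {
    "gradebook": fetch_gradebook,
    "reading_platform": fetch_reading_platform,
}

def load_into_store(conn, sources):
    """Pull records from every source and append them to one shared table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts "
        "(source TEXT, student TEXT, metric TEXT, value REAL)"
    )
    loaded = 0
    for name, fetch in sources.items():
        rows = [(name, r["student"], r["metric"], r["value"]) for r in fetch()]
        conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", rows)
        loaded += len(rows)
    conn.commit()
    return loaded

conn = sqlite3.connect(":memory:")
total = load_into_store(conn, SOURCES)
row_count = conn.execute("SELECT COUNT(*) FROM facts").fetchone()[0]
print(total, row_count)
```

Adding another source in this sketch means adding one entry to the registry, which fits the advice to start simple and iterate.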
6. What methodology should data teams use?
For both building data products and "standing up foundations," Essa uses an agile project management approach. His team works in two-week cycles, and at the end of each iteration, he expects to see "working code and something that can be demo'ed," he said. "Keep it simple, but keep it moving."
7. How can companies make their teams smarter?
Essa suggested taking a look at the research of Thomas W. Malone, a professor of management at the MIT Sloan School of Management and the founding director of the MIT Center for Collective Intelligence. Malone has been researching collective IQ for years. One of his findings? Hire more women. "Just by adding more women to your team, the group IQ will go up," Essa said.