How to do an advanced data analytics project on the cheap

Delivering an advanced data analytics project to the business can cost a fortune. But with some imagination, CIOs can do it for pennies on the dollar. Niel Nickolaisen explains how.

IT leaders and departments have a huge opportunity to change the perception and reality of our value to our organizations.

We have in front of us -- right now -- the resources we need to deliver an incredibly high-value service to our organizations and do it for dirt cheap. That is right, dirt cheap.

Before I tell you about this opportunity, let me confess that what I am about to tell you will seem counterintuitive and, for many of you, countercultural. We have been told for years that if we want to deliver high-value services to our organizations, it will cost tons of money. Yes, delivering high-value services could cost a fortune, but not if you do what I have done. Here goes:

One of the best ways for us to move the needle in our organizations' lives is to do a dirt-cheap advanced data analytics project.

I know this is possible because I have done it (and on the cheap). Here is my story.

I was the CIO at a large university. While I was there, the university's leadership team had an overriding goal -- to increase graduation rates. Whenever we met, we talked about what we could do to increase them. During one of these meetings, I made a somewhat offhand, cynical remark: "If we only admitted students that we knew would graduate, our graduation rate would climb to 100%." Everyone chuckled and then returned to talking about changing our curriculum, our student support model and our faculty model.

After that meeting, I thought more about the notion of admitting only those students we knew would graduate. We had an admissions model -- a model built by academic experts. It included multiple factors that helped us decide which students to admit. The three most important factors were the prospective students' scores on mandatory English, writing and math competency exams. Thus, every prospective student had to take these exams (and score well on them) in order to get in.

As I thought about this, I wondered if our data supported our admissions model. Certainly, we had quite a bit of data about our prospective students, the students we had accepted and those who had subsequently graduated. It would be great to analyze this data to see how it correlated with our admissions model. But how to start? I had no real data scientists on my staff, and I had neither the massive, expensive environment nor the tools for an advanced data analytics project.

But it turned out I did not need those things, at least not to get started. I posted my problem and my data sets (anonymized, so as not to reveal the students' personal information) on a data challenge website and let some really bright data scientists, analysts and statisticians from around the world create my new admissions model for me.
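For readers wondering what preparing a data set for a public challenge might involve, here is a minimal sketch. It assumes records held as Python dictionaries and hypothetical field names (`student_id`, `name`, `email`); the article does not say how the university's data was actually anonymized. The sketch drops direct identifiers and replaces the ID with a salted hash. This is pseudonymization only: combinations of the remaining fields can still identify people, so real data would need further review before release.

```python
import hashlib
import secrets

# Hypothetical direct identifiers to drop before sharing (not from the article).
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(records, id_field="student_id", salt=None):
    """Return copies of the records with direct identifiers removed and the
    ID replaced by a truncated, salted SHA-256 digest.

    The salt must be kept private; without it the digests cannot easily be
    recomputed from known IDs.
    """
    salt = salt or secrets.token_hex(16)
    cleaned = []
    for record in records:
        # Copy every field except the direct identifiers.
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        digest = hashlib.sha256((salt + str(record[id_field])).encode("utf-8"))
        row[id_field] = digest.hexdigest()[:16]
        cleaned.append(row)
    return cleaned


# Made-up example record, for illustration only.
raw = [{"student_id": 1001, "name": "Pat Doe", "email": "pat@example.edu",
        "math_exam": 82, "graduated": 1}]
shared = pseudonymize(raw, salt="keep-this-secret")
```

Using the same salt for every file in a release keeps the hashed IDs consistent, so contestants can still join tables without ever seeing the real IDs.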

It took the world about two weeks to develop a model that was better than I hoped it could be (given my data set), and so I declared a winner and issued the prize to the winning team. How much did this advanced data analytics work cost? $3,500. (Not $35,000, not $350,000, not $3.5 million.) If the data model did not work, I would have been out $3,500 -- an amount I could easily cover in my existing budget.

What value did the new admissions model create for the university? The data revealed major gaps in the human-defined admissions model. What we thought was the most important factor was, in truth, the sixth most important. The factor we had ranked second was really the ninth most important. And so on. As we looked at the data, we realized that we did not even need many of our prospective students to take the English, writing and math competency exams. If their data for the most important factors was good enough, they were in.
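The kind of re-ranking described above can be illustrated with a small sketch. It is not the winning team's method, which the article does not describe. It assumes made-up records, hypothetical factor names, and a deliberately simple importance measure: the absolute Pearson correlation between each factor and a 0/1 graduation outcome.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def rank_factors(rows, factors, outcome="graduated"):
    """Order factors from strongest to weakest association with the outcome."""
    ys = [row[outcome] for row in rows]
    scores = {f: abs(pearson([row[f] for row in rows], ys)) for f in factors}
    return sorted(factors, key=lambda f: scores[f], reverse=True)


# Made-up records and hypothetical factor names, for illustration only.
students = [
    {"math_exam": 70, "prior_stem_courses": 0, "graduated": 0},
    {"math_exam": 85, "prior_stem_courses": 1, "graduated": 0},
    {"math_exam": 60, "prior_stem_courses": 3, "graduated": 1},
    {"math_exam": 90, "prior_stem_courses": 4, "graduated": 1},
]
expert_order = ["math_exam", "prior_stem_courses"]  # the human-defined ranking
data_order = rank_factors(students, expert_order)   # the ranking the data supports
```

In this toy data set the expert ranking and the data-driven ranking disagree, which is the situation the university found itself in. A real project would use a proper predictive model and held-out data rather than simple correlations.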

At a cost of only $3,500, we changed the way the university operated and set the stage for a future of data-based decisions. We refined and focused our marketing and recruiting. (We now knew the important characteristics of who would be successful students, so why target anyone else?) Knowing the background factors that led to student success, we started to support students in those areas. For example, no previous coursework in science, technology, engineering or math (STEM) resulted in no admissions. That meant that matriculated students who were weak in STEM needed increased attention and support in order to increase their chances for success.

After the success of this data analytics project, we made the investments that led to a comprehensive, advanced analytics student support model. It helped us identify at-risk students so that we could focus the massive resources of the university on those students and help them succeed.

Oh yes, and the IT team was the group of geniuses that had delivered these amazing results.

Data analytics project: Getting started

If this interests you, here is how you can start. First, think of a few gnarly problems that have vexed your organization for a long time. These problems could be customer retention, manufacturing yields, demand predictions, marketing targets, et cetera. Next, look at the data available to solve the problem.

  • Do you have the data you might need?
  • In what form is the data?
  • What data are you missing and how can you fill any gaps?
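The three questions above can be turned into a quick first-pass check. The sketch below assumes the data has already been loaded as a list of dictionaries and uses hypothetical field names for a customer retention problem; `audit_data` is an illustrative helper, not part of any library.

```python
def audit_data(rows, required_fields):
    """Report which required fields never appear and, for the fields that
    do appear, what share of rows have no value for them."""
    present = set().union(*(row.keys() for row in rows))
    report = {
        "absent_fields": sorted(set(required_fields) - present),
        "missing_share": {},
    }
    for field in required_fields:
        if field not in present:
            continue
        # Treat None and empty strings as missing; zero is a real value.
        missing = sum(1 for row in rows if row.get(field) in (None, ""))
        report["missing_share"][field] = missing / len(rows)
    return report


# Made-up rows, for illustration only.
customers = [
    {"customer_id": 1, "tenure_months": None, "churned": 0},
    {"customer_id": 2, "tenure_months": 12, "churned": 1},
]
report = audit_data(customers, ["customer_id", "tenure_months", "region"])
```

Here `report["absent_fields"]` lists the fields you would need to source elsewhere, and `report["missing_share"]` shows how complete each available field is.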

Then, find a willing group to develop your model. There are data contest sites. There are local universities looking for student projects. There are local big data/data scientist user groups that can get you started. And, if you do not have the data to solve your gnarly problem, you are out a small amount of money and you can try again.

One more thing: As we built out our advanced analytics competencies at the university, we never did make a major investment in any type of big data infrastructure or environment. It required massive amounts of compute power to run our "students-at-risk" model, but since we only ran that model every two weeks, we rented that compute power in the cloud for some hours twice a month. We paid about $3,000 a month to run a model that consumed around 2,000 data elements for over 50,000 students. A bargain, even for something that generated low value. In fact, in our case it was an honest-to-goodness steal, since this model improved student retention by ten percentage points and generated tens of millions of dollars in savings. You can do the same.
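To put those cloud figures in perspective, the arithmetic can be worked through in a few lines. The inputs are the approximate numbers quoted above ("about $3,000", "around 2,000", "over 50,000"), so the results are rough, and the per-student cost is an upper bound.

```python
# Approximate figures quoted in the article.
MONTHLY_CLOUD_COST_USD = 3_000   # "about $3,000 a month"
RUNS_PER_MONTH = 2               # the model ran twice a month
STUDENTS = 50_000                # "over 50,000", so treat as a lower bound
DATA_ELEMENTS = 2_000            # "around 2,000" per student

cost_per_run = MONTHLY_CLOUD_COST_USD / RUNS_PER_MONTH
annual_cost = MONTHLY_CLOUD_COST_USD * 12
cost_per_student_per_month = MONTHLY_CLOUD_COST_USD / STUDENTS
values_per_run = STUDENTS * DATA_ELEMENTS

print(f"Cost per run: ${cost_per_run:,.0f}")
print(f"Annual cost: ${annual_cost:,}")
print(f"Cost per student per month: ${cost_per_student_per_month:.2f}")
print(f"Data values processed per run: {values_per_run:,}")
```

That works out to roughly $1,500 per run, $36,000 a year, and at most about six cents per student per month for a model that reads on the order of 100 million data values each run.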

About the author:
Niel Nickolaisen is CTO at O.C. Tanner Co., a human resources consulting company based in Salt Lake City that designs and implements employee recognition programs. A frequent writer and speaker on transforming IT and IT leadership, Nickolaisen holds an M.S. in engineering from MIT, as well as an MBA degree and a B.S. in physics from Utah State University. You can contact Nickolaisen at [email protected].
