It took about two months into my job as the new CIO before people started bringing me the projects the former CIO had never gotten to. These projects ranged from applications that had rightly been denied to the "ankle biters" -- those irritating little things that probably should have been done just to get them out of IT's way. Then came the email I always dread. The author was the CFO. The message said that in order to reduce our business insurance costs and comply with regulatory guidelines, we needed to implement a disaster recovery (DR) plan, and since DR was clearly something IT owned, he would appreciate my getting it on my project list and getting it done. I dragged myself out of my office and walked down the hall to see the CFO.
"Niel, it is good to see you. What can I do for you?"
"Your email about disaster recovery got me thinking, and so I thought it would be good to share my thoughts."
"Excellent, I was hoping that would happen. By the way, we are really happy that you are here."
"I, too, am happy that I am here, but you might change your opinion after we talk about disaster recovery."
"Why is that?"
"In your email, you said disaster recovery belongs to IT. I am afraid that is not quite the case. Disaster recovery planning belongs to the entire organization. IT participates in the planning but should never own it."
"Never own it? How can that be? If there is a disaster, you need to make sure we can recover our systems."
"True. And we will. But, disasters affect so much more than just our systems. The things that can happen to our systems are just a subset of the things that can bring the organization to its knees. Our disaster recovery planning should consider all of our organizational risks and those go far beyond just our system risks."
The CFO quickly grasped what I was saying and agreed to expand our definition of disaster recovery planning. As a first step, we got volunteers -- some of them forced volunteers -- from every major department to meet to build our disaster recovery plan.
Score risks by impact and likelihood
Such planning can be a nightmare of long meetings, long speeches and disagreements with no progress. To stop this from happening, I take a risk assessment/mitigation approach. Here's how it works.
I start by having everyone on the team brainstorm -- and every wild idea is welcome -- all of the possible disasters that could bring us to our knees. This list varies by business and location, but typically includes: disgruntled employees, floods, fires, data breaches, loss of key employees, loss of key technologies and anything that can take out a branch office or site, such as a power outage. We then score each of these risks using a combination of impact and likelihood.
For risks with low impact and low likelihood, we probably should not bother defining mitigations at all. For risks with high impact and high likelihood, we had better have something really sound in our plan to deal with such incidents. Anything with medium impact/high likelihood or high impact/medium likelihood should also be covered by the disaster recovery plan.
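The scoring rules above amount to a simple risk matrix. Here is a minimal sketch of that step in Python; the 1-to-3 scale, the example risks and their scores, and the action labels are illustrative assumptions, not taken from any actual plan.

```python
# Sketch of an impact/likelihood risk matrix. Scores run 1 (low) to 3 (high);
# the thresholds mirror the rules of thumb described in the text.

def classify(impact: int, likelihood: int) -> str:
    """Map an (impact, likelihood) pair to a planning action."""
    if impact == 3 and likelihood == 3:
        return "invest in redundancy / strong mitigation"
    if (impact, likelihood) in {(2, 3), (3, 2)}:
        return "cover in the DR plan"
    if impact == 1 and likelihood == 1:
        return "accept (no mitigation defined)"
    return "plan a quick recovery"

# Hypothetical brainstormed risks, scored by the team as (impact, likelihood).
risks = {
    "data breach": (3, 2),
    "branch power outage": (2, 3),
    "flood at headquarters": (3, 1),
    "single printer failure": (1, 1),
}

for name, (impact, likelihood) in sorted(risks.items()):
    print(f"{name}: {classify(impact, likelihood)}")
```

The point of the matrix is not the exact thresholds but the discipline: every brainstormed risk gets a score, and the score, not the loudest voice in the meeting, decides how much mitigation it earns.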
Once we have scored the risks, we define mitigation plans scaled to match them. Disaster recovery can be expensive, and it is easy to over-invest in recovery options that we will never actually trigger. Because redundancy -- in systems, processes and capabilities -- is incredibly expensive, we should build redundancy or partial redundancy only for the high impact/high likelihood risks. For everything else, we plan how to quickly recover from a disaster, with "quickly" being highly situational. (As an aside, I once talked with the CIO of a public utility who said that in the aftermath of a major hurricane, the utility's disaster recovery plan was put on hold until it could ensure its employees were safe, sound and available to implement it -- something, by the way, the plan had not anticipated.)
Bring the popcorn
The next step is to implement the plans for all those risks that deserve mitigation. But, do this in steps based on the risk assessment. Take care of the high impact/high probability mitigation plans first.
The final step in this process is to test the plans, which brings with it an element of perverse fun. It is always entertaining to watch the world fall apart -- in a simulated and safe fashion: Pull the plug on a process and watch the panic ensue. (Make sure the person who is essential to the financial reporting process observes the test rather than runs it.) Have a remote site perform manual transactions for a few days to see what else breaks.
Once the test ends, assess the results and re-do your risk mitigation plans. No matter how good the team was that defined the plan, they almost certainly got it wrong. And through it all, remember: IT participates in, but does not lead, this effort -- and certainly does not own it.
About the author:
Niel Nickolaisen is CTO at O.C. Tanner Co., a Salt Lake City-based human resources consulting company that designs and implements employee recognition programs. A frequent writer and speaker on transforming IT and IT leadership, Niel holds an M.S. degree in engineering from MIT, as well as an MBA degree and a B.S. degree in physics from Utah State University. You can contact Niel at firstname.lastname@example.org.