I have spent my IT career in what I would consider to be turnaround roles. For some reason, IT is not performing as expected or required and so the organization hires a new IT leader (me) to get things aligned and in shape. In other cases, organizations bring me in to do a one- or two-day assessment of their IT processes and systems. In such roles, I have seen it all -- particularly when it comes to disaster recovery:
- In one company, when IT employees performed backups, they backed up onto the same production server. To me, this approach seemed to miss the whole point of doing backups.
- In another company, each time IT had a major outage and had to do a forced restart, the IT organization counted the restart as its disaster recovery test so that it could claim compliance with various standards. In effect, its production environment doubled as its disaster recovery site, and getting production running again was taken as proof that the plan had worked.
- A public institution had been stuck for years on implementing any type of disaster recovery plan. It wanted to replicate every element of its production environment to a remote location and simply did not have the money to reproduce and support it all.
What does this have to do with our topic of selecting a disaster recovery provider -- including cloud providers?
Let's start with the incumbent disaster recovery provider -- ourselves. It is nice to do things for ourselves unless it is something we should not be doing for ourselves. The company that backs up to its own production server just might lack the expertise required to have an effective disaster recovery plan and process. The company that treated a system restart as disaster recovery clearly knows little about redundancy. The institution that wanted to replicate everything did not understand service rationalization. In such situations, it might be best to use the services of someone who knows and does disaster recovery better than we do.
But how do we select such providers? In my quest to do business only with people who know and do disaster recovery better than I do, I have established the following provider guidelines for effective disaster recovery:
- They must stratify my services into different categories. Some of my services are so mission critical that they require redundancy. Many of my services are mission critical but require recovery rather than redundancy. Any provider I select needs to know and design for these differences. For the public institution, we identified the relatively few systems that could never be down. We made just these redundant, which significantly lowered the costs of their disaster recovery plans and made them acceptable and doable.
- They must have the ability to test recovery from a disaster. Even if I choose not to do such tests myself, I want evidence that a company has done it for others (and I get such evidence not by talking with the provider but by talking with the client that had the test performed).
- They must demonstrate no single points of failure. I want to avoid the provider equivalent of backing up onto the same server.
- They must be financially viable. This is a decision I want to make once in my life. If there is any hint that a provider might be short-lived or fly-by-night, I disqualify that provider.
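To make these guidelines concrete, here is a minimal Python sketch of how they might be written down. It is an illustration under my own assumptions, not a prescribed method: the `Service` and `Provider` classes, the `max_tolerable_downtime_hours` field, the zero-hour threshold and the sample service names are all hypothetical. The first part sorts services into those that need redundancy and those that only need recovery; the second lists which of the four guidelines a candidate provider fails.

```python
from dataclasses import asdict, dataclass
from enum import Enum


class Tier(Enum):
    REDUNDANT = "redundant"      # can never be down: needs a live duplicate
    RECOVERABLE = "recoverable"  # mission critical, but restoring it is acceptable


@dataclass(frozen=True)
class Service:
    name: str
    max_tolerable_downtime_hours: float  # hypothetical measure of criticality


@dataclass(frozen=True)
class Provider:
    name: str
    stratifies_services: bool
    recovery_test_confirmed_by_client: bool
    no_single_point_of_failure: bool
    financially_viable: bool


def stratify(services, redundancy_threshold_hours=0.0):
    """Group service names by tier, based on how much downtime each can tolerate."""
    tiers = {Tier.REDUNDANT: [], Tier.RECOVERABLE: []}
    for svc in services:
        if svc.max_tolerable_downtime_hours <= redundancy_threshold_hours:
            tiers[Tier.REDUNDANT].append(svc.name)
        else:
            tiers[Tier.RECOVERABLE].append(svc.name)
    return tiers


def failed_guidelines(provider):
    """Return the names of the guidelines this provider does not meet."""
    return [key for key, value in asdict(provider).items() if value is False]


# Hypothetical sample data, for illustration only.
services = [
    Service("payments", 0.0),
    Service("email", 8.0),
    Service("reporting", 48.0),
]
tiers = stratify(services)
print(tiers[Tier.REDUNDANT])
print(tiers[Tier.RECOVERABLE])

candidate = Provider(
    name="Example DR Co.",
    stratifies_services=True,
    recovery_test_confirmed_by_client=False,
    no_single_point_of_failure=True,
    financially_viable=True,
)
print(failed_guidelines(candidate))
```

In practice the threshold and the criticality measure would come out of conversations with the business, and any failed guideline would be grounds to disqualify the provider rather than a score to average away.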
In some cases, a cloud provider might be the best option to get at least some of these benefits. A cloud provider with a large, geographically distributed footprint might be less likely to have single points of failure because it should be able to replicate broadly. Assuming the cloud provider has a large customer base (without one, it might not be financially viable), it should know best practices such as service rationalization and testing.
Today we have so many good, serviceable options for effective disaster recovery that the only thing stopping us is ourselves.
Niel Nickolaisen is CTO at O.C. Tanner Co., a Salt Lake City-based human resources consulting company that designs and implements employee recognition programs. A frequent writer and speaker on transforming IT and IT leadership, Niel holds an M.S. degree in engineering from MIT, as well as an MBA degree and a B.S. degree in physics from Utah State University. You can contact Niel at firstname.lastname@example.org.