10 essential steps to include in a disaster recovery plan
By SearchCIO.com Staff
No one is ever fully prepared for a disaster, regardless of what type it is. Hurricane Katrina was one test of whether IT disaster recovery plans were sufficient. This expert podcast offers 10 essential steps CIOs need to include in a DR plan that will hold up through any disaster.
SPEAKER: Pierre Dorion, certified business continuity professional, Mainland Information Systems Inc.
BIOGRAPHY: Dorion is a business continuity consultant at Phoenix-based Mainland Information Systems Inc., specializing in business continuity planning, backup and recovery and data availability as well as IT processes development. During the past six years, he has focused primarily on storage systems architecture and business continuity services engagements. Dorion is also an IBM Tivoli Storage Manager Certified Consultant, an authorized IBM TSM Instructor and holds the ITIL Foundation Certificate.
Read the full transcript from this podcast below:
Karen Guglielmo: Hello, my name is Karen Guglielmo, Senior Editor for SearchCIO.com, and I'd like to welcome you to today's expert podcast, "You Have DR Plans, But Will They Work?"
I'm joined today by Pierre Dorion, Certified Business Continuity Professional and Senior Consultant with Mainland Information Systems. He specializes in the areas of business continuity and disaster recovery (DR) planning services, backup and recovery, and corporate data protection.
Over the past nine years Pierre has focused primarily on business continuity and DR related consulting services, as well as data storage and backup infrastructure architecture and assessment engagements. He has been a consultant to leading organizations in the oil and gas, manufacturing and distribution, government and public sector, utilities, telecom, and transport industries.
His engagements include business impact analysis, IT risk assessment, recoverability assessments, recovery strategy development, and backup environment architecture.
Today Pierre is joining us to talk about DR plans. Pierre, the floor is yours.
Pierre Dorion: Hello, my name is Pierre Dorion, Senior Business Continuity Consultant with Mainland Information Systems in Phoenix, AZ. Thank you for downloading this podcast. In this recording I will discuss some of the essential elements that should be part of a disaster recovery plan.
Over the last six years or so we've been exposed to more catastrophic events around the world than maybe an entire generation would have imagined possible, from terrorism to tsunamis and massive hurricanes. We have also been exposed to the threat of mass casualties that could be caused by a pandemic flu, or even worse, chemical weapons.
So it's not surprising that business continuity planning has been getting increased exposure as a result. What seemed unthinkable a decade ago has now become a credible threat to many organizations. In fact, in many cases, the ability to resume business activity following an unplanned disruption is slowly becoming a contractual obligation for many companies wanting to do business together. So in light of that, most businesses today have some form of disaster recovery strategy in place.
That said, not all disaster recovery plans, when there is one, are created equal. For the purpose of our discussion here, we're going to focus on IT disaster recovery much more than business continuity itself, but with the understanding that one is pretty much useless without the other. For example, having computers fully recovered and ready to support business functions without a workforce in place is really no better than having a relocated workforce without the business tools to work with.
So where does it start? Well, the first step in DR planning is to understand what is truly critical to keep your business going, and how long your business can function without it. In other words, what is your maximum tolerable outage? That's usually driven by how much money you're losing while you're down, but it's also sometimes based on how it affects public perception, or the image of the brand. So criticality is not always defined in terms of financial losses; this is what I'm trying to say here.
Further to that, you probably can't recover everything all at once, nor would you want to. So you have to establish recovery priorities and dependencies. By dependencies, what I mean is that certain elements of your IT infrastructure must be in place for other ones to work. Using a very simple example here, we can say that, in most cases, the backup server and the network usually have to be up before you can restore data from tape. That's just an example here.
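As a rough illustration of the dependency idea, recovery order can be derived from a simple map of what each component needs before it can come back. This is only a sketch: the component names (including "finance_app") and the `recovery_order` helper are hypothetical, and the ordering uses Python's standard `graphlib` module (Python 3.9 or later) rather than any tool named in the podcast.

```python
from graphlib import TopologicalSorter

# Hypothetical components. Each key maps to the set of components
# that must already be running before the key can be recovered.
dependencies = {
    "network": set(),
    "backup_server": {"network"},
    "tape_restore": {"backup_server", "network"},
    "finance_app": {"tape_restore"},
}


def recovery_order(deps):
    """Return a sequence in which every component follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = recovery_order(dependencies)
print(order)
```

If two components depend on each other, `TopologicalSorter` raises `graphlib.CycleError`, which is a useful signal that the documented dependencies need another look.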
So now let's talk about the strategy, the DR strategy. Based on how quickly your company needs IT up and running, you can now decide what technology you will use to meet your recovery objectives as we defined previously. Does it make financial sense for me to own an alternative recovery site? If I need to build a redundant data center, can my company carry that financial burden, or should we share the costs with other companies by going to co-location facilities? These are all questions we need to answer, and that's where the strategy comes in.
Remember that the cost of a recovery strategy should never exceed the losses we are trying to prevent. And also know that in some cases doing nothing can also be a strategy, as funny as it may sound. A local business whose customers are all relocated following a disaster may have the best business plan in place, but without customers, their recovery effort becomes somewhat useless.
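The rule that a strategy should not cost more than the losses it prevents can be sketched as a simple comparison. Everything here is an assumption made for illustration: the function names, the dollar figures, and in particular the choice to weight losses by an estimated number of incidents per year, which the podcast does not prescribe.

```python
def loss_prevented_per_year(loss_per_hour, outage_hours_avoided, incidents_per_year):
    """Estimated yearly loss a strategy prevents (all inputs are estimates)."""
    return loss_per_hour * outage_hours_avoided * incidents_per_year


def strategy_is_justified(annual_cost, loss_per_hour, outage_hours_avoided,
                          incidents_per_year):
    """True when the strategy costs no more than the loss it is expected to prevent."""
    prevented = loss_prevented_per_year(
        loss_per_hour, outage_hours_avoided, incidents_per_year
    )
    return annual_cost <= prevented


# Hypothetical figures: $10,000 lost per hour of downtime, 48 hours of
# outage avoided per incident, one incident expected every four years.
owned_site = strategy_is_justified(250_000, 10_000, 48, 0.25)
shared_colocation = strategy_is_justified(60_000, 10_000, 48, 0.25)
print(owned_site, shared_colocation)
```

With these made-up numbers the shared facility passes the check and the owned site does not. Financial loss is only one input, since criticality can also rest on public perception or brand image.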
So now let's talk about the plan. A disaster recovery strategy in itself is not a plan. A disaster recovery strategy usually involves technology, vendors, and distributors, and is built on a number of assumptions. All of these elements make their way into the disaster recovery plan, but are not the plan.
Think of it as having all the ingredients to cook a great dinner, but no recipe; results may vary. One common mistake is to count on your very skilled IT staff to take care of things. After all, they always manage to fix things pretty fast when something goes wrong. Ask yourself one thing though: what happens if they are not there, unavailable? Or how about if they don't have access to their documentation, or even worse, what if they haven't seriously documented anything?
It's one thing to understand an IT environment very well; it's another to put it back together quickly, and under pressure. In fact, everything can eventually be recovered, but can you afford to wait a few days, or even a few weeks? Not always.
Moving along now, who says it's a disaster? Who's in charge? Who declares a disaster? Who does what during the recovery effort? How about if the most skilled IT resources in your company are not available, who are their backups? These are all questions you have to answer, and include in your disaster recovery plan. This needs to be documented. Your plan must include a defined chain of command, employee notification procedure, clearly documented disaster declaration procedures, etcetera.
Remember there are often costs associated with declaring a disaster, so unless you have that clearly documented, this may present some issues. This is why you have to form a crisis management team, or CMT as we call it, before starting the development of a disaster recovery plan.
The CMT will actually own the disaster recovery plan and all the related processes. That's why the CMT leader, whoever you pick, should be a person of authority and experience in your company. The crisis management team leader must be able and empowered to make potentially critical business decisions in times of crisis. Know that the crisis management team is often called, or referred to as, the business continuity steering committee during peacetime, as we say, and becomes the CMT during a crisis. It could be the same people.
Let's talk about testing now. We can't say enough about testing a disaster recovery plan. So now that you've built your disaster recovery plan, you need to test it. Let's start with stating that testing the ability to restore the financial database from backup tapes is not a disaster recovery test, although a lot of people seem to think it is. This is only proving that the backup software you bought actually works, and that tapes sometimes go bad.
A useful disaster recovery exercise should actually test the entire plan and all your recovery procedures. It goes back to the distinction we made earlier between a disaster recovery strategy and a DR plan. Restoring from tape is a strategy. When to make the decision to restore, who will do it, how long it should take, and in what order things should be restored is what needs to be tested; that is the plan.
Testing the entire plan is the only way to make sure that the plan is actually functional. This is where you will identify what works, and more importantly what doesn't. And it gives you a chance to tweak your procedures before you really need them.
Now moving on to the next element of disaster recovery planning. Let's talk about change management. Once a disaster recovery plan has been implemented, it should always be closely tied to the change management process. Hopefully you have one. In fact, some industry professionals will tell you that companies with a DR plan, but without a change management process don't really have a disaster recovery plan. This is mostly because statistics show that many IT disasters are caused by human error.
I think we can all agree that making changes to production systems without a strong change management process is probably the single biggest source of human errors leading to unplanned outages, so it makes sense.
Change requests submitted through your change management process should always include information about the impact of the proposed change on the DR procedures and on the plan itself. Best practice is also to have a member of the CMT, the crisis management team, or of the BCP governance committee sit on the change management board.
This is to make sure that the DR planning is never left out of changes. It also provides us with an opportunity to start planning for disaster recovery when new systems or applications are rolled out, so we include disaster recovery planning in the systems development life cycle, or SDLC as we call it.
Now let's touch briefly on your suppliers. Are your suppliers ready? We notice that many disaster recovery plans rely on supplier or vendor SLAs for rapid delivery of replacement equipment in the event of a disaster. One question: did you read the fine print, really? Some SLAs actually exclude events other than equipment failure. So companies should thoroughly review the agreements in place with their vendors before relying on them as part of their recovery strategies.
It's also common practice to rate your critical vendors, and do a risk assessment of their ability to deliver when you most need them. You can sort of calculate their risk of failure to deliver based on your past experience with the vendor. If a vendor has failed you in the past, there are no guarantees it won't fail you again when disaster strikes. You should have contingencies built around that to make sure you're covered.
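One way to picture this kind of vendor rating is a small scorecard built from past delivery history and from whether the SLA covers disaster events. This is a sketch under assumptions: the `Vendor` fields, the 10 percent threshold, and the rule that a vendor with no history counts as highest risk are all hypothetical choices, not something stated in the podcast.

```python
from dataclasses import dataclass


@dataclass
class Vendor:
    name: str
    deliveries_on_time: int
    deliveries_late: int
    sla_covers_disasters: bool  # False if the fine print excludes disasters


def failure_rate(vendor):
    """Share of past deliveries that were late."""
    total = vendor.deliveries_on_time + vendor.deliveries_late
    if total == 0:
        return 1.0  # assumption: no track record is treated as highest risk
    return vendor.deliveries_late / total


def needs_contingency(vendor, max_failure_rate=0.10):
    """Flag vendors whose SLA excludes disasters or whose history is poor."""
    return (not vendor.sla_covers_disasters) or failure_rate(vendor) > max_failure_rate


vendors = [
    Vendor("Vendor A", 19, 1, True),
    Vendor("Vendor B", 8, 2, True),
    Vendor("Vendor C", 20, 0, False),
]
flagged = [v.name for v in vendors if needs_contingency(v)]
print(flagged)
```

Vendors that end up flagged are the ones to build contingencies around, for example a second supplier or spare equipment held on site.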
The last element here: let's talk about the crisis communication plan. This is a little bit more of a business continuity element, but it does tie into disaster recovery planning as well. This component is very often neglected when developing a disaster recovery plan. Companies generally do a poor job of communication in general, outside of marketing new products, that is. And DR planning is no exception.
A good disaster recovery plan should include a documented internal communication plan to communicate with your employees during a disaster. The success of the recovery may actually depend on it. So an example of that is a 1-800 number for updates, and special instructions, because you don't necessarily have access to your normal modes of communication. Also an external communication plan should be in place to communicate with your clients, shareholders, suppliers, etcetera. Again 1-800 numbers can work very well here.
There also should be a strict policy on employee communication with the media, depending on what field of business you're in. In fact, there should always be a designated PR officer to give the media representatives what they need before they start trying to get it from your employees.
We can spend a lot more time talking about disaster recovery planning, best practices, all the elements, but, generally speaking, remember that a recovery strategy is not a recovery plan. Your company may employ very knowledgeable IT staff, have great SLAs in place with vendors, and have a hot site subscription, but without well documented, prioritized, and rehearsed recovery procedures, you really don't have a plan.
Well this concludes this podcast. I hope you've enjoyed it, and found the information useful. This is Pierre Dorion, Senior Business Continuity Consultant with Mainland Information Systems in Phoenix, AZ. Thank you for listening, and stay tuned for more podcasts.
Karen Guglielmo: And on that note, that concludes today's podcast. Thanks again to Pierre Dorion for speaking with us today and thank you all for listening. Have a great day.
01 Oct 2007