Disaster recovery budgeting and recovery time objectives

If your website is down, what's the cost to your business? Learn more about disaster recovery budgeting and recovery time objectives in this expert podcast.

What's the recovery time objective (RTO) for your disaster recovery (DR) plan? How fast will the CEO come pounding on your door wanting to know when the phones, website and other basics will be up and how the downtime will affect the business? In this expert podcast, special projects editor Karen Guglielmo interviews Forrester Research Inc. analyst Stephanie Balaouras about DR budgeting and RTO.

BIOGRAPHY: Balaouras is a principal analyst at Cambridge, Mass.-based Forrester. She primarily contributes to Forrester's offerings for security and risk professionals. She is a leading expert in how companies build resilient IT infrastructures to support key business initiatives. Balaouras has been instrumental in the development of Forrester's research and offerings in business continuity, disaster recovery and information storage and protection. Her research focuses on the business strategies, operational processes and organizational structures required to govern business continuity programs, as well as the technologies for high-availability, next-generation backup and disaster recovery.


Read the full transcript from this podcast below:

Karen Guglielmo: Hello, my name is Karen Guglielmo, a special projects editor for SearchCIO.com, and I'd like to welcome you to today's expert podcast on disaster recovery budgeting and RTO. I'd like to first welcome today's speaker, Stephanie Balaouras, principal analyst with Forrester Research. Stephanie primarily contributes to Forrester's offerings for security and risk professionals. She's a leading expert in how companies build resilient IT infrastructures to support key business initiatives. Stephanie has been instrumental in the development of Forrester's research and offerings in business continuity, disaster recovery, and information storage and protection. Her research focuses on the business strategies, operational processes, and organizational structures required to govern business continuity programs, as well as the technologies for high availability, next-generation backup and disaster recovery. Welcome, Stephanie.

Stephanie Balaouras:  Hi, Karen. Thanks for having me today.

Karen Guglielmo: Okay, and as I mentioned earlier, we're here today to talk about disaster recovery budgeting and recovery time objectives. I'll spend the next ten-plus minutes asking Stephanie a number of questions about today's topics. So, let's get started. The first question I have for you today is: Can you define recovery time objective and explain how important it is to DR planning?

Stephanie Balaouras: Sure. Recovery time objective, or RTO for short, measures your sensitivity to downtime. It could be for a given business application or IT system, like email, ERP, CRM, content management, your website, maybe even a file server, or it could be for an entire business process, such as financial accounting and reporting, or order to cash. So, the question behind recovery time objective is, "How long could this business process, application, or system be unavailable before it started to have a significant impact on the business itself?" The key with RTO is that it will vary. There's no single RTO for the entire company or organization. You might have some mission critical systems that need to be back online in a few hours, and other systems that need to be back in four to eight hours. So, that's key. It is absolutely critical to define your recovery time objective for each of your processes, applications, and systems, because that will help you determine what your DR strategies and DR methods should be.

Karen Guglielmo: And how do you determine the RTO for your DR plan, and ensure it's covered in an SLA?

Stephanie Balaouras: So, to determine your RTO, you start with a process called a business impact analysis, sometimes referred to as a BIA for short. During a BIA, you classify each business process, application, or system by criticality. Criticality levels might be something like mission critical, business critical, business important, and business supporting. You define that criticality to determine the appropriate and most cost-effective recovery method. The criticality itself is determined by the cost of downtime, or the impact that downtime would have on things like revenue, employee productivity, customer retention, customer satisfaction, market share, company reputation, and so on.
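As a rough illustration of the tiering Balaouras describes, here is a minimal Python sketch that assigns a criticality tier from an estimated hourly cost of downtime. The tier names come from the discussion; the dollar thresholds, the `BiaEntry` type, and the `classify` function are hypothetical and would be replaced by the figures from your own BIA.

```python
from dataclasses import dataclass

# Hypothetical hourly-cost thresholds (in dollars), checked from highest to lowest.
# Replace them with the cut-offs your own BIA produces.
TIER_THRESHOLDS = [
    (100_000, "mission critical"),
    (10_000, "business critical"),
    (1_000, "business important"),
    (0, "business supporting"),
]


@dataclass
class BiaEntry:
    name: str
    downtime_cost_per_hour: float  # combined hit to revenue, productivity, reputation, etc.


def classify(entry: BiaEntry) -> str:
    """Return the first tier whose threshold the hourly downtime cost meets or exceeds."""
    for threshold, tier in TIER_THRESHOLDS:
        if entry.downtime_cost_per_hour >= threshold:
            return tier
    # Anything below the lowest threshold falls into the least critical tier.
    return TIER_THRESHOLDS[-1][1]


if __name__ == "__main__":
    for entry in (BiaEntry("trading system", 500_000),
                  BiaEntry("email", 20_000),
                  BiaEntry("file server", 500)):
        print(f"{entry.name}: {classify(entry)}")
```

Because each tier maps to a recovery method, the tier returned here is what drives the choice of the most cost-effective recovery approach.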

So, for example, a financial trading system at a financial services firm would likely have an RTO of near zero because it's the lifeblood of the firm. If it went down, business would come to a complete halt. In another company, email might be considered the most mission critical system, because if email is down you can't communicate with other employees, your customers, or your partners. Still, you might be able to function, because you could potentially use the phone or some other means of communication, so email might have an RTO of a few hours. That's the key question you have to ask: "If this process, application, or system were unavailable, would you lose significant revenue? Would you lose market share?"

During the business impact analysis, you would also perform dependency mapping. You want to make sure that not only the process, application, or system that you're trying to protect is available, but also that all the dependent resources it requires are up and available. So, it's really during the business impact analysis that you define the RTO, and the RTO is driven by the criticality of the system, which is determined by its impact and the cost of downtime.
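Dependency mapping can be sketched the same way. The minimal Python example below, built on hypothetical systems and RTO values, walks a dependency map to collect every direct and indirect dependency of a system, then flags any dependency whose planned RTO is longer than the RTO of the system that needs it. The names `DEPENDS_ON`, `RTO_HOURS`, `all_dependencies`, and `rto_conflicts` are illustrative, not part of any standard tool.

```python
from typing import Dict, List, Set

# Hypothetical dependency map: each system lists the resources it needs to run.
DEPENDS_ON: Dict[str, List[str]] = {
    "order-to-cash": ["erp", "email"],
    "erp": ["database", "network"],
    "email": ["directory", "network"],
    "database": ["storage"],
    "directory": [],
    "network": [],
    "storage": [],
}

# Hypothetical RTOs in hours, as agreed during the BIA.
RTO_HOURS: Dict[str, float] = {
    "order-to-cash": 4, "erp": 4, "email": 2, "database": 8,
    "directory": 1, "network": 1, "storage": 4,
}


def all_dependencies(system: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Collect every direct and indirect dependency of `system` (iterative depth-first walk)."""
    seen: Set[str] = set()
    stack = list(graph.get(system, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:  # the seen-set also protects against cycles
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


def rto_conflicts(system: str, graph: Dict[str, List[str]],
                  rto: Dict[str, float]) -> List[str]:
    """Dependencies planned to come back later than `system` itself needs to be back."""
    return sorted(d for d in all_dependencies(system, graph) if rto[d] > rto[system])


if __name__ == "__main__":
    print(rto_conflicts("order-to-cash", DEPENDS_ON, RTO_HOURS))
```

With these sample values, the order-to-cash process has a four-hour RTO, but the database it relies on through ERP isn't planned to return for eight hours, so `database` is flagged.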

Karen Guglielmo: Okay, so who determines the RTO at an organization? Is it IT or the business executives?

Stephanie Balaouras: Yeah, in my opinion, it's actually a combination of IT (meaning IT operations), application owners, and business owners. All of those groups need to work together to define the recovery time objectives. Certainly your business owners and application owners understand the impact and can do a better job of quantifying the cost of downtime, but IT needs to be involved to help the owners be realistic about their objectives. In my experience, if you ask a business owner or an application owner what their recovery time objective should be, everyone's going to say zero. And that's just not realistic. Without some guidance on cost, everybody will say, "Mission critical." That's where I think IT comes in and plays a role in setting expectations about what can realistically be done given budget constraints. At the end of the day, the most important thing is that the business and IT are on the same page and that their expectations match. If the business expects email to be unavailable for less than two hours, but IT is still recovering email from tape and the best-case scenario is eight hours, then there's a major disconnect between those two groups that they need to resolve. So, in my opinion, it's IT and the business working together to come up with very realistic recovery time objectives.
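The email example above (a two-hour expectation against an eight-hour best case from tape) is the kind of gap worth surfacing systematically. The sketch below, with a hypothetical `RecoveryObjective` record and `disconnects` helper, compares each business-stated RTO with the recovery time IT can actually demonstrate and lists the mismatches, largest first. Only the email figures come from the discussion; the website entry is invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecoveryObjective:
    system: str
    business_rto_hours: float  # what the business owner expects
    it_best_case_hours: float  # what IT's current recovery method can deliver

    @property
    def gap_hours(self) -> float:
        """Positive when IT's best case is slower than the business expects."""
        return self.it_best_case_hours - self.business_rto_hours


def disconnects(objectives: List[RecoveryObjective]) -> List[RecoveryObjective]:
    """Systems where expectations and capability don't match, largest gap first."""
    mismatched = [o for o in objectives if o.gap_hours > 0]
    return sorted(mismatched, key=lambda o: o.gap_hours, reverse=True)


if __name__ == "__main__":
    objectives = [
        RecoveryObjective("email", business_rto_hours=2, it_best_case_hours=8),    # restore from tape
        RecoveryObjective("website", business_rto_hours=4, it_best_case_hours=1),  # hypothetical
    ]
    for o in disconnects(objectives):
        print(f"{o.system}: business expects {o.business_rto_hours}h, "
              f"IT's best case is {o.it_best_case_hours}h (gap {o.gap_hours}h)")
```

Each flagged row is a conversation to have: either the business relaxes its objective or IT gets funding for a faster recovery method.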

Karen Guglielmo: Okay, and what is the difference between RTO and RPO?

Stephanie Balaouras: So, RTO, like we mentioned, is your sensitivity to downtime. RPO, which stands for recovery point objective, is your sensitivity to data loss. We'll use email as an example again, because that's a system everybody uses on a daily basis. RTO would measure how quickly email needs to be back up and running so that you can send and receive email again. RPO would measure how much data you could lose. For example, imagine that your email server crashed and you hadn't taken a backup since last night. You could lose up to eight hours of email. For some businesses, losing up to eight hours of email might be acceptable. For other businesses, losing eight hours of email, that historical data, is completely unacceptable. So RPO is your sensitivity to data loss.

They don't necessarily have to match, either; RTO can be different from RPO. Again, to use the email example, it might be important to you as an organization to ensure that email is up and running within a couple of hours so that you can send and receive email. But at the same time, it might be absolutely critical to you that you not lose a single email. Maybe you're in a highly regulated industry, or maybe you're in a very litigious industry and it's absolutely critical that you keep emails, in which case your RPO might be zero. In other cases, it could be the flip: you might have an RTO of a couple of hours and an RPO of, "It's okay if I lose eight hours' worth of data, because I still have paper records of all the day's transactions and I can re-enter those." So, they are different. They're very related, but you can actually have different objectives for each.
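To make the distinction concrete, here is a minimal sketch that checks RTO and RPO independently. It assumes the worst-case data loss equals the time between the last good backup (or replica) and the failure, and that an expected restore duration is known; the `Objectives` type, the `meets_objectives` function, and the dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Objectives:
    rto: timedelta  # maximum tolerable downtime
    rpo: timedelta  # maximum tolerable data loss, measured in time


def data_loss_window(last_recovery_point: datetime, failure_time: datetime) -> timedelta:
    """Everything written after the last backup or replica is lost."""
    return failure_time - last_recovery_point


def meets_objectives(obj: Objectives, last_recovery_point: datetime,
                     failure_time: datetime, restore_duration: timedelta) -> dict:
    """Evaluate the two objectives separately; one can pass while the other fails."""
    return {
        "rto_met": restore_duration <= obj.rto,
        "rpo_met": data_loss_window(last_recovery_point, failure_time) <= obj.rpo,
    }


if __name__ == "__main__":
    # Hypothetical scenario: nightly backup at midnight, email server fails at 8 a.m.,
    # and restoring service takes an estimated 90 minutes.
    backup = datetime(2024, 1, 15, 0, 0)
    crash = datetime(2024, 1, 15, 8, 0)
    restore = timedelta(minutes=90)

    regulated = Objectives(rto=timedelta(hours=2), rpo=timedelta(0))
    paper_backed = Objectives(rto=timedelta(hours=2), rpo=timedelta(hours=8))
    print(meets_objectives(regulated, backup, crash, restore))     # RTO met, RPO missed
    print(meets_objectives(paper_backed, backup, crash, restore))  # both met
```

The same outage satisfies one organization and fails the other, purely because their RPOs differ while their RTOs are identical.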

Karen Guglielmo: How difficult is DR budgeting for the CIO? Any advice you can offer for getting more dollars for the DR planning?

Stephanie Balaouras: The primary thing to keep in mind is to start with strategy, not technology. I think that's one of the cardinal mistakes many companies and organizations make: they turn to the business and say, "We need half a million dollars for replication software, more bandwidth between our sites, and a whole lot of extra infrastructure capacity at an alternate data center." That doesn't tell them why you need it. So, I usually recommend seven steps to help build a business case for DR and to maximize the money you get for DR.

Step number one is to implement a DR management process. Technology supports DR; it doesn't constitute your strategy or your specific plans. Before you can request funding for technology and services, you need a framework in place to manage DR preparedness as an ongoing process, not a one-time event that you address every couple of years. You need a framework that says, "We are committed to DR. It affects every one of our infrastructure decisions. We have documented plans. We test those plans. We report on results to our key executives." So, it's got to be something that you can measure and report on, with a strategy behind it.

The second step, which we touched on earlier, is that you absolutely have to conduct a business impact analysis, as well as a risk assessment. Again, before you request any funding, you have to sit down with business owners to identify your most critical processes, applications, and systems, map the dependent resources, and calculate the cost of downtime. You also have to perform a risk assessment so you know what threats you need to protect against. Is it power failure? Is it natural disasters? Is it man-made disasters? What exactly is it that you need to prepare for?

The third step: I actually make calculating the cost of downtime a step in and of itself. The more you can quantify the cost of downtime for each of those business processes and applications, the more you have a built-in business case for the money you need, as well as a guidepost for how much you should be spending. If you know that when email is down it's going to cost you at least 10% in revenue, because you send and receive a lot of statements of work and sales proposals via email, then that's a good way to understand how much you need to spend on email.
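A simple way to start quantifying this is to add lost revenue and lost productivity over the length of an outage. The sketch below is one such simplified model, not a Forrester formula: it assumes revenue accrues evenly across the hours of the year, that an outage removes a fixed fraction of that revenue (the 10% email figure above is reused as that fraction), and that affected employees lose a fixed share of their productivity. The `downtime_cost` function, its parameter names, and the sample numbers are all hypothetical.

```python
def downtime_cost(hours: float,
                  annual_revenue: float,
                  revenue_impact: float,
                  affected_employees: int,
                  loaded_hourly_cost: float,
                  productivity_loss: float,
                  revenue_hours_per_year: float = 8760) -> float:
    """Estimated cost of one outage: lost revenue plus lost productivity.

    revenue_impact and productivity_loss are fractions between 0 and 1.
    revenue_hours_per_year defaults to 24 x 365; use business hours instead
    if revenue only accrues then.
    """
    hourly_revenue = annual_revenue / revenue_hours_per_year
    lost_revenue = hourly_revenue * revenue_impact * hours
    lost_productivity = affected_employees * loaded_hourly_cost * productivity_loss * hours
    return lost_revenue + lost_productivity


if __name__ == "__main__":
    # Hypothetical company: $87.6M annual revenue ($10,000 per hour), an 8-hour email
    # outage, 10% of revenue affected, 200 staff at $50/hour losing a quarter of their output.
    cost = downtime_cost(hours=8, annual_revenue=87_600_000, revenue_impact=0.10,
                         affected_employees=200, loaded_hourly_cost=50,
                         productivity_loss=0.25)
    print(f"${cost:,.0f}")
```

Running the same function at different outage lengths shows roughly what each hour of RTO is worth, which is the kind of guidepost for spending described above.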

The fourth step is expanding your scope beyond just natural disasters. I think DR is actually an unfortunate term, because it makes us think only of hurricanes, earthquakes, and tornadoes. But you might be in a region of the country, or somewhere in the world, that doesn't have a high risk of those types of events, in which case you might have a false sense of security that you don't need to spend much on DR. The reality is that the most common causes of downtime are mundane events: power failures, IT failures, human error, network failures. So, if you can help senior executives understand that DR is more than just catastrophic events, that it's really about ensuring the overall availability and resilience of IT, you're much more likely to get them to spend money. When they think of it only as an expensive insurance policy for unlikely scenarios, that's when they cross their fingers and hope it doesn't happen.
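One common way to weigh mundane and catastrophic threats side by side is annualized loss expectancy (ALE): the expected loss per incident (SLE) multiplied by how often the incident is expected per year (ARO). The sketch below applies that standard risk-assessment formula, approximating the loss per incident as outage hours times hourly downtime cost. The `Threat` type, the threat list, and every number in it are purely illustrative placeholders, not measured data; substitute your own incident history and cost figures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Threat:
    name: str
    annual_rate: float    # expected occurrences per year (ARO)
    outage_hours: float   # typical downtime per occurrence
    cost_per_hour: float  # hourly cost of downtime from your BIA

    @property
    def single_loss(self) -> float:
        # SLE approximated as outage length x hourly downtime cost
        return self.outage_hours * self.cost_per_hour

    @property
    def annualized_loss(self) -> float:
        # ALE = SLE x ARO
        return self.single_loss * self.annual_rate


def rank_threats(threats: List[Threat]) -> List[Threat]:
    """Order threats by expected yearly loss, highest first."""
    return sorted(threats, key=lambda t: t.annualized_loss, reverse=True)


if __name__ == "__main__":
    # Placeholder figures for illustration only.
    threats = [
        Threat("regional hurricane", annual_rate=0.02, outage_hours=72, cost_per_hour=10_000),
        Threat("power failure", annual_rate=2, outage_hours=4, cost_per_hour=10_000),
        Threat("human error", annual_rate=6, outage_hours=1, cost_per_hour=10_000),
    ]
    for t in rank_threats(threats):
        print(f"{t.name}: ${t.annualized_loss:,.0f} per year")
```

The point of the exercise is that frequent, short outages can add up to as much expected loss as rare catastrophes, so they belong in the same budget conversation.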

Fifth is to position DR as a competitive necessity. Downtime creates an opportunity for your competitors to come in and seize market share. Likewise, your own uptime creates a parallel opportunity to seize market share from your competitors. This helps reframe the discussion from "DR is that expensive insurance policy" to "DR is a competitive necessity, a necessity of doing business." What I've found is that whenever you present peer data to your senior executives and they realize they're not in as good a position as their competitors or peers (they haven't spent as much on DR, they can't respond as quickly, they can't stay online or limit data loss as well as their peers and competitors), that's when their ears perk up and they realize they really need to do a lot more with DR.

The sixth step is developing what I call a DR services catalog. This gets back to what I mentioned earlier, that IT needs to work with the business to come up with more realistic recovery time and recovery point objectives. What I tell a lot of organizations is to come up with a catalog of DR services. Basically, this catalog would say, "Okay, for mission critical applications, here's our approach to disaster recovery and here's the relative cost to deliver that preparedness." Then, on down the line, you define that for business critical, business important, and business supporting. When you can bound the number of methods you're willing to invest in and attach associated costs to them, you can go to business owners and say, "Here's what we're capable of delivering. Here are the relative costs of each of these tiers of service." You might even do something like chargeback, or at least report the costs back to the business owners. That's when they're more likely to make much more judicious decisions about recovery time and recovery point objectives. It also helps you limit the number of point products and point services you might buy.
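A DR services catalog can be represented as a small lookup table. In this sketch, the four tier names come from the discussion, while the RTO/RPO values, recovery approaches, and monthly chargeback rates are hypothetical placeholders. The illustrative `cheapest_tier` helper picks the least expensive tier that still satisfies an owner's requested objectives, and `chargeback` reports the cost back per application, which is the mechanism suggested above for encouraging more judicious requests.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ServiceTier:
    rto_hours: float
    rpo_hours: float
    approach: str
    monthly_cost_per_app: float  # hypothetical chargeback rate


# Placeholder catalog; real values come from your own designs and pricing.
CATALOG: Dict[str, ServiceTier] = {
    "mission critical": ServiceTier(1, 0, "synchronous replication to an alternate site", 20_000),
    "business critical": ServiceTier(4, 1, "asynchronous replication", 8_000),
    "business important": ServiceTier(24, 24, "disk-based backup with standby servers", 2_000),
    "business supporting": ServiceTier(72, 24, "tape backup, restore onto spare hardware", 500),
}


def cheapest_tier(required_rto_hours: float, required_rpo_hours: float) -> str:
    """Least expensive catalog tier that still meets the requested objectives."""
    candidates = [(tier.monthly_cost_per_app, name) for name, tier in CATALOG.items()
                  if tier.rto_hours <= required_rto_hours and tier.rpo_hours <= required_rpo_hours]
    if not candidates:
        raise ValueError("no catalog tier meets these objectives")
    return min(candidates)[1]


def chargeback(assignments: Dict[str, str]) -> Dict[str, float]:
    """Monthly cost reported back per application, given the tier each one is assigned."""
    return {app: CATALOG[tier].monthly_cost_per_app for app, tier in assignments.items()}


if __name__ == "__main__":
    tier = cheapest_tier(required_rto_hours=8, required_rpo_hours=2)
    print(tier, chargeback({"crm": tier}))
```

Bounding the catalog to a few tiers keeps the number of recovery methods IT must build and support small, while the price attached to each tier makes an owner's request for "zero" visibly expensive.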

The last thing is aligning DR technology investments with other IT initiatives. So, a lot of times, companies don't actually have a separate budget for DR. It comes out of the server budget. It comes out of the storage budget. It comes a little bit out of the networking budget. A good example is consolidation. A lot of companies and organizations are undergoing various types of IT consolidation. It could be server. It could be storage. It could be data center consolidation. And a lot of the technologies that you use to facilitate consolidation, something like server virtualization, they play a great role in improving disaster recovery preparedness. So, it's a great way for you to say, "I think server virtualization is a great idea. I'm going to be supporting the other IT groups that are pushing for it, because it's actually going to help me improve disaster recovery preparedness." Or data center consolidation. Data center consolidation is a great opportunity to address a lot of the common mistakes that people make with IT recovery. They don't have two data centers, their data centers aren't far enough apart to escape local threats, or their data centers are actually in high-risk areas. You can actually get involved in those discussions, and you might be able to help other IT groups push along their priorities, as well as accomplish the priority of improving DR.

Karen Guglielmo: Okay, on that note, that does conclude today's podcast. Thanks again to Stephanie Balaouras for speaking with us today, and thank you all for listening. Have a great day.

This was first published in November 2008
