
Disaster recovery spending -- How much is enough?

This tip shows how you can recover from disaster economically.

In an ideal world with unlimited budgets, IT managers would spend whatever it took to assure that employees, customers and suppliers always had access to business systems and important information. Many organizations do allocate significant portions of their IT spending each year to operational resilience. However, too many organizations, especially those that have never experienced a natural disaster, security threat, or serious human error, struggle to justify spending on disaster recovery projects.

Disaster recovery spending is insurance against the risks of user downtime, data loss, and business interruption. While life insurance, health insurance, and homeowners insurance are pretty much a given, it's difficult to assess how much coverage is enough, and how much to spend.

IT managers face the same issues in planning and justifying disaster recovery spending. While every organization knows it needs some level of protection, determining the extent of that protection and the appropriate financial investment is an ongoing challenge.

Today's IT budgets are under intense scrutiny, and IT managers are being asked to do more with less. In 2000, IT spending peaked at more than $1 trillion in the U.S. alone, and almost $2 trillion worldwide. However, many of these investments did not deliver on promised returns.

A new survey of IT executives indicates that more than 90% of all projects now require return on investment justification. Disaster recovery solutions are competing with new business applications, security solutions, migrations and upgrades, operations and maintenance and IT cost reduction projects for a share of diminishing IT budgets. The challenge for disaster recovery managers is assuring that spending remains at adequate levels so they're ready, should the unlikely occur, and that important new technology, training and processes are implemented to mitigate or recover quickly from realized internal and external threats.

[Table: Unplanned Downtime (Mission Critical). Columns: Typical Uptime, Hours Down per Year, Cost per Unplanned Downtime Hour, Downtime Risk. Rows include Very Good and Best in Class. Values not preserved.]

Figure 1: Typical downtime risks for various availability levels. Note that a 1% increase in availability translates to more than $3 million in value. Comparing the cost of the disaster recovery plan with the risk mitigation value allows IT managers to make valuable spending decisions, and justify additional investments in disaster recovery solutions.
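The relationship between availability and downtime risk can be sketched in a few lines. This is a minimal illustration, not the model behind Figure 1: it assumes a system that must run around the clock (8,760 hours per year), and the hourly cost is a hypothetical placeholder. Under that assumption, one percentage point of availability equals 87.6 hours per year, so the gap exceeds $3 million whenever an hour of downtime costs more than roughly $34,250.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming the system must run around the clock


def hours_down_per_year(availability_pct: float) -> float:
    """Hours of unplanned downtime per year implied by an availability percentage."""
    return (100.0 - availability_pct) / 100.0 * HOURS_PER_YEAR


def downtime_risk(availability_pct: float, cost_per_hour: float) -> float:
    """Annual downtime risk in dollars: hours down times hourly cost."""
    return hours_down_per_year(availability_pct) * cost_per_hour


# Hypothetical hourly cost, chosen only for illustration (not from the article).
COST_PER_HOUR = 40_000

for pct in (98.0, 99.0, 99.9):
    print(f"{pct}% uptime: {hours_down_per_year(pct):.1f} hours down, "
          f"${downtime_risk(pct, COST_PER_HOUR):,.0f} at risk")
```

Substituting the organization's own hourly downtime cost shows how much each step up in availability is worth.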

To determine how much disaster recovery spending is needed, IT managers need to perform a three-step analysis:

  1. Assess the downtime costs for crucial business systems
  2. Calculate the potential disaster risks and impacts
  3. Compare alternative plans to determine benefits of each proposed solution, and how much spending is enough.

This review helps put the risks, possible projects, and benefits into perspective. Being systematic helps executives make the right spending decisions, and justify the disaster recovery investments against other competing IT projects.

Step 1: Determining the downtime costs for key business systems

Downtime risks can be calculated by examining each business system and determining how much value it delivers. Typically, risk is measured per hour of downtime: how much revenue or productivity will be lost if the system is unavailable.

For transaction-based business systems, the downtime loss can be calculated from the average number of transactions per hour (or the number of transactions during the busiest hour) and the average value of a transaction. Multiplying these two figures gives the team a good idea of how much the business system is worth per hour. For example, an e-commerce system records 1,000 sales transactions per hour at its busiest. On average, each sale is $45.00. If the system were unavailable, the business would lose $45,000 per hour. If restoration took five hours, the lost revenue impact would be $225,000.
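The e-commerce arithmetic can be written as a small helper. This is a sketch only; the function name and parameters are illustrative, and the figures are the ones from the example.

```python
def transaction_downtime_loss(transactions_per_hour: float,
                              avg_transaction_value: float,
                              hours_down: float) -> float:
    """Estimate lost revenue for a transaction-based system during an outage."""
    hourly_loss = transactions_per_hour * avg_transaction_value
    return hourly_loss * hours_down


# Figures from the e-commerce example: 1,000 sales per hour at $45.00 each.
per_hour = transaction_downtime_loss(1_000, 45.00, 1)
five_hours = transaction_downtime_loss(1_000, 45.00, 5)
print(f"${per_hour:,.0f} per hour, ${five_hours:,.0f} for a five-hour outage")
# prints "$45,000 per hour, $225,000 for a five-hour outage"
```

Using the busiest-hour transaction count gives a worst-case figure; the daily average gives a more typical one.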

When internal systems and infrastructure are down, users can't do their jobs efficiently. To determine the downtime risk, calculate the revenue loss per user, per system affected. Consider a company with a messaging system serving 1,000 users, each of whom is associated with $350,000 in revenue yearly. Spread over a working year of roughly 1,880 hours, the per-hour cost per employee is about $186. A single hour of downtime amounts to $186,000; a five-hour disruption would cost almost $1 million in lost revenue.
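The messaging-system example follows the same pattern. In the sketch below, the working year of 1,880 hours is an assumption: the article does not state it, but it is the figure that makes $350,000 per year come out to about $186 per hour. Function names are illustrative.

```python
# Assumption: hours worked per employee per year (not stated in the article).
WORKING_HOURS_PER_YEAR = 1_880


def revenue_per_employee_hour(annual_revenue_per_employee: float,
                              working_hours: float = WORKING_HOURS_PER_YEAR) -> float:
    """Revenue associated with one employee for one working hour."""
    return annual_revenue_per_employee / working_hours


def infrastructure_downtime_loss(users: int,
                                 annual_revenue_per_employee: float,
                                 hours_down: float,
                                 working_hours: float = WORKING_HOURS_PER_YEAR) -> float:
    """Lost revenue when a shared system is unavailable to all of its users."""
    hourly = revenue_per_employee_hour(annual_revenue_per_employee, working_hours)
    return users * hourly * hours_down


hourly = revenue_per_employee_hour(350_000)
print(f"Per employee-hour: ${hourly:,.0f}")
print(f"Five-hour outage, 1,000 users: ${infrastructure_downtime_loss(1_000, 350_000, 5):,.0f}")
```

The unrounded hourly figure yields totals slightly above the article's, which rounds to $186 before multiplying.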

Using employee salaries as a cost basis is a more conservative approach. However, most disasters will affect revenue. So for disaster recovery projects, lost revenue per employee is the best metric for infrastructure downtime calculations.

Of course, the longer the system is unavailable, the greater the impact. The transaction may be permanently lost; there's also the risk that customers will shift suppliers, especially if they feel their vendor is not responsive. Consequently, the customers' lifetime value also should be considered. Longer recoveries may cause irreparable harm to the corporation's brand image.

These intangible risks are extremely difficult to quantify, and are often not required to justify disaster recovery spending. Yet thoroughly reviewing the intangible risks and benefits is an important part of any budget decision, because it can greatly influence the projected ROI. Along those lines, disaster recovery projects should be compared with any other IT spending initiative, to ensure that IT is prioritizing spending to meet the larger business goals.

Step 2: Risk assessment

Once the downtime cost per hour is understood, the team needs to determine the likelihood of each potential event striking the organization. If an event occurs, how long will it take to recover with today's disaster recovery plans?

Detail all potential events, including system failures, accidental or intentional data destruction, human error and natural disasters.

For each potential business risk, assign a probability of occurrence and calculate how long recovery will take, using the current disaster recovery plan. For each business system, the downtime impact per hour can then be factored, leading to an estimated risk impact.

System by system, risk by risk, the team can estimate the potential impacts, recovery times and downtime risks, highlighting the most important elements that need to be addressed in the recovery plan.
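One common way to combine these inputs is to multiply probability, recovery time and hourly downtime cost into an expected annual impact. The sketch below assumes that formula; the article does not spell out an exact calculation. The probabilities and recovery times are hypothetical placeholders, and the hourly cost reuses the $45,000 from the e-commerce example.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    annual_probability: float  # chance of the event occurring in a given year
    recovery_hours: float      # recovery time under the current plan


def expected_annual_impact(risk: Risk, cost_per_hour: float) -> float:
    """Probability-weighted downtime cost per year for one risk."""
    return risk.annual_probability * risk.recovery_hours * cost_per_hour


# Hypothetical values for illustration only.
risks = [
    Risk("Database corruption", 0.10, 8),
    Risk("Human error", 0.25, 2),
    Risk("Natural disaster", 0.02, 72),
]
COST_PER_HOUR = 45_000  # hourly downtime cost from the e-commerce example

# List the largest expected impacts first to show where the plan needs attention.
for risk in sorted(risks, key=lambda r: expected_annual_impact(r, COST_PER_HOUR), reverse=True):
    print(f"{risk.name}: ${expected_annual_impact(risk, COST_PER_HOUR):,.0f} per year")
```

Repeating the calculation for each business system produces the system-by-system, risk-by-risk view described above.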

Step 3: Compare the costs and benefits of alternative plans

Use a similar ROI-driven approach to identify which risk mitigation solutions will deliver the best performance.

This analysis will identify the financial benefit, as well as the correct amount of risk reduction, and the most cost-effective solution.
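A comparison of this kind might be sketched as follows. The ROI formula used here (net benefit over three years divided by three-year cost) is an assumption, since the article does not define its ROI calculation, and every dollar figure is a hypothetical placeholder rather than data from the article.

```python
def three_year_roi(annual_risk_reduction: float, three_year_cost: float, years: int = 3) -> float:
    """ROI as net benefit divided by cost (assumed definition)."""
    benefit = annual_risk_reduction * years
    return (benefit - three_year_cost) / three_year_cost


# Hypothetical plans: (annual risk reduction in dollars, three-year cost in dollars).
plans = {
    "Faster recovery tools": (60_000, 50_000),
    "Local Redundancy with Failover": (140_000, 200_000),
    "Remote Redundancy with Failover": (150_000, 400_000),
}

for name, (reduction, cost) in plans.items():
    roi = three_year_roi(reduction, cost)
    print(f"{name}: risk reduction ${reduction:,}/yr, ROI {roi:.0%}")
```

Showing risk reduction alongside ROI matters because the plan with the highest return is not necessarily the one that removes the most risk.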

A simple solution analysis table might look like this:

[Table: Database Corruption. Columns: DR Plan, Risk Reduction (Benefit), ROI (three years). Rows: Faster recovery tools, Local Redundancy with Failover, Remote Redundancy with Failover. Values not preserved.]

The comparison shows that the 'Snapshot' solution provides the greatest financial return, mitigating a significant amount of the risk at a cost-effective price. However, it is important to remember that disaster recovery solutions are not selected on ROI measures alone. The organization may consider avoiding data loss an extremely high priority, and may have enough funds to invest in the most comprehensive solution. In that case, the 'Local Redundancy with Failover' solution mitigates almost all of the risk while still delivering a positive return, making it the best investment (even though it has a slightly lower ROI than the next-best alternative).

It's extremely important to examine the financial and business impact of a potentially disastrous event. Understanding a company's risk is a crucial first step in determining the level of protection needed and in demonstrating the business value of the investment. While disaster recovery solutions can be costly, the risks associated with not having the proper protection in place could be devastating for a company.

Tom Pisello is the CEO of Orlando-based Alinean, the ROI consultancy helping CIOs, consultants and vendors assess and articulate the business value of IT investments. He can be reached at [email protected].
