
A 13-hour power outage puts disaster recovery plan to the test

Shamus McGillicuddy
On the night of Oct. 25, Richard Ridolfo, CIO of aviation consultancy Simat, Helliesen & Eichner Inc. (SH&E), received the call every executive dreads. Disaster had struck, the hub of his company's technology infrastructure was in the dark, and no one knew when it would be back online.

"What happened was our building had lost a primary power transformer," Ridolfo said. The backup transformers had failed, too, and the fire department was investigating reports of smoke in the electrical room. The utility company was unsure when the power would be back on, and the fire department was restricting access to the building.


People and process are key

Calvin Braunstein, CEO of Robert Frances Group Inc. in Westport, Conn., said there are four things a company's disaster recovery plan must address: technology, people, processes and information.

Companies tend to be good at recovering hardware after a disaster, but not necessarily everything else, Braunstein said. At SH&E, the positive outcome reflected a true business continuity plan, not just a disaster recovery response. A good business continuity plan extends beyond IT to the business side.

"Not everyone understands the difference between the two," he said. "With disaster recovery, companies are usually worried about the data center."

Companies may say they plan for everything, but some events simply cannot be imagined. "Who would have expected the World Trade Center?" he said. "People planned on failures for Hurricane Katrina, but not to that level."

The outage occurred in SH&E's Cambridge, Mass., office. SH&E is headquartered in New York, but its Cambridge office is home to 40 employees -- 30% of its staff -- and the company's core data center.

"If you take our disaster recovery backup facility out of the picture, Cambridge is the hub of all our corporate computer assets," Ridolfo said. "Everything hubs off this site." He said the Cambridge office even serves as a hub for the IP telephony service of the company's London and New Zealand branches.

Ridolfo had been developing a new disaster recovery (DR) plan for nearly 18 months, in response to the Aug. 14, 2003, blackout that left New York and much of the Northeast in the dark for days. He had a new backup data center in nearby Somerville, Mass., and all of his employees were equipped to work remotely during a disaster.

"The majority of our professional staff travel, and they have always had some degree of ability to work remotely. It's really only been since the power outage in New York that we've consciously put in an effort to redesign and configure our systems to be accessible from anywhere. Not having power for three days in New York taught us what was at risk."

This time, a call went out to senior IT staff and senior executives to initiate a call tree that would inform the entire business of the situation. The power failure hit at 8:30 p.m. By midnight, everyone in the company had received a message.

Ridolfo then verified that the company's backup data center had taken over for Cambridge. "We ran into a few items that didn't work, so we had to make manual configuration changes."

By the next morning, the power was still out. One employee was sent into the building to retrieve laptops and printed files that other employees needed. "All our [Cambridge] employees who were not traveling worked from home that day," Ridolfo said.

The risks of failing to plan for such a disruption became plain to Ridolfo that morning, as he watched other tenants in his building struggle to cope with the situation.

"It was a huge mess," he said. "People were showing up for work with no clue. Us, we had one person show up who didn't get the message."

Ridolfo said there was minimal impact to his business. The Cambridge telephone system rolled all calls to the New York office, and the operator would reroute calls according to a list of alternative phone numbers for employees.

The one problem during the event involved Ridolfo's global IT help desk, whose staff is based in Cambridge. With the team working from home that day and accessing systems through a virtual private network connection, their connectivity limited the number of tickets they could handle, just as the volume of help desk tickets shot up.

Ridolfo managed the problem by telling employees that lower-priority fixes would have to wait until the event was over. The help desk instead concentrated on access issues for employees who were working from home.

Ridolfo said the success of his disaster recovery plan was also tied directly to early vendor involvement, and he considers it absolutely critical to include vendors in the planning and testing process. SH&E relies heavily on a network management firm, Atrion Networking Corp. in Warwick, R.I. Throughout Ridolfo's DR planning and implementation, Atrion stepped up and ensured that the designs were sound and fully tested. Bringing vendors up to speed in the middle of a crisis "can be a nightmare," he said. Because Atrion had been part of the whole planning process, Ridolfo could issue simple requests that the vendor was already prepared to execute.

"This freed my team up to focus on the highest-priority issues."

While Ridolfo had hurricanes, terrorist attacks and major blackouts in mind while he was designing his disaster recovery plan, he never thought something as small as a blown transformer could pose so much risk to his company.

"We always think of the big things that can impact a business," Ridolfo said. "We tend to forget that small things can trip us up quite a bit. The transformer in our building was put in by the power company 15 or 20 years ago. When it failed, they had trouble finding parts for it. That's really something you don't anticipate. What this illustrated was that there are all manners of problems that could happen to take us out of service. You can't see what it is, and you have to build around that. It's not just terrorism and natural disasters that you have to worry about. It's the tenant above you that might spill some cleaning fluids and force you to evacuate for a day."

Ridolfo estimated that, without a working disaster recovery plan, having the Cambridge office offline that day would have cost the company $45,000.

"The longer we're offline, however, the more the per-day cost to the business climbs," he said. "A single day or a few days' event can generally be ridden out by rescheduling meetings, working from documents stored on laptops, etc. After a few days, we run the risk of [not] meeting client deliverables, processes like invoicing and vendor payment begin to slow or halt, and selling new business becomes much harder as you focus on just keeping the existing clients happy. Missing a deadline for a client could have an impact of hundreds of thousands or a million dollars, depending on the work."

"Our CEO was thrilled with our response, and the outcome was much better than he anticipated," he said. "You never know how well it's going to function until you have to go through it."

Let us know what you think about the story; email: Shamus McGillicuddy, News Writer
