I have designed several data centers over the years and helped companies deal with both disaster recovery and avoidance. These IT facilities, I can attest, can be found in just about all areas of a building, from second- and third-level basements to every floor above ground. Experience has taught me that data center disasters also come in just about every imaginable -- and unimaginable -- type.
Consider this case of a data center on the second floor of a six-story stand-alone building.
There had been a strong rain, but not enough to flood anything around the building. Yet without warning, the data center -- which the CIO believed was safe from flooding precisely because it was on the second floor -- flooded. The resulting electrical surge knocked out power to the entire building.
Upon inspection, the company found about a foot of water that came almost up to the data center's raised floor. Because the water was under the raised floor, the operators had no inkling of the problem until the alarms went off in the raceways where the wiring ran. (Raceways are large wiring conduits that typically run under a raised floor in a data center but cannot be totally sealed.)
Here was a mystery: There was no flooding on the first floor except what was coming down from the second floor. The parking lots and basement were dry. All of the floors above were also fine -- no broken pipes. But the data center was virtually destroyed.
Readers, I've been invited by SearchCIO to write about my experiences in IT over the past 30 years. Before I begin, permit me to assert my right to have an opinion and to share stories.
I have been involved in a combination of management consulting and IT for 40 years. In that time, I have been a senior consultant with PWC, a CIO, the director of almost every area in an IT department, the owner of a small consulting firm, an executive consultant for IBM Global Services, and the North America practice head for business process transformation for Infosys, Capco and Tata Consultancy Services.
Along the way, I have written five books, become a member of the ABPMP International Board of Directors, a member of the PEX Global Advisory Board, a member of the Forrester BPM Council, and a member of the Business Architect Association's board. Altogether, I have written over 80 papers, columns and articles and spoken at over 40 conferences. I have also been involved in most industries and worked on projects of all sizes -- including several transformations.
In short, I have been around a few proverbial blocks in my professional journey. This series of articles will provide a look into the situations I have encountered and some of the problems that my teams and I have had to address. Although some things may seem hard to believe, they are all true.
We investigated what had happened over the months leading up to the event and learned that the company had installed a new mainframe. For the installation, a window was removed and a crane was used to lift the mainframe up and in. The window was then put back and resealed. The computer's water chillers were OK and there was water detection in the wire raceways. So far, so good. The data center itself looked to be well-planned.
Looking further, one of our team members noticed a narrow decorative walkway around the second floor that appeared to slope inward -- a potential problem -- but there were drains in place that should have shunted water away from the building. So, what was the source of the flooding?
With the data center equipment removed, we could see that the caulking around the replaced window had given way. As it turned out, those well-placed drains for the decorative walkway were clogged and the water had built up against the outer side of the walkway and poured through the caulking gap of the replaced window. The situation was a perfect storm of unlikely events: The building management did not check to make certain the drains on the walkway were clear, and inferior work in resealing the window had created a situation just waiting for a big storm.
Another case involved a medical center campus in the central U.S. The campus had multiple facilities. IT was in a separate building across the street from the main hospital. IT operations were above ground; buried underground were protected cables to all the buildings on the campus. For years, there were zero problems. One summer day, a storm rolled in and lightning hit the street between the data center and the hospital, just above the cable to the hospital.
What are the odds of that happening?
Doesn't matter -- it happened. The lightning strike melted the street right down to the cable, and before the cable itself failed, the surge that coursed through it destroyed the computers in both the data center and the hospital -- all of them literally melted or fried. Nothing was really usable. Surge protection, I'm here to tell you, can be overwhelmed. And when equipment melts and gives off gas, sprinklers can go off and some components can actually blow up.
Preparing for data center disasters
These data center disasters happened several years ago. Today, there are different tools to help deal with the impact of a prolonged data center outage, including disaster recovery in the cloud and distributed server farms that automatically transfer processing off compromised servers. However, even today, we experience disasters that can cause unanticipated types of trouble. We have only to look at hurricanes Irma and Harvey, and the magnitude 8.2 and 7.1 earthquakes in Mexico to know that trouble is always just around the corner.
So, assuming you have a disaster recovery plan, here are some suggestions for mitigating the impact of data center disasters:
Have you hardened your data center or the company work facility? Has someone thought about all of the usual suspects and the improbable causes of data center disasters? Hurricanes, tornadoes, thunderstorms and a truck knocking over the transformer that was judiciously placed next to the loading dock are only a few of the problems that can bring down a data center.
When trouble finds you, that is when disaster recovery must kick in. Is your recovery plan up to date? Is it adequate to address the growth in the company and its IT support since it was last tested? Is it clear who can declare a disaster and set the recovery plans into action? Is your call tree up to date or have some of the people left the company? If you are relying on the cloud for backup, are you certain you and all who need to can access it in a power outage or storm-caused disaster?
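Some of these checks can even be automated. As a rough sketch -- with the roster source, names, roles and the 180-day threshold all invented for illustration -- a small script can flag call-tree entries for people who have left the company or whose contact details have not been verified recently:

```python
# Hypothetical sketch: flag stale entries in a disaster recovery call tree.
# The roster, names, roles and dates below are illustrative assumptions.
from datetime import date

active_employees = {"ana", "ben", "chen"}  # from an assumed HR roster export
call_tree = [
    {"name": "ana",  "role": "declares disaster", "last_verified": date(2024, 1, 10)},
    {"name": "ben",  "role": "facilities lead",   "last_verified": date(2022, 6, 2)},
    {"name": "dana", "role": "network lead",      "last_verified": date(2023, 3, 15)},
]

def audit_call_tree(tree, roster, today, max_age_days=180):
    """Return entries that have left the company or were not verified recently."""
    issues = []
    for entry in tree:
        if entry["name"] not in roster:
            issues.append((entry["name"], "no longer with the company"))
        elif (today - entry["last_verified"]).days > max_age_days:
            issues.append((entry["name"], "verification overdue"))
    return issues

for name, problem in audit_call_tree(call_tree, active_employees, date(2024, 4, 1)):
    print(f"{name}: {problem}")
```

Running a check like this on a schedule turns "is the call tree up to date?" from an annual question into a standing alert.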
There are always unforeseen dangers. Hardening a building is expensive. When you think you have covered all situations, think of what is possible but improbable and then run the cost-benefit numbers to determine the best course of action.
Disaster prevention and recovery is a type of insurance. It is a tradeoff between probability and the extent of the impact and damage. That is obviously a board-level choice. But, I believe, avoidance is best; quick recovery must come next.
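The cost-benefit arithmetic behind that tradeoff is simple expected-loss math. Here is a minimal illustration -- every dollar figure and probability below is an invented assumption, not data from any real facility:

```python
# Illustrative only: annualized expected-loss comparison for a hardening decision.
# All figures and probabilities are invented assumptions for the example.
def expected_annual_loss(annual_probability, impact_cost):
    """Expected yearly loss = probability of the event x cost if it occurs."""
    return annual_probability * impact_cost

outage_probability = 0.02          # assumed 1-in-50-year event
outage_impact = 5_000_000          # assumed cost of a prolonged data center outage
hardening_cost_per_year = 60_000   # assumed annualized cost of mitigation
residual_probability = 0.002       # assumed risk remaining after hardening

loss_now = expected_annual_loss(outage_probability, outage_impact)
loss_hardened = expected_annual_loss(residual_probability, outage_impact)
net_benefit = loss_now - loss_hardened - hardening_cost_per_year

# Under these assumptions, hardening saves $90,000 a year in expected loss
# for $60,000 a year in cost, so the mitigation pays for itself.
print(net_benefit)
```

The point is not the specific numbers but the shape of the decision: when the drop in expected annual loss exceeds the annualized mitigation cost, hardening wins; when it does not, a recovery-focused plan may be the better spend.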