For some organizations, it takes an act of God to get serious about a disaster recovery plan for remote offices and branch offices. For Cancer Treatment Centers of America (CTCA), the organization's mission to treat patients like family members was more than enough motivation to make sure IT did it right, said CIO Chad Eckes. The proximate cause was a move from paper to electronic health records.
"Our promise to the patients was that we could ensure all of the appropriate documentation for their care was available as they moved through our very complex and very speedy system of treatment. We literally operate faster than any other organization that I know of in health care," said Eckes, who was recruited in December 2005 to spearhead the digital makeover at the for-profit CTCA, which has hospitals in suburban Chicago; Philadelphia; Tulsa, Okla.; and suburban Phoenix.
CTCA is what is sometimes called destination medicine. The average patient travels 500 miles to be treated at CTCA and through the course of treatment might be propelled through multiple departments within a hospital or even receive care at more than one CTCA center, Eckes said. "The paper world couldn't keep up with movement of the patient," he said.
Going paperless, however, posed considerable risk without the technology infrastructure to support it. The electronic health records system could not go down. CTCA's reputation is predicated on its "Mother Standard" of care, a trademarked mission to treat patients as family. Electronic medical records needed to be reliably and securely managed, readily accessible for input and output of information and connected to the hospital's medical equipment.
"Those reasons are why we started going down the path of building a highly redundant infrastructure that was focused on disaster recovery," Eckes said.
A layered approach to disaster recovery for remote locations
CTCA needed centralized data centers because its widely dispersed hospitals share electronic health records. That requirement affected all three risks a disaster recovery plan must manage: power, applications and data, and the network.
Managing power for remote locations is not so different from managing it for local sites, Eckes said. "You always want to make sure that you have dual power grids, you always want to make sure you have uninterrupted power supplies, you always want to make sure that you have generator backup to run all the systems."
CTCA takes that a step further at its Phoenix location, where a UPS supports the entire hospital rather than just IT, guarding against an outage on its medical equipment.
Similarly, application and data redundancy is, in concept at least, no different for local or remote sites, Eckes said, but the risk is heightened for CTCA because its databases are centralized. "If an application goes down, that not only impacts the service offering at one hospital but across our four facilities," he said.
CTCA has built four layers of redundancy into its systems and data, plus a fifth layer for a worst-case scenario, an approach Eckes touts as uncommon, if not unique, among health care organizations.
- Every one of the production systems is clustered, so if one part of a cluster fails, the systems remain up and running.
- Every piece of data from all sites is immediately mirrored to a second data center, so in the event of an outage, CTCA can shift processing to the redundant center with no data loss. CTCA located its second center 59 miles from its primary center in Schaumburg, Ill., a strategic decision based on the speed of data transmission. "We chose to have it this close because we couldn't replicate without it being this close. Information can only travel so fast over the lines."
- Backups are stored to disk. "Disk is very fast to restore off of, and we have that immediately available in our data center. We keep seven days' worth of backups on disk." CTCA backs up approximately 4 terabytes nightly.
- Standard tape backups are stored off-site in a vault facility in downtown Chicago. "We can't keep the disk backup as long as we want to, from a cost standpoint. It wouldn't be prudent. We also want to protect against a situation where both data centers go down."
- Vital data is dumped to PDF at every hospital. In the event that all other redundancies fail, CTCA "has written a massive dump of data that goes out to all the individual sites." The data, which includes all vital patient information needed for care, is pulled every four hours and stored in PDF format on a server in each of the hospitals. "In the worst-case scenario, folks at that hospital can go to the server, print it off and be taking care of our patients safely."
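Eckes's remark that "information can only travel so fast over the lines" can be made concrete with a back-of-the-envelope calculation. With synchronous mirroring, a write is not complete until the remote center acknowledges it, so every write pays at least one network round trip. The sketch below is an illustration, not CTCA's engineering analysis: it assumes light travels through fiber at roughly 200,000 km/s and that the fiber path equals the stated 59-mile distance, and the function names are hypothetical. Real fiber routes are longer, and equipment and storage acknowledgments add delay on top of pure propagation.

```python
# Lower-bound propagation delay for synchronous replication between two
# data centers. Assumptions (not from CTCA): light in fiber travels at
# about 200,000 km/s, and the fiber route equals the straight-line distance.

KM_PER_MILE = 1.609344
FIBER_KM_PER_MS = 200.0  # ~200,000 km/s expressed per millisecond


def round_trip_ms(distance_miles: float) -> float:
    """Minimum round-trip propagation delay in milliseconds."""
    one_way_km = distance_miles * KM_PER_MILE
    return 2 * one_way_km / FIBER_KM_PER_MS


def max_serialized_sync_writes_per_s(distance_miles: float) -> float:
    """Upper bound on writes per second when each write must wait one full
    round trip for the remote acknowledgment before the next can start."""
    return 1000.0 / round_trip_ms(distance_miles)


# 59 miles is the distance from the article; the others are hypothetical
# comparisons showing how the penalty grows with distance.
for miles in (59, 250, 1000):
    print(f"{miles:>5} miles: RTT >= {round_trip_ms(miles):.2f} ms, "
          f"<= {max_serialized_sync_writes_per_s(miles):,.0f} serialized writes/s")
```

Because the delay scales linearly with distance, every added mile slows every synchronous write, which is why a nearby second center is the practical choice for zero-data-loss mirroring.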
The upshot is that if CTCA lost its main data center today, every system could be up and running within two hours, and real-time replication guarantees zero data loss, Eckes said.
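The layers differ in how much recent data each one can put at risk: a copy made every N hours can lose up to N hours of writes if a failure strikes just before the next copy. The sketch below models that idea with hypothetical names. The mirror, four-hour PDF and nightly disk cadences come from the article; the tape cadence is an assumption, since the article does not state it. It picks the surviving layer with the smallest worst-case loss window for a given set of failed layers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryLayer:
    key: str
    description: str
    copy_interval_hours: float  # 0.0 = continuous (synchronous) copy


# Cadences for "mirror", "pdf" and "disk" follow the article; the nightly
# tape cadence is an ASSUMPTION. Note that the PDF dump covers only vital
# patient information, not full system data.
LAYERS = [
    RecoveryLayer("mirror", "synchronous mirror to second data center", 0.0),
    RecoveryLayer("pdf", "per-hospital PDF dump of vital patient data", 4.0),
    RecoveryLayer("disk", "nightly backup to disk in the data center", 24.0),
    RecoveryLayer("tape", "off-site tape vault (assumed nightly)", 24.0),
]


def best_surviving_layer(failed: set[str]) -> RecoveryLayer:
    """Return the surviving layer with the smallest worst-case data-loss
    window, which equals its copy interval."""
    surviving = [layer for layer in LAYERS if layer.key not in failed]
    if not surviving:
        raise ValueError("no recovery layer survives")
    return min(surviving, key=lambda layer: layer.copy_interval_hours)


scenarios = {
    "second data center still up": set(),
    "both data centers down": {"mirror", "disk"},
}
for label, failed in scenarios.items():
    layer = best_surviving_layer(failed)
    print(f"{label}: {layer.description}, "
          f"up to {layer.copy_interval_hours:g} h of data at risk")
```

Stacking layers with different failure modes means losing one layer degrades the loss window gradually instead of all at once.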
Taking control of the uncontrollable: Network redundancy, with two WANs
But probably the toughest part of building the infrastructure to fully support going paperless was achieving network redundancy, the rung of disaster recovery that is not fully under one's control, Eckes said. The easy part was the local area networks (LANs) within CTCA's facilities. "We had a strong partner in Cisco. We built high redundancy in everything we have done at the facilities."
Designing the network at the metropolitan and wide area network (WAN) levels proved more difficult.
"What we did, and we're told we have one of the most complex designs in greater Chicagoland, was to design two full-production wide area networks," Eckes said. One WAN is with AT&T and one is with Qwest. The WANs run synchronously and "are sized at a point that allows us to run on either/or and still have plenty of bandwidth to run both of our facilities," he said. The WANs transmit 20 megabytes per second. In addition, all of the Cisco gear can fail over automatically and immediately if either WAN has a problem. Eckes also negotiated with the two telecom providers to make sure the CTCA networks run on independent fiber, to prevent a single point of failure.
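Sizing the WANs so CTCA can "run on either/or" amounts to a simple capacity rule: after any single link fails, the links that remain must still carry peak demand. The sketch below checks that rule; the function name, link names and figures are hypothetical, and capacity and demand just need to share a unit.

```python
def survives_any_single_link_failure(
    link_capacity: dict[str, float], peak_demand: float
) -> bool:
    """True if losing any one link still leaves capacity >= peak demand."""
    total = sum(link_capacity.values())
    return all(total - cap >= peak_demand for cap in link_capacity.values())


# Hypothetical figures; capacity and demand are in the same unit.
links = {"carrier_a": 20.0, "carrier_b": 20.0}
print(survives_any_single_link_failure(links, peak_demand=15.0))  # True
print(survives_any_single_link_failure(links, peak_demand=30.0))  # False
```

The check assumes the two links fail independently. If both carriers ran over the same fiber, one cut could take down both at once, which is why keeping the networks on independent fiber matters as much as the bandwidth sizing.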
The mission dictates the DR plan
More important than the nuts and bolts of disaster recovery, Eckes said, was aligning the plan with CTCA's mission. Eckes runs IT with a team of 84 people, who are hired at least as much for their appreciation of the organization's mission to care for patients as for their technical skills, which "can be taught."
Analyst Stephanie Balaouras, who covers disaster recovery and business continuity at Cambridge, Mass.-based Forrester Research Inc., said it's helpful for CIOs at any organization to step back and look at business impact before crafting a disaster recovery plan. "We in IT tend to focus on individual applications and lose sight of business processes," she said. In addition, she said, the trend toward consolidating remote-office backup and recovery into a centralized model makes sense from both a technology and a skills perspective.
Eckes agrees, saying the business mission should always inform IT's DR strategy. "What that translates to, from an IT perspective, is the question I constantly ask my team: 'If your mother or father were being treated here, hooked up to medical equipment that is connected to our EHR, how redundant would you want this system?'"
"Quite honestly, we'll take that to the nth degree. That is what drove our goal, which is 100% system uptime," he said, acknowledging that many IT people would dismiss that as impossible. "But why would you target anything less? We'll keep on chasing the tail of redundancy until we achieve that standard."
Let us know what you think about the story; email Linda Tucci, Senior News Writer.