Virtualization gives CIOs great flexibility in designing new disaster recovery solutions. Of course, some rules of thumb are fundamental: "If you have it in production, you need to put it on your hot site," said Edward Haletky, CEO of The Virtualization Practice LLC, a global analyst firm based in Wrentham, Mass.
Generally, hot sites are located hundreds of miles away from the primary site, in a spot where it's unlikely an earthquake or another natural disaster would hit both. Beyond that, CIOs need to address a couple of common concerns, experts said: ensuring adequate bandwidth for failovers and restoration, and testing the virtual disaster-recovery environment regularly.
The bandwidth bottleneck
"The No. 1 criterion for virtualization is bandwidth," said Jon Nam, director of technology at Macy's Merchandising Group (MMG) in New York, which is consolidating disaster recovery solutions at two primary sites -- in New York and Cincinnati -- that will be hot sites for each other. "Bandwidth is a huge issue because when you virtualize, everything is coming out of that hub," he said.
Depending on your update workload, bandwidth can be a sizable constraint, said Ray Lucchesi, president of Silverton Consulting Inc. in Broomfield, Colo. A bandwidth of 32 GBps might be okay for a single application. If lots of applications need to be replicated, however, CIOs might want to look at services that compress or deduplicate (dedupe) files to remove unnecessarily duplicated information, he said. That way, less data is sent over the wire.
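The workload point is easy to quantify. The following back-of-the-envelope sketch estimates how long one replication cycle takes over a WAN link. It is a minimal estimate under stated assumptions, not a sizing tool. The function name, the 80% usable-link figure and the example inputs (2,000 GB of changed data over a 1 Gbps link) are all hypothetical. Link speeds are usually quoted in bits per second (Gbps), while storage is counted in bytes (GB), so the sketch converts at 8 bits per byte.

```python
def replication_hours(changed_gb: float, link_gbps: float,
                      reduction_ratio: float = 1.0,
                      link_efficiency: float = 0.8) -> float:
    """Estimate hours needed to ship one cycle of changed data to the hot site.

    changed_gb      -- data changed since the last cycle, in gigabytes (GB)
    link_gbps       -- nominal WAN link speed, in gigabits per second (Gbps)
    reduction_ratio -- compression/dedupe ratio, e.g. 2.0 for 2:1, 10.0 for 10:1
    link_efficiency -- assumed (hypothetical) fraction of nominal speed usable
    """
    if changed_gb < 0 or link_gbps <= 0 or reduction_ratio < 1:
        raise ValueError("invalid sizing input")
    if not 0 < link_efficiency <= 1:
        raise ValueError("link_efficiency must be in (0, 1]")
    bits_on_wire = changed_gb * 8e9 / reduction_ratio  # bytes -> bits, after reduction
    usable_bps = link_gbps * 1e9 * link_efficiency     # bits per second actually available
    return bits_on_wire / usable_bps / 3600            # seconds -> hours


if __name__ == "__main__":
    # Hypothetical workload: 2,000 GB changed per cycle over a 1 Gbps link.
    for label, ratio in [("no reduction", 1.0), ("2:1 reduction", 2.0), ("10:1 reduction", 10.0)]:
        print(f"{label}: {replication_hours(2000, 1.0, ratio):.2f} hours")
    # prints 5.56, 2.78 and 0.56 hours respectively
```

Under these assumptions, transfer time shrinks in proportion to the reduction ratio. That is why the ratio your own data actually achieves matters more than the headline figure.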
However, "not all data dedupes," according to Greg Schulz, founder and senior advisor to The Server and StorageIO Group, an IT consultancy in Stillwater, Minn. "You have to have a recurring, repeating pattern of data -- dedupe is optimized for text," he said. "Video, audio, and [podcast] interviews generally don't dedupe well." Compression, the old standby, isn't as glamorous as deduplication, given its typical 2:1 ratio (compared with dedupe's 10:1), but it works with video and audio files and at least ensures that everything gets backed up.
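Schulz's point that dedupe needs a "recurring, repeating pattern" can be illustrated with a minimal fixed-size-chunk deduplicator. This is a generic textbook technique, not how any product named in this story works. The function name, the 4 KB chunk size and the test data are assumptions. Random bytes stand in for data with no repeating pattern.

```python
import hashlib
import os


def dedupe_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Return logical size / stored size under fixed-size-chunk deduplication.

    Each chunk is fingerprinted with SHA-256; a chunk whose fingerprint has
    already been seen would be stored as a reference, so it adds no new bytes.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not data:
        return 1.0
    seen = set()
    stored = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        fingerprint = hashlib.sha256(chunk).digest()
        if fingerprint not in seen:      # first time we see this content
            seen.add(fingerprint)
            stored += len(chunk)
    return len(data) / stored


if __name__ == "__main__":
    # Hypothetical repetitive text: one line padded to exactly one chunk, repeated.
    page = b"store,sku,units,revenue\n".ljust(4096, b" ")
    print(dedupe_ratio(page * 100))              # 100.0: 100 chunks, only 1 stored
    print(dedupe_ratio(os.urandom(4096 * 100)))  # 1.0 (almost certainly): nothing repeats
```

Fixed-size chunks are the simplest scheme. Many dedupe systems use variable-size, content-defined chunking instead, so that inserting a few bytes does not shift every later chunk boundary and defeat matching.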
Some enterprises back up their operations to a cloud-based provider, which eliminates the need for internal disaster recovery solutions but still requires bandwidth, according to MMG's Nam. "We use two or three different ISPs, backlogged and failed over," he said. With overseas operations, Macy's is technically running 24/7. "The problem again is bandwidth: What time of day do you run these huge backups?"
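Nam's question about timing can at least be framed systematically. The following sketch picks the quietest contiguous backup window from 24 hourly link-utilization samples, allowing the window to wrap past midnight because a 24/7 operation has no natural cutoff. The function name and the utilization profile are hypothetical. Real samples would come from an organization's own network monitoring.

```python
from typing import List, Tuple


def quietest_window(hourly_util: List[float], window_hours: int) -> Tuple[int, float]:
    """Return (start_hour, mean_utilization) of the quietest contiguous window.

    hourly_util -- 24 link-utilization samples (0.0-1.0), index 0 = midnight.
    The window may wrap past midnight (e.g. 23:00 to 02:00).
    """
    n = len(hourly_util)
    if n != 24:
        raise ValueError("expected 24 hourly samples")
    if not 1 <= window_hours <= n:
        raise ValueError("window_hours must be between 1 and 24")
    best_start, best_mean = 0, float("inf")
    for start in range(n):
        # Modulo indexing lets the window wrap from hour 23 back to hour 0.
        total = sum(hourly_util[(start + k) % n] for k in range(window_hours))
        mean = total / window_hours
        if mean < best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


if __name__ == "__main__":
    # Hypothetical profile: busy all day, quiet from 11 p.m. to 2 a.m.
    util = [0.5] * 24
    util[23] = util[0] = util[1] = 0.1
    start, mean = quietest_window(util, 3)
    print(f"start at {start}:00, mean utilization {mean:.0%}")  # start at 23:00, mean utilization 10%
```

A brute-force scan is adequate for 24 samples. With overseas operations, the quiet hours may be short or shifted, so the mean utilization the scan reports is as informative as the start time.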
Testing virtual disaster recovery solutions
If running backups in a 24-hour environment is a challenge, imagine finding the time to run the crucial test: a failover to disaster recovery solutions that are located in the cloud or at a hot site. "If you've got the equipment, you want to be able to test the [virtual DR] environment," Lucchesi said. "I'm familiar with one customer who does it once a week, but most sites do it on a quarterly basis."
Testing DR in a virtual environment can be a frightening endeavor, The Virtualization Practice's Haletky said. "Most people do not have a big red button that says turn on the hot site," he said. "But when it comes to replication, you need to test it. If the app runs, you know you're golden."
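Part of Haletky's test can be automated. The following sketch only checks that each service at the hot site accepts a TCP connection after failover. That is a necessary first step, not proof that the application works. A fuller test would run a real transaction against it. The function names, and any host names passed to them, are hypothetical.

```python
import socket
from typing import Iterable, List, Tuple


def app_responds(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if something at host:port accepts a TCP connection in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, name-lookup failure, or timeout
        return False


def failover_smoke_test(endpoints: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return the hot-site endpoints that did NOT accept a connection."""
    return [(host, port) for host, port in endpoints if not app_responds(host, port)]
```

For example, `failover_smoke_test([("dr-sql.example.com", 1433)])` would return an empty list if a (hypothetical) hot-site SQL Server answered on its default port, 1433. Any non-empty result names what to investigate before declaring the test golden.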
New packages are making the testing process more automatic. Veeam Software Corp., a Columbus, Ohio, provider of backup software for VMware Inc., included a feature called SureBackup in the new version of its Backup and Replication software. Cupertino, Calif.-based Symantec Corp. has a replication offering for Microsoft SQL Server and Exchange applications, experts said, while Vizioncore, a Quest Software Inc. company, offers a replication solution for smaller businesses.
Still, disaster recovery solutions may work well in theory, MMG's Nam said, but in practice, events occasionally put extra loads on the system that you can't really test for -- design meetings and product launches, for example.
"You can test for 20 users and scale to 100 users or 1,000 users of the same type," Nam said, but in his experience, "the software that predicts the loads are not that accurate." There could be a bottleneck within a switch, which could take a lot of time to find, he said, "like a needle in the haystack."
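Nam's point about scaling up from a 20-user test can be made concrete with a toy model. It assumes, hypothetically, that each user needs a fixed 2 Mbps and that one switch uplink nobody modeled tops out at 1,000 Mbps. The linear forecast and the capped reality agree until demand passes the cap (above 500 users here), so a small test cannot reveal the bottleneck. The function names and figures are illustrative only.

```python
def naive_forecast_mbps(users: int, per_user_mbps: float) -> float:
    """Linear extrapolation: what a small test implies if nothing saturates."""
    return users * per_user_mbps


def capped_mbps(users: int, per_user_mbps: float, bottleneck_mbps: float) -> float:
    """Same demand, limited by one hidden bottleneck such as a switch uplink."""
    return min(users * per_user_mbps, bottleneck_mbps)


if __name__ == "__main__":
    PER_USER_MBPS = 2.0          # hypothetical, as measured in a 20-user test
    SWITCH_UPLINK_MBPS = 1000.0  # hypothetical 1 Gbps uplink left out of the model
    for users in (20, 100, 1000):
        forecast = naive_forecast_mbps(users, PER_USER_MBPS)
        delivered = capped_mbps(users, PER_USER_MBPS, SWITCH_UPLINK_MBPS)
        print(f"{users:>5} users: forecast {forecast:.0f} Mbps, delivered {delivered:.0f} Mbps")
    # 20 -> 40/40, 100 -> 200/200, 1000 -> 2000/1000
```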
No wonder CIOs don't like pushing that big red button -- and yet it needs to be done.
Let us know what you think about the story; email Laura Smith, Features Writer.