As a newly minted IT director in 2011 at Iowa Select Farms, a pork production company with more than 600 member farms, Carl Vogel was met with antiquated systems and processes. The IT infrastructure needed a significant redesign, but the worst offender was the disaster recovery (DR) architecture. The organization's DR strategy was woefully behind the times, with network crashes blocking access to data for days.
In this podcast with SearchCIO-Midmarket Editorial Director Christina Torode, Vogel offers a step-by-step look at his DR overhaul, why he avoided the cloud and how his DR strategy led to data recovery, replication and backup optimization.
When you joined Iowa Select Farms, what made you realize that the DR strategy was a priority?
Carl Vogel: The hardware that was running our backup system was old and inadequate. I was happy to see they were not using tape backups for recovery anymore, and they were using a disk backup system. However, it was antiquated and needed a drastic overhaul, with redundancy built in. No. 2, the snapshots were being taken once every night, and they often failed. I wanted a tighter timeline for when backups were taken, so that we wouldn't lose a whole day's worth of data if a major failure forced a recovery. No. 3, we had a mix of standalone servers and VMs [virtual machines] in the environment, and the backup solution at the time did not serve both well. We could back up standalone servers well; however, we could not restore to bare metal. And with VMs, we could do file-level backups, but the backup solution could not recover a VM in its entire state. No. 4, major overhauls were going to be taking place in our environment over the next three years. I needed a good backup solution that I could rely on for recovering at tighter intervals in case one of our overhauls didn't go as planned, or we missed something and had to go back.
Can you talk a little bit about some of the other things that you were doing as part of the revamp of the IT infrastructure?
Vogel: We had a big mix of standalone servers and VMs, and the big thing was a lot of the hardware -- even the VMs -- were on old hardware and needed to be revamped, or they were on very old OSes, or we had databases that were on very old versions that needed to be upgraded. Those can be very tedious processes. The one thing to always be mindful of when you make major upgrades, especially for systems as old as some of ours were, is that the biggest piece is data. We needed to always make sure we could recover that data, or get to that data during the upgrade in case something went wrong. So when we are changing our hardware and upgrading software or database types -- especially when you are skipping several generations to get up to the latest and greatest -- that back-end system on a DR solution is critical.
What combination of tools and best practices did you introduce to address disaster recovery?
Vogel: We introduced the idea of bare-metal restores to hardware. Before, you'd have to have a backup, and then you would take a piece of hardware, load an OS on it, build that up, and then move the files back over and install the software back onto it. That is a very slow process, not always very reliable, and more than likely something will get missed. By being able to do a bare-metal restore, you capture that snapshot, and you actually throw that whole snapshot back onto the hardware server. With that, you get back the same environment, the same OS you had before, with everything else intact. You can recover in a couple of hours rather than two, three, four days, maybe even a week depending on what kind of failure occurred. Same with the virtual machine restore. If you can take a snapshot of a standalone server, create a VM out of it and push it to our VM host, you can recover extremely fast -- definitely less than a day, and maybe even less than an hour depending on how big the VM image is. Also, shorter backup windows were a best practice that I wanted to introduce here. If someone was working on a file at 8 a.m., and they forgot to save it, and at 9 a.m. they accidentally deleted it, no big problem because we're doing backups every five minutes.
Another huge tool that we're using is off-site backups and replication. We have a data center that is 75 miles away. So we are doing backups here and replicating to that data center and vice versa. That's what gives us great redundancy: virtual machines and standalone Exchange, SQL and workstation backups, all in one solution. That's another big factor that we're introducing: It doesn't matter what we have here as far as our infrastructure is concerned. Our new disaster recovery solution takes care of all the different types of platforms we deal with. And we have multiple ways of restoring them and recovering data.
You own your own data center. Is that why you didn't go with a cloud DR approach?
Vogel: We considered a cloud approach, but because we did have our own data center that was geographically extremely separate from our corporate offices and any other entity, I didn't see the need to go that far. And the disaster recovery solution that we chose allowed for great flexibility there. So essentially, it isn't a cloud but it's in our own private cloud at the data center. I just didn't see the need to go any further than that, especially when we want our own data under our security as much as possible, and as readily available as possible. Having data in our own cloud allows for a massive amount of flexibility; it allowed us to control that environment and made it very readily available to us.
How would you say that the new DR strategy created business value for your company and for your members?
Vogel: I would say the biggest thing that the users see is that it has created a rapid recovery of files, emails, databases, standalone servers [and] VMs. Before, that recovery could have taken 24 hours or more. And we were definitely going to lose zero-day data, meaning data that was created that day. If we had a massive problem, we would definitely lose it because our closest backup was the night before. Now, we don't lose that data anymore. We have two good sets of backup data and it's readily available, so it allows us to have great data integrity. If something bad does happen, we can easily get to it; it's highly recoverable.
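Editor's note: The gap Vogel describes between nightly and five-minute backups can be put into numbers. The Python sketch below is illustrative only and is not taken from Iowa Select Farms' tooling; the function names, the midnight start time and the 4:33 p.m. failure time are all assumptions made for the example. It estimates how much work is lost when a failure strikes between scheduled backups.

```python
from datetime import datetime, timedelta


def last_backup_before(event, first_backup, interval):
    """Time of the most recent scheduled backup at or before `event`.

    Hypothetical helper: assumes backups start at `first_backup`,
    repeat every `interval`, and always succeed.
    """
    if event < first_backup:
        raise ValueError("event precedes the first backup")
    completed = (event - first_backup) // interval  # whole intervals elapsed
    return first_backup + completed * interval


def data_loss_window(event, first_backup, interval):
    """Work lost if a failure at `event` forces a restore from backup."""
    return event - last_backup_before(event, first_backup, interval)


# Assumed scenario: backups start at midnight, failure at 4:33 p.m.
midnight = datetime(2013, 8, 1, 0, 0)
failure = datetime(2013, 8, 1, 16, 33)

nightly_loss = data_loss_window(failure, midnight, timedelta(days=1))
five_min_loss = data_loss_window(failure, midnight, timedelta(minutes=5))

print("Nightly backups lose:    ", nightly_loss)
print("Five-minute backups lose:", five_min_loss)
```

Under these assumptions, the nightly schedule loses everything since midnight, while the five-minute schedule loses at most a few minutes of work. In general, the worst-case loss is just under one backup interval.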
This was first published in August 2013