Backups are probably the most essential component of any corporate network, but are also one of those things that are done wrong with dreadful regularity. Yet it’s not difficult to get your backup strategy right – all it takes is some thought and a dollop of ongoing motivation.
1. The backup strategy
The first thing you need to do is actually dump the data from the live systems to your backup media. Nine times out of ten this means copying files to tape, though some organisations use other media such as CD-R or DVD-R instead of (or alongside) magnetic tape. We’ll talk about tapes in this feature, but what we’re saying will generally apply if you’re a DVD-R or CD-R person too.
There are two main variables with backups: capacity and time.
If you’re using a tape backup system, you’ll have a finite amount of space on the tape. If you have an 80GB DLT drive and you want to back up less than 80GB per day, you’re sorted. If your daily data exceeds the tape’s capacity, though, you need to decide how to deal with it. There are numerous alternatives, the most common being:
• Do complete dumps of some systems each day and “incremental” dumps (which store only what has changed since an earlier dump) of the others.
• Employ someone to switch tapes when one fills up.
• Use an auto-changer that can switch tapes itself in the middle of the night.
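Here is a minimal Python sketch that puts numbers on the capacity question. It assumes the tape’s native (uncompressed) capacity is what you can actually use, because compression ratios depend entirely on your data. `tapes_needed` is a hypothetical helper, not part of any backup package, and the daily volumes in the loop are made-up figures.

```python
import math


def tapes_needed(daily_gb: float, tape_capacity_gb: float) -> int:
    """Return how many tapes one day's dump fills.

    Assumes native (uncompressed) capacity and no per-tape overhead.
    """
    if tape_capacity_gb <= 0:
        raise ValueError("tape capacity must be positive")
    return math.ceil(daily_gb / tape_capacity_gb)


# Hypothetical daily volumes against an 80GB drive.
for daily in (60, 80, 150):
    n = tapes_needed(daily, 80)
    if n <= 1:
        print(f"{daily}GB/day: fits on one tape")
    else:
        print(f"{daily}GB/day: needs {n} tapes (changer, operator or incrementals)")
```

If the result is more than one tape a night, you need one of the three options listed above.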
Incremental dumps are a compromise, and you must always bear one fact in mind: if you use them, you’ll need more than one tape to do a complete restore of a system. There are two approaches:
• Back up all data added or changed since the last dump of any kind (full or incremental). This is what’s usually called a true incremental. Don’t go here, though: if you have a full dump and six incrementals, you’ll need all seven tapes to do a restore.
• Back up all data added or changed since the last full dump. This is usually called a differential, and it’s the one to choose. You’ll need at most two tapes (the full dump plus the latest differential, assuming each fits on a single tape) to restore any one system. The trade-off is that each differential grows as the week goes on, because it repeats everything changed since the full dump.
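The difference in restore effort is easy to model. This sketch makes a few assumptions: the schedule is one full dump followed by dumps of a single kind (all incrementals or all differentials), each dump fits on one tape, and the dumps are listed in day order. `restore_set` and the `(day, kind)` tuples are illustrative only and do not come from any backup product’s format.

```python
from typing import List, Tuple

# A dump is (day, kind), where kind is "full", "incremental" or "differential".
Dump = Tuple[int, str]


def restore_set(dumps: List[Dump], target_day: int) -> List[Dump]:
    """Return the dumps (tapes) needed to restore the state as of target_day."""
    upto = [d for d in dumps if d[0] <= target_day]
    fulls = [i for i, (_, kind) in enumerate(upto) if kind == "full"]
    if not fulls:
        raise ValueError("no full dump on or before the target day")
    last_full = fulls[-1]
    needed = [upto[last_full]]
    later = upto[last_full + 1:]
    if not later:
        return needed
    if later[-1][1] == "differential":
        # A differential holds everything changed since the full dump,
        # so only the most recent one is needed.
        needed.append(later[-1])
    else:
        # Each incremental holds only changes since the previous dump,
        # so every one since the full dump is needed, restored in order.
        needed.extend(later)
    return needed


incr_week = [(0, "full")] + [(d, "incremental") for d in range(1, 7)]
diff_week = [(0, "full")] + [(d, "differential") for d in range(1, 7)]
print(len(restore_set(incr_week, 6)), "tapes with incrementals")
print(len(restore_set(diff_week, 6)), "tapes with differentials")
```

This gives seven tapes against two, which matches the week described above. The sketch does not try to minimise the tape count for mixed schedules, such as a differential followed by incrementals.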
With regard to time, there are two options: the backup doesn’t take too long and can be run out of hours; or it takes ages and at least part of it has to run during office hours.
If the backup can run out of hours, you needn’t worry much about network performance suffering while all this dump data is flung across it. If, on the other hand, backups have to share the network with production traffic, you should seriously consider installing a separate LAN. Connect it to a second network card in each backed-up system, so that it carries the backup traffic away from the production data. You might even call it a BAN (“Backup Area Network”) – I wonder if we just invented that or whether someone’s already christened it.
Note that you might need a BAN even if you work 9-5 – notably if you have (say) live Web servers or extranet systems being backed up: just because you’re not there to see the LAN slow down doesn’t mean you don’t have to consider it. In these circumstances you’ll also want to talk to your software dealer about the pricing for the “live backup” options for your mail or DBMS software, so you don’t have to interrupt the service to back up the data.
The other time consideration is, of course, the case where the total data transfer time exceeds the interval between backups. If your daily backup takes 30 hours to run, you have a problem. You need either to add more tape drives (so you can dump from multiple sources to multiple tapes at once) or to dump less data each day.
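The arithmetic behind the 30-hour problem is simply data volume divided by sustained throughput. The sketch below rests on several assumptions: 1GB = 1024MB, the drive can be fed at its rated speed (a busy LAN may not manage this), and the data splits evenly across drives. The 500GB and 5MB/s figures are hypothetical, as are the helper names.

```python
import math


def backup_hours(data_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to stream data_gb at a sustained throughput_mb_s (1GB = 1024MB)."""
    return data_gb * 1024 / throughput_mb_s / 3600


def drives_needed(data_gb: float, throughput_mb_s: float, window_hours: float) -> int:
    """Drives needed in parallel to finish within window_hours.

    Assumes the data splits evenly and the network and disks can keep every drive fed.
    """
    return math.ceil(backup_hours(data_gb, throughput_mb_s) / window_hours)


hours = backup_hours(500, 5)
print(f"500GB at 5MB/s takes {hours:.1f} hours")
print("drives for a 24-hour cycle:", drives_needed(500, 5, 24))
print("drives for a 10-hour overnight window:", drives_needed(500, 5, 10))
```

On these figures the job runs for about 28 hours. A daily cycle therefore needs two drives, and fitting it into a ten-hour overnight window needs three.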
2. Verification and integrity
Always, always verify your backups. If time permits, get the backup software to verify the data as it’s written to tape. If not, then at the very least put in place a process of checking every nth backup to make sure the data really is there. You really, really don’t want to discover, when your server dies next week, that the latest valid backup you have is from March 2001.
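One way to check that the data really is there is to restore a backup to a scratch directory and compare file checksums with the originals. This is a minimal sketch that assumes a filesystem-level restore and uses SHA-256 digests from Python’s standard `hashlib`. `file_digests` and `verify_restore` are hypothetical names. Any file that legitimately changed after the backup ran will be reported as “corrupt”, so in real use you would compare against a snapshot or filter by modification time.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digests(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                # Read in 1MB chunks so large files don't exhaust memory.
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[path.relative_to(root).as_posix()] = h.hexdigest()
    return digests


def verify_restore(source: Path, restored: Path) -> List[str]:
    """List problems found; an empty list means every source file came back intact."""
    src, dst = file_digests(source), file_digests(restored)
    problems = []
    for name, digest in src.items():
        if name not in dst:
            problems.append(f"missing: {name}")
        elif dst[name] != digest:
            problems.append(f"corrupt: {name}")
    return problems
```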
Make sure also that you consider how you store your tapes. In my experience anyway, most data recoveries are prompted by people deleting the wrong file. The real reason you’re bothering with this backup lark, though, is so you’re ready when the building burns down, floods or is burgled. At the very least, buy a firesafe that’s rated to protect tapes, not just paper, for a sensible length of time (many aren’t – check or you’re stuffed). If you have multiple buildings or sites, institute a process of moving tapes to remote locations, working on the principle that two of your offices are unlikely to burn down together. That said, if you’re in an industry that’s susceptible to terrorism or other deliberate damage, look at other secure offsite locations for storing your backups.
Your offsite storage could be a steel-walled bunker in a secret location, or it could be your network manager’s home office. Either way, though, there’s an important factor to consider …
3. The recovery process
If the effluent hits the fan, you need to be able to recover as quickly as possible. This means not only getting to the backup tapes in a hurry (so consider what happens if the guy who took last week’s tape home goes shopping when you least expect him to) but also having the facilities available to recover the data. That means having boot disks or CDs for all of your key systems. Each one needs the right operating system components and network and tape drivers to get the machine running and pull its contents back from the backup tapes. This is the bit that people forget, and it’s the difference between getting back up in three hours and getting back up in two days.
4. Policy and process
The final key component of the backup strategy is the policy and process behind it. Have a book that the guy who changes the tape signs. Have a defined list of what happens to each tape when it is removed from the system, and how tapes circulate around the local and offsite storage locations. Aim to have a standard hardware platform so all your servers are as similar as possible (it means you have one boot CD instead of six to keep up to date). Get yourself a spare server so you can actually try out your recovery process from time to time. And make sure everything’s documented in gory detail so that even the tea lady could do a restore – which means not just writing the “how to” process but also keeping a cache of passwords in the MD’s safe (and documenting the fact that this is where they are) in case you’re by the pool in Tenerife when the building blows up.
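We haven’t prescribed a rotation scheme here, but a common one is grandfather-father-son (GFS). It uses a tape for each weekday that’s reused every week, a weekly tape each Friday, and a monthly tape on the last Friday of the month that is kept for longer. The sketch below assumes a Monday-to-Friday schedule. Its labels and the firesafe/offsite rule in `destination` are hypothetical examples of the kind of policy you’d write down in the book.

```python
from datetime import date, timedelta

# Hypothetical labels for the four weekday ("son") tapes; Fridays are handled separately.
DAILY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday")


def gfs_label(day: date) -> str:
    """Tape label for a given working day under a grandfather-father-son rotation."""
    wd = day.weekday()  # Monday == 0 ... Sunday == 6
    if wd > 4:
        raise ValueError("this sketch assumes no weekend backups")
    if wd < 4:
        # Son: one tape per weekday, overwritten the following week.
        return f"Daily-{DAILY_NAMES[wd]}"
    if (day + timedelta(days=7)).month != day.month:
        # Grandfather: the last Friday of the month, kept long-term.
        return f"Monthly-{day:%Y-%m}"
    # Father: other Fridays, numbered by week of the month (1 to 4).
    return f"Weekly-{(day.day - 1) // 7 + 1}"


def destination(label: str) -> str:
    """Hypothetical policy: dailies stay in the firesafe, everything else goes offsite."""
    return "firesafe" if label.startswith("Daily") else "offsite"


start = date(2024, 5, 27)  # an arbitrary Monday
for offset in range(5):
    d = start + timedelta(days=offset)
    label = gfs_label(d)
    print(d.isoformat(), label, destination(label))
```

Printing a label and destination for each working day gives whoever changes the tape a checklist to sign against. How long to keep the monthly tapes is a policy decision that the sketch leaves to you.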
Backup strategies get neglected because they seem so easy. There’s always the implicit assumption that because the IT guys set the kit up in the first place, they know how to recover from a disaster. Which they probably do. Trouble is, they may not know where to find the right passwords at restore time. Perhaps the new email server is a different make and the boot CD doesn’t work. Or perhaps nobody bothered to clean the tape drive when the light started flashing, because they didn’t know what it meant.
Write it down, test it often, and follow it. You’ll thank yourself.