Every major company, and a few startups as well, seems to be offering continuous data protection. That is to say, instead of relying on backup tapes that hold partial and unreliable copies of your data, you can recover every last byte of information as it stood at an exact time: 3.32pm on Thursday 1st October, 2005, for example, if you wish.

Why would you want to? Isn’t this just supplier waffle, mere marketing puff to sell products nobody really needs? Haven't people relied on tape backup for long enough?

But, according to the continuous data protection (CDP) people, tape backup is unreliable and mirroring won’t answer the one need you have: to turn back time.

When you realise that your system, your server, your workstation or your drive array has missing or corrupted data, you need to turn back time and revert it to the state it was in before the data loss or corruption happened.

Tape unreliability
Tape backup unreliability is said to be famous. Tape backup is to data recovery what bus timetables are to real life: endearingly unreliable. The usual explanation is that backups scheduled for Friday afternoon, POETS day, are not run because the person responsible leaves early and fails to start them. Or the job is started, people leave the office, and they come back to find it uncompleted because of an error that happened while they were away.

Even when backups are run, the data sometimes can’t be restored because the tape is dirty or broken, the drive fails or the tape can’t even be found.

D2D and CDP suppliers will push Gartner-type reports your way showing that what you thought of as a reliable tape-based backup system is actually quite the reverse.

Back it up to disk instead
The D2D vendors reckon they have the answer to unreliable tape backups: back up to disk instead. Overland Storage and Quantum are just two of the suppliers at Storage Expo with D2D products. They’ll tell you that disk-based backup can be automated, that it is rather quicker than tape-based backup and that restores are magically quicker.

This is true. There is the slight problem that you’ll probably find 5 terabytes of tape media is cheaper than 5 terabytes of disk. For our purposes here though, the other problem is that a backup on disk is still only as recent as the last time it was run. If your live data is corrupted or lost after that, you can’t restore your system to a known good point without losing everything written since.

What do I mean? Suppose you do a daily backup to disk at 6pm every day. Next day, at 5pm, your transaction database fails. You need to restore it from your backup. You can do that, but every transaction between 6pm last night and 5pm today is lost.
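To put a number on that loss, here is a minimal sketch of the arithmetic. The function name and the calendar dates are illustrative assumptions; only the 6pm backup and the 5pm failure come from the example above.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Return how much work cannot be recovered from the last backup."""
    if failure < last_backup:
        raise ValueError("failure cannot precede the last backup")
    return failure - last_backup


# Hypothetical dates: a 6pm backup, then a failure at 5pm the next day.
last_backup = datetime(2005, 10, 12, 18, 0)
failure = datetime(2005, 10, 13, 17, 0)

lost = data_loss_window(last_backup, failure)
print(lost)  # 23:00:00
```

In other words, a daily backup can cost you up to a full day of transactions, and in this case it costs 23 hours of them.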

What the CDP people say is that you need ‘continuous’ data protection, not backups every week or every day.

Mirroring is not CDP
Fine. All right. Let’s send every data byte we write to a second set of disks. Let’s mirror the primary disk set to a secondary set. EMC, HP, Hitachi Data Systems, IBM and Sun will be delighted to tell you about it and what their software and drive arrays can do for you.

Mirroring is thought of as a near-perfect way to avoid data loss but it has a couple of disadvantages. It’s expensive; you need a second disk array so that writes to the first array are simultaneously made to an identical second array. More importantly from a CDP point of view, you can’t wind the clock back and extract all data up to a previous point in time from a mirror disk or array.

For that you need to add something to every piece of data you write. It has to be time-stamped. And to avoid the expense of a duplicated primary array you need to copy your base data and then copy only the changes to it. It’s like a synthetic backup, which Computer Associates, Symantec’s Veritas or EMC Legato can tell you about. You take a full backup of the data you want to protect and then take incremental backups thereafter, saving only the changed bytes, the deltas in the trade’s jargon.

When you need to restore a file it is reconstructed, synthesised you might think, from the original full backup and all the subsequent incremental backups containing data relating to that file. Because you aren’t taking full backups of everything each time a backup is run, you save both backup media space and backup window time.
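The reconstruction step can be sketched in a few lines. This is an illustration only, not any vendor's implementation. It assumes each backup is a mapping from a file name to its contents, works on whole files rather than changed bytes, and uses `None` to mark a deleted file. The `synthesise` function and the sample file names are hypothetical.

```python
from typing import Dict, List, Optional


def synthesise(
    full_backup: Dict[str, bytes],
    incrementals: List[Dict[str, Optional[bytes]]],
) -> Dict[str, bytes]:
    """Rebuild the latest state from one full backup plus its deltas."""
    restored = dict(full_backup)  # copy, so the full backup stays untouched
    for delta in incrementals:    # deltas must be supplied oldest first
        for name, content in delta.items():
            if content is None:
                restored.pop(name, None)  # file was deleted in this delta
            else:
                restored[name] = content  # file was added or changed
    return restored


full = {"orders.db": b"v1", "notes.txt": b"hello"}
monday = {"orders.db": b"v2"}                       # one file changed
tuesday = {"notes.txt": None, "report.txt": b"q3"}  # one deleted, one added

result = synthesise(full, [monday, tuesday])
print(result)  # {'orders.db': b'v2', 'report.txt': b'q3'}
```

Only the two small deltas had to be stored after the full backup, which is where the saving in media and backup window comes from.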

CDP is like synthetic backup to disk of all changes
So why won’t synthetic backup to disk do? The answer is that it will, but only if every change is written to the backup disk. Traditional synthetic backups are run, like all tape backup jobs, at intervals, typically of a day or a week.

The CDP suppliers run what amounts to synthetic backup, in the form of their own software processes, all the time. Talk to EMC, HP, IBM and Network Appliance about their plans for offering this technology.
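One way to picture this is a base copy plus a journal of every later write, each with a timestamp. The sketch below is a toy model under that assumption, not a description of any supplier's product. The `ChangeJournal` class, its method names, the block numbers and the dates are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class ChangeJournal:
    """Toy CDP store: a base copy plus every later write, time-stamped."""
    base: Dict[int, bytes]
    entries: List[Tuple[datetime, int, bytes]] = field(default_factory=list)

    def record_write(self, when: datetime, block: int, data: bytes) -> None:
        # Writes are assumed to arrive in time order.
        if self.entries and when < self.entries[-1][0]:
            raise ValueError("writes must be recorded in time order")
        self.entries.append((when, block, data))

    def restore_as_of(self, when: datetime) -> Dict[int, bytes]:
        # Replay every write made at or before the requested moment.
        image = dict(self.base)
        for stamp, block, data in self.entries:
            if stamp > when:
                break
            image[block] = data
        return image


journal = ChangeJournal(base={0: b"good"})
journal.record_write(datetime(2005, 10, 13, 15, 31), 0, b"better")
journal.record_write(datetime(2005, 10, 13, 15, 33), 0, b"corrupt")

image = journal.restore_as_of(datetime(2005, 10, 13, 15, 32))
print(image)  # {0: b'better'}
```

Asking for the state at 3.32pm returns the last good write and leaves out the corrupt one made a minute later, which is the "turn back time" ability a mirror cannot give you.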

It was first established by niche technology startups like Revivio and TimeSpring. Now the storage majors are offering it or planning to offer it.

Because the CDP backup data is not mirror data it doesn’t have to go to an identical drive array. It can go to a serial ATA (SATA) array. You could back up data held on a set of SCSI drives or a Fibre Channel array to a cheaper SATA drive array. Having RAID facilities protects against a SATA drive failure. You can back up the SATA array to tape for archive-level protection and off-site vaulting for disaster recovery.

This might be overkill, excessive protection for the many people who just want something more reliable than tape backup, with the advantages of D2D but with better protection than daily synthetic backups to disk.

Microsoft has just announced its Data Protection Manager (DPM), which enables you to take snapshots of your primary array every hour and store them on a separate storage server. Microsoft, Quantum, Dell and HP will gladly tell you about this at the show. Symantec will tell you about its Veritas product that can match this, possibly exceed it.

With near-CDP you can turn the clock back to within an hour of the time you want. Much better than weekly or daily tape backup. With true CDP you can turn the clock back to the minute, the second even, that you need.
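The difference can be made concrete with a small sketch. It assumes hourly snapshots taken on the hour through a hypothetical working day. The function name and the dates are illustrative and not taken from any product.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List


def latest_snapshot_before(snapshots: List[datetime], target: datetime) -> datetime:
    """Return the newest snapshot taken at or before the target time."""
    i = bisect_right(snapshots, target)  # snapshots must be sorted
    if i == 0:
        raise LookupError("no snapshot exists that early")
    return snapshots[i - 1]


# Hypothetical day: snapshots on the hour from 9am to 5pm.
snapshots = [datetime(2005, 10, 13, hour, 0) for hour in range(9, 18)]
target = datetime(2005, 10, 13, 15, 32)  # the moment you want back

chosen = latest_snapshot_before(snapshots, target)
gap = target - chosen
print(chosen.time(), gap)  # 15:00:00 0:32:00
```

With hourly snapshots the best you can do for 3.32pm is the 3pm copy, so 32 minutes of changes are gone. True CDP aims to shrink that gap to nothing.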

Talk to the vendors mentioned, talk to storage system houses, and understand the costs involved, and what it means for your daily routine in your IT shop. Identify data sets that might need to be continuously protected. And then, if you do need to turn back time, you might be able to do just that with your data.