Mirroring, replication, snapshots and disk-to-disk (D2D) backup: what are the differences between these data protection methods, and what are the similarities? More interestingly, how come we have four methods for protecting disk data by writing it to disk? Isn't that excessive? Couldn't just one do?

The problem gets worse when we bring in fixed content alongside the transaction data and consider information lifecycle management (ILM) ideas. These could imply responsibility for the placement and movement of all data types.

Starting with the easy similarities, the four methods above all involve copying disk files to disk files. Backup applications write the D2D backup files and generally use a virtual tape format to do so. Alternatively, a tape library actually carries out the D2D operation while the backup application thinks it's writing data to a tape drive.

ADIC's PathLight VX library does this whilst backup software from Legato, Veritas, CA or other suppliers happily thinks it is writing to an LTO2 tape or other format.

Replication is actually a broader term that includes snapshotting and mirroring. Mirroring and snapshotting can be carried out by a disk drive array or by a SAN management/virtualisation device. For example:

- EMC's Replication Manager works on Symmetrix arrays and can replicate an array's contents to another array for disaster recovery purposes. It doesn't work on Centera fixed content arrays. Replication is a schedulable process.
- Veritas' Volume Manager provides host-based replication but the effect is the same as a disk-controller-based replication facility.
- EMC's SnapView makes copies of data in a CLARiiON array at specific points in time - snapshots - for testing, backup or decision support purposes. They can be full image copies or pointer-based ones. The latter are much smaller, of course. A snapshot doesn't provide protection as complete as mirroring because data written in the interval between snapshots can be lost if there is a disk failure.
- EMC's MirrorView replicates all writes of data to one array or drive volume to another one. It's a constant process and the aim is to have another fully-operational volume or array should the first one fail.
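
The difference between a pointer-based snapshot and a mirror can be sketched in a few lines of Python. This is a toy model only, assuming a volume is simply a mapping of block numbers to data; the `Volume` class and its method names are invented for illustration and do not describe how SnapView, MirrorView or any other product is implemented.

```python
class Volume:
    """Toy block volume (hypothetical): maps block number -> bytes."""

    def __init__(self):
        self.blocks = {}      # the live data
        self.mirror = {}      # second copy, updated on every write
        self.snapshots = []   # each snapshot stores only pre-change blocks

    def write(self, block, data):
        # Copy-on-write: before overwriting, save the old contents into
        # every snapshot that has not already preserved this block.
        for snap in self.snapshots:
            if block not in snap:
                snap[block] = self.blocks.get(block)
        self.blocks[block] = data
        # Mirroring: the same write goes to the second copy straight away.
        self.mirror[block] = data

    def take_snapshot(self):
        # A pointer-based snapshot starts empty and grows only as
        # blocks change after this point in time.
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def read_snapshot(self, snap_id, block):
        snap = self.snapshots[snap_id]
        if block in snap:
            return snap[block]          # block changed since the snapshot
        return self.blocks.get(block)   # unchanged: read the live block


vol = Volume()
vol.write(0, b"orders-v1")
vol.write(1, b"stock-v1")
snap_id = vol.take_snapshot()
vol.write(0, b"orders-v2")
```

After the last write the mirror matches the live volume exactly, while the snapshot still returns `b"orders-v1"` for block 0 and has stored just that one changed block, which is why pointer-based snapshots are so much smaller than full image copies.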

Mirroring is a form of continuous data protection (CDP). Specialised CDP storage devices from suppliers such as Revivio and TimeSpring provide continuously updated copies of data across a network link.

CDP differs from mirroring in that data is time-stamped. As soon as corrupt data is detected, the online data can be restored to a point in time immediately before the corruption and operations can continue properly. Mirroring would just copy the corrupt data to the mirror drive.
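
A minimal sketch of the time-stamping idea follows, assuming continuous data protection is modelled as an append-only journal of writes. The `WriteJournal` class, its methods and the integer timestamps are hypothetical and are not drawn from any Revivio or TimeSpring product.

```python
class WriteJournal:
    """Toy time-stamped write journal (hypothetical model)."""

    def __init__(self):
        # (timestamp, block, data) tuples; assumed to arrive in time order.
        self.entries = []

    def record(self, timestamp, block, data):
        self.entries.append((timestamp, block, data))

    def restore(self, before):
        """Rebuild the volume from writes stamped strictly before `before`."""
        state = {}
        for timestamp, block, data in self.entries:
            if timestamp >= before:
                break   # ignore everything from the chosen point onwards
            state[block] = data
        return state


journal = WriteJournal()
journal.record(1, 0, b"good ledger")
journal.record(2, 1, b"good index")
journal.record(3, 0, b"CORRUPT")       # corruption written at time 3

recovered = journal.restore(before=3)  # state just before the corruption
latest = journal.restore(before=4)     # what a plain mirror would hold
```

Here `recovered` contains the good ledger block, whereas `latest` contains the corrupt one: a mirror only ever holds the most recent state, while the journal can be rolled back to any earlier moment.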

Storage utility ideas
IBM, EMC, Network Appliance and StorageTek are each developing storage utility concepts that aim to be all-embracing and provide data protection for all types of data within the utility. This is a very large and ongoing task which will take several years to complete.

For example, IBM's Tivoli Storage Manager aims to provide a single data protection regime. There is a 16-page white paper describing the details of this. IBM's SAN Volume Controller and SAN File System are both parts of its overall Storage Tank concept. Another example is Network Appliance which states: "Network Appliance unified storage facilitates the deployment of a single, integrated storage solution."

The fixed content angle
Fixed content data filed under some kind of hash-based content-addressing scheme (EMC Centera, ExaGrid, Permabit for example) requires specially-modified backup software or a D2D system (or a tape backup system) that understands its file formats.

But fixed content is a pretty general component of every enterprise's data storage and its data protection facilities should be the same as those for transactional content. That means data protection software has to understand its file formats.
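
To show why hash-based content addressing needs software that understands it, here is a rough sketch of the principle: an object's address is computed from its contents rather than chosen as a file name. The choice of SHA-256 and the `ContentStore` class are assumptions made for illustration; the products named above use their own schemes and formats.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store (hypothetical): address = hash of data."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, so identical content
        # always lands at the same address and is stored only once.
        address = hashlib.sha256(data).hexdigest()
        self.objects[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self.objects[address]
        # Re-hashing on read detects any change to the stored object.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored object does not match its address")
        return data


store = ContentStore()
first = store.put(b"scanned invoice 0001")
second = store.put(b"scanned invoice 0001")   # same content, same address
```

Because there are no conventional file names or paths, a backup product that simply walks a file system has nothing to walk; it has to work with the addresses instead.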

The ILM angle
ILM thinking says that data has different values over time and should be stored on the most appropriate storage medium as its value changes. A typical scenario is that it starts out life on a fast-access disk (on-line), moves to a slower-access drive (near-line), then to a bulk low-cost drive array (SATA perhaps) and then to an archive medium - tape or optical disk - which is offline.
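
As a rough illustration, the tiering scenario above can be written as a simple policy function. It assumes that the time since data was last used stands in for its value, and the thresholds of 30, 180 and 730 days are invented for the example; real ILM policies would be set by each organisation.

```python
from datetime import date

# (maximum age in days, tier name) - thresholds are illustrative assumptions.
TIERS = [
    (30, "on-line"),
    (180, "near-line"),
    (730, "bulk SATA array"),
]
ARCHIVE = "archive (tape or optical)"


def tier_for(last_used: date, today: date) -> str:
    """Pick a storage tier from how long ago the data was last used."""
    age_days = (today - last_used).days
    for max_age, tier in TIERS:
        if age_days <= max_age:
            return tier
    return ARCHIVE   # older than every threshold: move offline


today = date(2004, 1, 10)
recent = tier_for(date(2004, 1, 1), today)    # 9 days old
ageing = tier_for(date(2003, 10, 1), today)   # 101 days old
old = tier_for(date(2003, 1, 1), today)       # 374 days old
ancient = tier_for(date(2000, 1, 1), today)   # several years old
```

The policy is the easy part; actually moving the data between tiers, and protecting it on each one, is where the difficulties begin.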

ILM software then has an operating assumption that it controls the placement and movement of data. But ILM software has grown up independently of backup applications. Only now are the ILM software developers realising that if tape is part of the ILM storage tiers then backup application technology is needed. Only now are the backup application developers realising that they are no longer solely in the backup-to-tape business; instead they are in the data protection business, which puts them, willingly or unwillingly, in the ILM business space.

An ideal ILM data protection system combines the technologies and advantages of backup, mirroring, snapshots, continuous data protection and D2D to provide a single unified facility that can cope with both transactional and fixed content across the storage tiers from on-line to off-line. It is a very tall order.

Who can deliver a storage utility?
If you believe that such a scheme is viable then the probability is that only EMC, H-P, IBM, Network Appliance and StorageTek and, perhaps, Softek and Veritas can develop it. All of them except Softek and Veritas have both hardware and software expertise. The full scope of a storage utility has not been defined yet, and within that the capabilities of a full data protection sub-system haven't been defined either.

Over the next year or so we can expect to see more development of storage utility ideas by the five hardware-and-software suppliers above. Veritas and Softek will probably contribute their own schemes, pushing the point that independent storage vendors can provide a heterogeneous storage utility better than hardware-selling vendors who might want to lock you in. EMC, H-P, IBM, Network Appliance and StorageTek are, of course, all supporters of open storage standards such as SMI-S and will support heterogeneity.

Until we have a clear idea of what we want a storage utility to do, and within that what a unified data protection facility can do, all we can do at present is watch, wait and clarify our ideas. Storage utilities are coming, and if you can influence your suppliers then get talking.