The idea of disk-to-disk backup is pretty well established, especially now that companies such as EMC, HP, Quantum and StorageTek have given it their stamps of approval.

The theory is simple: even cheap ATA disk is much faster to access than tape, so if tape is too slow for your available backup window, do the first copy to disk and then write it to tape in the background once the application server is back online.

It is a similar idea to taking a snapshot, or mirror, of a volume, then breaking off the mirror and backing that up, except that disk-to-disk uses separate cheap disks whereas mirroring tends to need twice as much Symmetrix or what-have-you.

And it also means that if one of your users deletes a file by mistake, there's a good chance it can be recovered from the latest backup stored on fast disk, instead of needing to load and mount a physical tape volume.

Even so, there are several possible approaches to disk-to-disk backup. The main one is software that emulates a physical tape drive; this means you can retain your current backup application and processes, as the application simply sees a tape drive.

This requires disk storage with a separate controller of its own to run the emulation software. The first major product offering of this type for midrange (DLT and LTO) superdrives was Quantum's DX30, which is now into its second generation and has acquired a big sibling in the 64TB DX100.

Increasing choice
Of course there are other tape-emulating products around, such as the ADIC Pathlight and EMC's Clariion Disk Library, which is based on software from FalconStor called Virtual Tape Library (VTL). HP has approved the FalconStor software too, for use with various StorageWorks arrays, while StorageTek has its Virtual Storage Manager and the SN6000 appliance.

However, most backup applications also now have the ability to write tape volumes directly to a disk array. So why would you use a relatively expensive Pathlight or VTL instead of doing backup straight to disk? After all, that still allows you to stage your data to a cheap SerialATA array and then put it on real tape once the timing is less critical.

We went into Quantum's laboratories to find out. On one side was a DX30 and on the other was an off-the-shelf SerialATA array from Infortrend - chosen because this is the same array that Quantum builds into the DX30, so both backups would be using the same disk hardware. The two were connected to a Windows 2000 server running Veritas Netbackup 5.0 and set the task of backing up data on a network drive.

Tape emulation is twice as fast
The results were clear - backing up to the DX30 was at least twice as fast as backing up to the disk array. The actual speed was hard to judge because the available metrics all measure different things: for example, where Netbackup recorded 11MB/sec for disk-to-disk, verification overhead meant the Fibre Channel switch port was actually seeing 24MB/sec of traffic. The equivalent figures for the DX30 were 33MB/sec and 40MB/sec.
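
As a back-of-envelope check on those figures, the ratio of wire traffic to application throughput can be worked out directly (a quick sketch; the layout is ours, the numbers are from the test):

```python
# Throughput figures from the test (MB/sec): "app" is what
# Netbackup recorded, "wire" is what the Fibre Channel switch
# port saw, including verification traffic.
runs = {
    "disk-to-disk": {"app": 11, "wire": 24},
    "DX30 tape emulation": {"app": 33, "wire": 40},
}

for name, r in runs.items():
    ratio = r["wire"] / r["app"]
    print(f"{name}: {r['app']} MB/s logical, "
          f"{r['wire']} MB/s on the wire ({ratio:.2f}x)")
```

The disk-to-disk run carries proportionally more extra traffic on the wire than the tape-emulation run, which is consistent with the verification overhead Netbackup incurs on a random-access target.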

Quantum's software engineering manager Joe Ewaskowitz attributes the difference to a range of factors, including overheads within the Windows file system and perhaps disk fragmentation too. However, he says the biggest thing is probably the different block sizes used when writing to disk and to tape.

"To Netbackup the DX30 is tape, so Windows sends larger block sizes because it is not treated as a random-access device," he says. He adds that on the disk-to-disk side, his own testing has shown "no appreciable performance difference between RAID 5 or RAID 1+0 striping."

He warns that you may need to look carefully at the bandwidth available in a tape emulation device. "The DX30 has two fibre ports for customer connection and four for disk connection, one per 16-drive array," he says. "We use SATA drives and you're more likely to hit the SATA limit than the Fibre Channel limit."

Architectural factors
Jeff Corbett, Quantum's director of worldwide system engineers, adds that other aspects of the virtual tape library's internal architecture are relevant too. For example, the DX30 has its own internal Linux-based controller and uses switched drive arrays, while EMC's VTL uses Fibre Channel-Arbitrated Loop internally and an external Windows server as the controller.

Corbett points out that arbitrated loops can give poorer performance than switched schemes because looped drives must share bandwidth. He warns too that you need to look at whether data compression is done in hardware or software, as this also affects performance.

And he adds that you need to consider how each approach manages the migration from disk to tape. The DX30 goes into the backup path as a device in its own right - it looks to the software like a tape library, but a separate library from the one that has real tape cartridges in it. That means the staging process from virtual tape to physical must still be managed, but you do end up with a 'real' physical tape volume.

VTL manages the staging to tape itself, which simplifies backup management. However, it means that restores must also go through the VTL and cannot be done direct from tape, which means your disaster recovery site will also need a VTL.

Pathlight too does the data movement internally; indeed, one way of looking at the Pathlight is as a disk cache fronting the tape library.

In many ways, all these developments represent the industry trying to even out the performance improvements that are taking place in different areas. For example, a tape drive can slow down dramatically if it is not fed data fast enough, because it keeps having to stop and briefly rewind while it waits for more data to arrive, a process called backhitching.
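
A toy steady-state model makes the cost of backhitching concrete. Assume the drive drains its buffer, stops and repositions, then waits for the feed to refill the buffer before writing again; all the numbers are illustrative assumptions, not specifications of any real drive:

```python
DRIVE_MBPS = 30     # native streaming rate of the tape drive (assumed)
BUFFER_MB = 64      # drive buffer size (assumed)
REPOSITION_S = 2.0  # cost of one stop/rewind/restart cycle (assumed)

def job_stats(feed_mbps, job_mb=10240):
    """Return (seconds to write job_mb, number of backhitch cycles)."""
    if feed_mbps >= DRIVE_MBPS:
        return job_mb / DRIVE_MBPS, 0   # drive streams, never starves
    # Buffer drains at the difference between drive and feed rates.
    drain_s = BUFFER_MB / (DRIVE_MBPS - feed_mbps)
    written_per_cycle = DRIVE_MBPS * drain_s
    # After repositioning, wait for the feed to refill the buffer.
    refill_s = max(BUFFER_MB - feed_mbps * REPOSITION_S, 0) / feed_mbps
    cycle_s = drain_s + REPOSITION_S + refill_s
    cycles = job_mb / written_per_cycle
    return cycles * cycle_s, round(cycles)

for feed in (10, 20, 30):
    secs, hitches = job_stats(feed)
    print(f"feed {feed} MB/s: {secs:6.1f}s, {hitches} backhitches")
```

In this model a drive fed at a third of its native rate takes three times as long to finish the job and racks up over a hundred stop-rewind-restart cycles, which is exactly the slowdown (and mechanical wear) that staging to fast disk is meant to avoid.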

"Superdrives have brought back the need to plan your server and allocate enough memory and so on," says Corbett. "The drives are so fast now it's hard for a server with JBOD to keep up."