As 2007 approaches, many storage managers are working on next year's budget, and one item sure to appear on many shopping lists is a new virtual tape library (VTL). With VTLs rapidly maturing, one new feature that buyers should examine seriously is data de-duplication.
Data de-duplication significantly increases the amount of data that a VTL can store. Unlike data compression, which stores the same amount of data in a smaller space, data de-duplication identifies identical blocks of data across different backup streams and stores only a single copy of each.
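The idea of storing identical blocks once can be sketched in a few lines of Python. This is a toy illustration only, not any vendor's implementation: the `DedupStore` class, the fixed 4 KB `BLOCK_SIZE`, and the use of SHA-256 digests to recognize matching blocks are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products vary


class DedupStore:
    """Toy block store that keeps one copy of each unique block (hypothetical)."""

    def __init__(self):
        self.blocks = {}   # digest -> block bytes, stored once
        self.streams = {}  # stream name -> ordered list of digests

    def write_stream(self, name, data):
        """Split a backup stream into blocks and store only the new ones."""
        digests = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Keep the block only if an identical one is not already held.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.streams[name] = digests

    def read_stream(self, name):
        """Rebuild the original stream from its list of block digests."""
        return b"".join(self.blocks[d] for d in self.streams[name])

    def stored_bytes(self):
        """Physical bytes actually held across all streams."""
        return sum(len(b) for b in self.blocks.values())
```

If two nightly backups share most of their blocks, the second `write_stream` call adds only the blocks that changed, while `read_stream` still returns each backup in full.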
VTL vendors that support data de-duplication report that data reductions of 20:1 or greater are possible. While not everyone will see results like this, de-duplication starts to give VTLs the kind of capacity normally found only in tape libraries.
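To see what a reduction ratio means for sizing, the arithmetic can be written out as below. The function names and the 10 TB figure are hypothetical and chosen only for illustration; the 20:1 ratio is the vendor-reported figure, not a guaranteed result.

```python
def effective_capacity_tb(physical_tb, reduction_ratio):
    """Logical backup data (TB) that fits on the given physical disk."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return physical_tb * reduction_ratio


def physical_needed_tb(logical_tb, reduction_ratio):
    """Physical disk (TB) needed to hold the given logical backup data."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return logical_tb / reduction_ratio


# Hypothetical example: a 10 TB VTL at the vendor-reported 20:1 ratio.
capacity_at_20_to_1 = effective_capacity_tb(10, 20)
# The same VTL at a more modest, assumed 5:1 ratio.
capacity_at_5_to_1 = effective_capacity_tb(10, 5)
```

Running the same calculation with a range of ratios is a quick way to check whether a purchase still makes sense if the actual reduction falls well short of the vendor's figure.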
Performance overhead, however, is a major downside of this technology. Data de-duplication analyzes each block of data in the backup job to determine whether it matches an existing block before storing it as a new one. Executing this task during backups can slow them to the point where they run as slowly as tape backups.
To address this, some vendors offer a post-processing option. In this mode, data is backed up in its native format, and only after the backup is complete does the VTL de-dupe the data. Processing the data post-backup increases the VTL's disk capacity requirements, because the full backup must land on disk before it is reduced, but the performance overhead moves to off-peak hours.
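The capacity trade-off between the two modes can be approximated with a simple model. This is a rough sketch under stated assumptions: the function `peak_disk_tb` and its parameters are hypothetical, the reduction ratio is assumed to be known in advance, and the model ignores metadata and any space reclaimed while the backup is still running.

```python
def peak_disk_tb(existing_tb, backup_tb, reduction_ratio, post_process):
    """Return (peak, settled) disk usage in TB for one backup job.

    existing_tb     -- de-duplicated data already on the VTL
    backup_tb       -- size of the incoming backup in native format
    reduction_ratio -- assumed reduction ratio, e.g. 20 for 20:1
    post_process    -- True if de-duplication runs after the backup
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    settled = existing_tb + backup_tb / reduction_ratio
    if post_process:
        # The backup is written in full first, then reduced off-peak.
        peak = existing_tb + backup_tb
    else:
        # Inline mode stores only the reduced data as the backup runs.
        peak = settled
    return peak, settled
```

Under this model, both modes settle at the same usage once de-duplication finishes; the post-processing mode simply needs enough spare disk to hold the native-format backup in the meantime.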
Many storage managers are eager to deploy VTLs, and data de-duplication technology is critical if one hopes to eventually replace tape with disk. But given the overhead that data de-duplication introduces, managers should first verify that the VTL they want offers the options they need, so that their planned 2007 purchase does not turn into a pumpkin.
Jerome Wendt is the president and lead analyst with DCIG Inc.