A wave of sub-file-level de-duplication is washing over secondary storage and virtual tape library (VTL) manufacturers. Both Fujitsu Siemens Computers and Overland Storage have added de-dupe capabilities to their product lines.

Fujitsu Siemens Computers

FSC has turned to EMC, with whom it has a solid partnership, for its Avamar de-duplication product. It is supplying Avamar on an OEM basis as a software product, with a recommended and certified PRIMERGY TX300 or RX300 server configuration.

The aim, according to Helmut Muhleis, an FSC principal consultant, is to provide an effective, high-capacity disk-based backup facility for virtual servers and for remote and branch offices. To deliver good de-duplication performance, the host server's memory, CPU and other resources have to be carefully balanced.

Avamar agents installed on client systems (application and file servers) read and de-duplicate the backup data, then send it over the network to a central Avamar system. Because the data is minimised at source, less of it has to cross the LAN or WAN.
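To make the source-side idea concrete, here is a minimal Python sketch of client-side de-duplication. It is not Avamar's algorithm: it assumes simple fixed-size chunks and SHA-256 fingerprints, and the names `client_backup`, `server_store` and `server_restore` are hypothetical. The agent fingerprints every chunk but ships only the chunks the central store has not seen, plus an ordered list of fingerprints from which the data can be rebuilt.

```python
import hashlib

# Hypothetical fixed chunk size, for illustration only; the article does not
# describe how Avamar itself slices data.
CHUNK_SIZE = 4096


def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut a byte stream into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def client_backup(data: bytes, server_index: set[str]) -> tuple[list[str], dict[str, bytes]]:
    """Agent side: fingerprint every chunk, but ship only chunks the server lacks.

    Returns the ordered fingerprints (the "recipe" used to rebuild the data)
    and a dict of new chunks that actually have to cross the LAN/WAN.
    """
    recipe: list[str] = []
    new_chunks: dict[str, bytes] = {}
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        recipe.append(digest)
        # Skip chunks already stored centrally or already queued in this backup.
        if digest not in server_index and digest not in new_chunks:
            new_chunks[digest] = chunk
    return recipe, new_chunks


def server_store(store: dict[str, bytes], new_chunks: dict[str, bytes]) -> None:
    """Central system: keep exactly one copy of each unique chunk."""
    store.update(new_chunks)


def server_restore(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Rebuild the original byte stream from its recipe."""
    return b"".join(store[d] for d in recipe)


if __name__ == "__main__":
    store: dict[str, bytes] = {}
    data = b"A" * 8192 + b"B" * 4096  # three chunks, two of them identical

    recipe, sent = client_backup(data, set(store))
    server_store(store, sent)
    print(len(recipe), len(sent))  # 3 chunks referenced, 2 sent

    # A second backup of unchanged data sends no chunks at all.
    recipe2, sent2 = client_backup(data, set(store))
    print(len(sent2))  # 0
    assert server_restore(store, recipe2) == data
```

Only fingerprints travel for data the central system already holds, which is why source-side de-duplication suits bandwidth-constrained remote and branch offices.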

When backing up VMware virtual servers, the software can de-dupe data both within and across virtual machines.

Why choose Avamar? Muhleis said: "We carried out in-depth testing of the de-dupe players. We need scalability as good as CentricStor's. The de-duplication algorithm needs to slice up incoming data to avoid overflowing the cache and so needing disk accesses which slows things down. We couldn't compromise by introducing a limited de-duplication capability."

One aspect of de-duping that FSC has identified is that certain files must not be de-duped, said Muhleis: "Documents that are contracts with signatures on them may need to be kept inviolate; they cannot be altered. For these files de-duplication must be capable of being turned off."

In other words, there must be a policy setting, or a system administrator control, to switch de-duping on and off depending on file type and/or contents.
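A per-file policy of this kind could look something like the following Python sketch. It is purely illustrative: `DedupePolicy`, the path patterns and the "SIGNED-ORIGINAL" content marker are hypothetical and do not reflect how Avamar or any FSC product exposes such controls. Files that match an excluded pattern, or that an optional content check flags, would be backed up whole rather than de-duplicated.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Callable, Optional


@dataclass
class DedupePolicy:
    """Hypothetical admin policy deciding, per file, whether de-dupe applies."""

    # Case-sensitive shell-style patterns for files that must stay inviolate.
    excluded_patterns: list[str] = field(default_factory=list)
    # Optional check on the file's leading bytes; True means "do not de-dupe".
    content_check: Optional[Callable[[bytes], bool]] = None

    def should_dedupe(self, path: str, head: bytes = b"") -> bool:
        # Rule 1: exclusion by file name or type.
        if any(fnmatchcase(path, p) for p in self.excluded_patterns):
            return False
        # Rule 2: exclusion by contents.
        if self.content_check is not None and self.content_check(head):
            return False
        return True


if __name__ == "__main__":
    policy = DedupePolicy(
        excluded_patterns=["/data/contracts/*", "*.signed.pdf"],  # illustrative only
        # Illustrative content rule: a marker string flags an inviolate document.
        content_check=lambda head: b"SIGNED-ORIGINAL" in head,
    )
    print(policy.should_dedupe("/data/reports/q3.xlsx"))  # True
    print(policy.should_dedupe("/data/contracts/acme.pdf"))  # False
    print(policy.should_dedupe("/data/misc/lease.pdf", b"...SIGNED-ORIGINAL..."))  # False
```

Keeping the decision in one predicate lets the backup agent ask a single question per file, while administrators adjust the rules without touching the de-duplication code itself.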

FSC is not announcing de-duplication for its CentricStor VTL.

The CentricStor VTL is the only VTL written from the ground up for both mainframe and open systems use. More than 500 have been bought, making it the leading VTL product in Europe, and together they look after more than 260PB of customer data. A major new version, 4.0, will be announced in early December.

Techworld's view is that if FSC does add de-dupe to CentricStor, it will be the only de-duplicating mainframe VTL in the industry.