Battling armies of cloned files that bog down enterprise storage operations, new data de-duplication techniques rid systems of extraneous versions of the same information -- a powerful promise that is causing a stir among enterprise IT buyers.

Deployed by backup clients or software agents onto servers, desktops or laptops, data de-duplication features make use of algorithms or object-oriented processes to home in on redundant data segments.
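To make the idea of homing in on redundant segments concrete, here is a minimal sketch, not any vendor's actual method. It assumes fixed-size segments and SHA-256 fingerprints; the names `CHUNK_SIZE`, `dedupe` and `restore` are hypothetical, and shipping products may segment and compare data quite differently.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed segment size, chosen only for illustration


def dedupe(data: bytes, store: dict) -> list:
    """Split data into fixed-size segments and keep one copy of each.

    `store` maps a SHA-256 hex digest to the segment bytes. The returned
    list of digests is the "recipe" needed to rebuild the original data.
    """
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        segment = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(segment).hexdigest()
        existing = store.get(digest)
        if existing is None:
            # First time this segment is seen: store it once.
            store[digest] = segment
        elif existing != segment:
            # Guard against falsely labeling different data as a duplicate.
            raise ValueError("digest collision: segments differ")
        recipe.append(digest)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original byte stream from its recipe."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    store = {}
    backup = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"A" * CHUNK_SIZE
    recipe = dedupe(backup, store)
    print(len(recipe), "segments referenced,", len(store), "segments stored")
    assert restore(recipe, store) == backup
```

The byte-for-byte comparison on a fingerprint match is one simple way to address the false-match concern; it trades some speed for certainty.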

Data deduplication and you

Before you decide if this storage management technology is right for your enterprise, ask yourself:
- Is it important to de-duplicate data immediately or could the process be completed at a later time?
- Does my organization want to pursue a disk-to-disk strategy or other means of augmenting tape?
- Do I have back-end capacity constraints or am I lacking enough disk to house all of my data?

Ask your vendors:
- Where does de-duplication take place: at the client before sending the data, at the disk device after the data is sent or on the virtual tape library as a process?
- Can your solution handle all of my backup streams?
- Does your system support standards such as the Storage Networking Industry Association's Storage Management Initiative Specification?
- If de-duplication will be performed at the client side, can your product ingest data fast enough to keep up with high-transaction systems?
- What methods are used to identify duplicate information and how does the product guarantee it will not falsely label the data?

Repeated data copies that bloat total storage volume 10 to 20 times more than necessary are stomped out, freeing up gobs of extra storage space.
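As a rough illustration of what a 10- to 20-fold reduction means for capacity, the arithmetic can be sketched as follows. The function name `space_saved` is hypothetical, and the calculation assumes the ratio is simply the original volume divided by the stored volume.

```python
def space_saved(reduction_ratio: float) -> float:
    """Return the fraction of capacity freed for a given reduction ratio.

    A ratio of N means the data occupies 1/N of its original volume,
    so the fraction saved is 1 - 1/N.
    """
    if reduction_ratio < 1:
        raise ValueError("reduction ratio must be at least 1")
    return 1.0 - 1.0 / reduction_ratio


if __name__ == "__main__":
    for ratio in (10, 20):
        print(f"{ratio}x reduction frees {space_saved(ratio):.0%} of the space")
```

Under that assumption, a 10-fold reduction frees about 90% of the space and a 20-fold reduction about 95%, so the gain from each further doubling of the ratio shrinks quickly.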

Data de-duplication's staying power seems virtually guaranteed, because storage space is at an all-time premium.

Lawyers, government regulators and corporate leaders are breathing down the necks of IT managers, who are loath to scrap any information for fear the move will haunt them in a future lawsuit or audit.

Shrinking the size of the stored data volumes seems to be one of the few options left.

"There really is a lot of buzz around deduplication. At the same time, it is a technology that is here to stay, because the benefits are so powerful," says David Russell, Gartner vice president, storage and strategies.

Data de-duping for Sarb-Ox

Data de-duplication proved more than a buzzword for Vaalco Energy, a Houston company that harvests and processes crude oil and natural gas. "The technology satisfied a real-world need for us," says Dereck Stubbs, Vaalco IT specialist.

Vaalco's very real need for data de-duplication centered on a Sarbanes-Oxley Act financial audit that came barreling at the company last year. Vaalco had to prove quickly that its backup and recovery procedures met the statute's stringent requirements. It turned to Asigra, which packs data de-duplication functionality into its Televaulting software.

"We needed a solution in days, since the audit was going to come up in a couple of weeks," recalls Robert Walston, Vaalco IT and purchasing supervisor. "The e-mail requirements were especially tricky, since we had retention requirements to hold e-mail 'X' amount of years. There is some duplication when you are going back that far," he says.

Although Vaalco initially hit on deduplication in its scramble to satisfy e-mail retention mandates tied to Sarb-Ox, the company quickly found greater benefits. It reduced data volumes to the point where formal off-site storage became unnecessary, and that gave the company peace of mind, Stubbs says.

De-duplication's role in enterprise efforts to avoid tape backup and off-site storage has many companies interested in the technology, says Heidi Biggar, an analyst at Enterprise Strategy Group. "If you free up more storage capacity, you could choose to keep data in-line. It is a powerful technology," she says.

Data de-dupe cracks the case

The power of de-duplication for law firm Winthrop & Weinstine was in the new storage avenues the technology afforded.

"By reducing backup data set volume as much as 20 times, deduplication makes disk-based backup cost effective and [opens] an entirely new set of options," says Craig Wilson, IS manager.

Using backup and recovery appliances from Data Domain, the Minneapolis law firm replicated de-duplicated data to remote sites, something the full-size backup sets had not allowed. "Backup data, by its sheer size, is immobile. It can't be sent via secure WAN to remote sites for disaster recovery purposes," Wilson says. Along with disaster-recovery improvements, other savings materialized through the firm's use of the Data Domain de-duplication features. For example, the firm reduced costs and liabilities associated with third-party handling of backup tapes.

Data de-duplication also solves many remote-office storage problems, says Curtis Damhof, a network manager at St. Peter's Hospital in Albany, N.Y., which makes use of data de-duplication features in Avamar Technologies' Axion software.

"We currently back up our remote sites to our main office due to the efficiency provided by the de-duplication technology. Another place we have been looking at using the product is in the backup of all our desktops and mobile users," Damhof says.

Other vendors offering data de-duplication features in their product sets include Diligent Technologies, Exagrid Systems, FalconStor Software and Sepaton. Larger vendors such as Network Appliance Inc. and Symantec Corp. also are jumping into the mix, proving that de-duplication has won a place in the storage market, Gartner's Russell adds. Pricing varies by vendor. For example, the Avamar software costs about $9,000 per terabyte, and Data Domain's appliances and gateways are priced from $19,000 to $105,000, he says.

"Interest in data de-duplication has really heated up this year, especially over the summer. There has been a bit of an educational process under way, but the technology is really reaching critical mass," Russell says. "It is no longer something on the fringe, since there are enough deployments for enterprise users to now have a higher level of confidence."