De-duping postal address lists to remove repeat instances of the same data is common practice. De-duping of network-attached storage is only just happening. And it's costly. EMC Centera plus software will cost you hundreds of thousands of pounds.
However, Deepfile has announced a software approach that doesn't use complex content-addressed storage algorithms. Instead Deepfile Sentinel runs on an Intel-based appliance and de-dupes files on NAS boxes shared by Windows, Linux and NetWare servers.
Duplicated files are found by Deepfile's Auditor software. Then Sentinel notifies the users by e-mail so that they can decide what to do. Jamie Gruener, a Yankee Group senior analyst, said: "Giving users the power to manage their own files, especially in light of complex compliance and data retention policies, is a growing need corporate customers are grappling with."
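Deepfile has not said how Auditor spots duplicates, so what follows is not its method. As a rough illustration of the general idea only, one simple way to find identical files under a shared directory is to group them by size and then compare same-size files byte for byte. The function name `find_duplicates` and the directory walk are assumptions made for this sketch.

```python
import filecmp
import os
from collections import defaultdict


def find_duplicates(root):
    """Return sorted groups of paths under root whose contents are byte-identical.

    Illustrative sketch only; this is not how Deepfile Auditor is known to work.
    """
    # Files of different sizes cannot be identical, so bucket by size first.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        # Within one size bucket, cluster files by full content comparison.
        clusters = []
        for path in paths:
            for cluster in clusters:
                if filecmp.cmp(cluster[0], path, shallow=False):
                    cluster.append(path)
                    break
            else:
                clusters.append([path])
        groups.extend(sorted(c) for c in clusters if len(c) > 1)
    return sorted(groups)
```

Each returned group is a list of paths holding the same bytes. That list is the kind of information a user would need before deciding whether to keep, delete or move a copy.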
Dupes can be kept, deleted or moved to less-expensive storage or archived off-line using previously set-up policies. Deepfile's existing Enforcer product manages the creation and storage of these policies.
Deepfile believes this approach is ideal for un-structured data, such as Word documents, PDFs and PowerPoint presentations. It is a poorer match for semi-structured data, such as e-mail, where speciality archivers, such as Windows Storage Server, are spreading fast. Structured data, as in databases, generally has its own facilities for detecting duplicate records.
If you already have NAS storage on which Windows, Unix/Linux and/or NetWare users store un-structured data then Deepfile Sentinel could be worth a look.
Arkivio offers its Auto-Xplor, Auto-View and Auto-Stor products to achieve the same ends, but they do not tell users which files are duplicated and need attention. Deepfile's Sentinel does. If your users need to be able to keep specific files, even in duplicate, then Sentinel is the product for you.
The appliance is a 1U-high rack-mount box and costs $15,000, roughly £10,000. Customers pay $2,500 per terabyte managed - around £1,500 - and there is a $20 per-user cost as well - about £12.
So let's say you have 10TB of storage on your NAS and 100 users. You are looking at $42,000 ($15,000 for the appliance, $25,000 for capacity and $2,000 for users) - over £26,000. For Sentinel to pay for itself, you need at least £26,000-worth of disk occupied by duplicated un-structured data files that it can free up. This is not cheap. Enterprises only need apply.
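The arithmetic behind the $42,000 figure can be sketched as follows, using the list prices quoted above. The constant and function names are made up for this illustration, and the calculation assumes the three charges simply add up with no discounts.

```python
# List prices as quoted in the article (US dollars).
APPLIANCE_USD = 15_000   # one-off cost of the 1U appliance
PER_TB_USD = 2_500       # charge per terabyte managed
PER_USER_USD = 20        # charge per user


def sentinel_cost_usd(terabytes, users):
    """Total list price in dollars, assuming the three charges simply add up."""
    return APPLIANCE_USD + PER_TB_USD * terabytes + PER_USER_USD * users


# The article's example: 10TB of NAS storage and 100 users.
print(sentinel_cost_usd(10, 100))  # 15,000 + 25,000 + 2,000 = 42000
```

Changing the two arguments shows how quickly the per-terabyte charge dominates: at 10TB it already accounts for more than half of the bill.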