Document originality and integrity can be vital in legal proceedings and compliance situations. Cast doubt on a document's authenticity and millions of pounds could be lost.

There is a concern in some quarters of the storage industry that sub-file-level de-duplication, because it necessarily alters the original representation of a file, compromises a stored document's authenticity and so renders it unusable, or less usable, in a court of law.

De-duplication equivalent to electronic tampering

Say you need to prove that an electronically stored file or object is the original document. If it was generated electronically in the first place, say as a Word file or an Outlook Express email, then storing it in its native format is straightforward. Store it on write-once, read-many (WORM) media and you have a pretty rock-solid case that your stored file is the same as the original file.

Now say you de-dupe the stored file. It is altered as sections of it are replaced by pointers to byte strings stored elsewhere. Yes, it can be reconstructed but you can no longer say that it is the original document or email. It isn't. It is a representation of it in a different format.
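To make the mechanism concrete, here is a minimal sketch of sub-file de-dupe in Python. It is an illustration under stated assumptions, not any vendor's implementation: the fixed 8-byte chunk size, the in-memory `store` dictionary and the function names `dedupe` and `reconstruct` are all hypothetical, and real products typically use far larger, often variable-size, chunks. It shows both halves of the argument: what sits on disk is a list of pointers rather than the file itself, yet the rebuilt file can be checked byte for byte against the original with a hash.

```python
import hashlib

CHUNK_SIZE = 8  # hypothetical chunk size, kept tiny for illustration


def dedupe(data: bytes, store: dict) -> list:
    """Split data into fixed-size chunks, keep each unique chunk once in
    `store`, and return the list of pointers (chunk hashes) for the file."""
    pointers = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)  # store the chunk only if it is new
        pointers.append(key)
    return pointers


def reconstruct(pointers: list, store: dict) -> bytes:
    """Rebuild the file by following each pointer back to its chunk."""
    return b"".join(store[key] for key in pointers)


# A toy "document" whose first 24 bytes repeat the same 8-byte block.
original = b"HEADER--" * 3 + b"unique tail"

store = {}
pointers = dedupe(original, store)
restored = reconstruct(pointers, store)

print(len(pointers), "pointers,", len(store), "unique chunks stored")
print("byte-identical:", restored == original)
print("same SHA-256:",
      hashlib.sha256(restored).hexdigest() == hashlib.sha256(original).hexdigest())
```

In this toy case the 35-byte file becomes five pointers referring to three stored chunks. Whether a matching hash on reconstruction is enough to satisfy a court is exactly the question the vendors disagree on.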

Here is Gary Watson, Nexsan's CTO, talking about the subject: "Assureon stores files in a very straightforward XML format which could be easily understood in a court proceeding (e.g. during forensic cross-examination), whereas as far as I know the sub-file systems physically store files as recursive lists of pointers to blocks (or something even more complex) which would be challenging to explain to a judge or jury. It’s a layer of potential risk we want to avoid."

Nexsan isn't offering sub-file-level de-dupe with its Assureon product.

Andy Hale, the technical manager at storage integrator B2net, thinks differently: “There is no reason why a sub-file-level de-duplicated document or mail file can not be presented to a court of law for compliance as long as the contents can be proven to be unaltered. All disk arrays store things in different ways using different block sizes and file systems, the fact it is de-duped should not alter the validity of this evidence in court.”

"There are products out there that can offer this type of compliance, enabling organisations to demonstrate that files have not been tampered with - by showing file and access history, which can then be used in a court of law. Software products such as Symantec Enterprise Vault will allow administrators to track versions of documents and allow legal searches to be done."

"There are storage products on the market that achieve this level of protection, such as EMC Centera and NetApp A-SIS (de-duplication technology) and SnapLock (compliance/WORM technology) that work at the storage level and meet current US legislation requirements. These latter two provide a disk-based WORM device that guarantees that data written to it can not be tampered with."

David Ebsworth, technology director of oncore IT, an IT support company that provides Asigra's software as part of its managed service solutions, takes the same view: "The simple answer is yes, a de-duplicated file can be presented as an unaltered original document or mail in a court of law in a compliance situation."

"The whole reason for de-duplicating files is so that they take up less space in storage. The reason isn't to change the file and the technology used in de-duplication doesn't and can't change the file content. When a file is taken out of storage - whether it is primary or secondary (as in the case of Asigra's backup software) storage - it is automatically reconstructed back to the original document structure, with the same date on it as when the original document was last modified and the same digital signature."

"You wouldn't deploy data de-duplication if the files you restore are different to the original file. Storing and recovering de-duplicated files does not change the original file."

Others, too, argue that long-established storage techniques such as RAID, which already spread a file's data across multiple disks, mean sub-file de-dupe should not be regarded as a special case.