Content-addressed storage (CAS) is a neat idea. Break a file into chunks, uniquely identify each chunk with a hash code, and then store it. When storing future files, see if the chunks in them are already stored. If they are, store a pointer to the stored chunk instead of duplicating it. Great idea. You de-dupe data and your effective disk utilisation rockets up. EMC's Centera established this idea as a mainstream storage technology and now it's becoming accepted as a good way to store fixed-content, reference data.
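The chunk, hash and pointer steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Centera or any other product does it: the fixed chunk size, the choice of SHA-256 as the hash, the in-memory dictionaries and the `ChunkStore` name are all invented here for the example.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store (hypothetical, in-memory only)."""

    def __init__(self, chunk_size=4096):
        # Assumption: fixed-size chunks; real products may chunk differently.
        self.chunk_size = chunk_size
        self.chunks = {}  # hash digest -> chunk bytes, each stored once
        self.files = {}   # file name -> ordered list of digests (the "pointers")

    def put(self, name, data):
        """Split data into chunks and store only the chunks not yet held."""
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            # Assumption: SHA-256 serves as the chunk's unique identifier.
            digest = hashlib.sha256(chunk).hexdigest()
            # A chunk that is already stored is not written again.
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests
        return digests

    def get(self, name):
        """Rebuild a file by following its pointers to the stored chunks."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        """Bytes actually held after de-duplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


store = ChunkStore(chunk_size=4)
store.put("first", b"AAAABBBBCCCC")
store.put("second", b"AAAABBBBDDDD")  # shares two chunks with "first"
print(store.stored_bytes())           # 24 bytes written, 16 bytes held
```

The second file costs only one new chunk because its first two chunks hash to values the store already holds. Note that every lookup is a random access by hash.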
There is a problem though; isn't there always? CAS stores data on disk. The amount of data to be stored is sky-rocketing, disk is more expensive than tape, restoration is slow, you store data on tape in backup containers and ... we know all this. You can't do CAS technology on tape because tape is a sequential access medium and you need random access for CAS to work.
In the last year or so MAID products - massive arrays of idle disks - have appeared. First there was trail-blazer Copan with its Revolution product and, second, Nexsan has adapted its SATAbeast product with AutoMAID to offer the same idea: a densely packed array of drives where the inevitable cooling problems are side-stepped by having most of the disks idle at any one time. Data reads from such arrays are slower than from fully online disks but still a lot faster than from a tape library, because disk spin-up time is a lot shorter than the tape cartridge mount time plus the time taken to stream to the first data byte.
So MAID products are finding a role either as substitute tape libraries or as very large disk caches in front of tape libraries. Archiving software is being offered for them too, most recently by Copan with its Millennia Archive product.
That archive though will inevitably store lots of duplicated data.
Wouldn't it be good if you could increase array efficiency by de-duping the data? Why don't Copan and Nexsan employ CAS technology and offer it? With their massive capacities, their arrays would have a unique advantage over other CAS products.
It's probable that both Copan and Nexsan are content for now with the idea that you can have enormous disk capacities in a drive array at all. Customers are like Oliver Twist though. They always want more.
If I were a primary or secondary storage supplier with a SATA-based product line and access to MAID technology I'd surely be looking at the CAS archive market as a natural fit for the technology. Guess I'd better talk to some CAS software technology folks.