Information lifecycle management (ILM) is being pushed hard by Sun, EMC and others. Stands to reason, you think. Put data in the right storage layer so that you don't waste money storing slow data on fast disk - meaning that data no one needs to access very much has no business being on your front-line disk.
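As a rough illustration of the tiering idea, here is a minimal sketch that assigns a file to a storage tier by how often it has been accessed. The tier names, the 30-day window and the thresholds are all hypothetical, chosen only to show the shape of such a rule; real ILM products use their own criteria.

```python
from dataclasses import dataclass

# Hypothetical tiers and access thresholds, for illustration only.
# Checked from the top down: the first threshold a file meets wins.
TIERS = [
    (100, "tier1-fc"),       # busy data: fast, expensive front-line disk
    (10, "tier2-sata"),      # moderately active: cheaper, slower disk
    (0, "tier3-archive"),    # rarely touched: archive storage
]


@dataclass
class FileStats:
    path: str
    accesses_last_30_days: int  # hypothetical access counter


def choose_tier(stats: FileStats) -> str:
    """Return the first tier whose access threshold this file meets."""
    for threshold, tier in TIERS:
        if stats.accesses_last_30_days >= threshold:
            return tier
    return TIERS[-1][1]


if __name__ == "__main__":
    for f in [FileStats("orders.db", 5000),
              FileStats("q3-review.ppt", 12),
              FileStats("2001-budget.xls", 0)]:
        print(f.path, "->", choose_tier(f))
```

Note that a rule like this moves files between tiers but does nothing about duplicate copies, which is the gap the rest of the piece is about.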
Phil Tee is CTO and co-founder of nJini, an information asset management company much like Kazeon or Arkivio. He says: "Personally I think ILM is bunkum." His CEO, David Jones, agrees with Tee that there is too much complexity and intelligence in the storage layer of IT: "Get more intelligence in the infrastructure and you don't need it in the storage."
They describe their approach as applying the networking model to storage. Both have networking backgrounds - Cisco, Micromuse, Riversoft. We might view ILM as basically data movement, which is mundane and dull. Yes, it's difficult, but only because there are so many proprietary interfaces and nothing like networking's SNMP exists in storage.
ILM for structured data is relatively easy. ILM for unstructured data is not. Jones gives an example: suppose you receive a 10MB PowerPoint file and copy it to eight other people. Now there are nine copies of the file, eight of them redundant, and they all get backed up if they are held on servers. The point is that you are wasting disk space. He and Tee say data - information, structured or unstructured - is an asset and should be treated like other assets. You should know how much of each asset you have and store each one where it is needed.
Placing data according to its file type doesn't get rid of duplication. You don't need to de-dupe structured information because casual duplication doesn't exist there in the first place. But casual duplication of unstructured information is endemic. You might even say there is a pandemic of duplicated PowerPoints, Word docs, spreadsheets and PDFs, spread by e-mail attachment.
So, Tee and Jones say, treat files as objects and create a checksum or hash based on their contents. It's like applying content-addressable storage methodology to all unstructured data. Whenever a user stores a new file, compute its hash and compare it with the existing ones. If there is a match, the file is not written again: the user thinks they have stored a file, but what they actually have is a pointer to the existing copy, and hey presto, you have saved megabytes of disk space.
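The hash-and-pointer scheme can be sketched as a tiny single-instance store. This is an illustration of the general content-addressable technique, not nJini's implementation: SHA-256 is an assumed choice of hash, and the `DedupStore` class and its methods are hypothetical names. The demo reuses the 10MB PowerPoint sent to eight people.

```python
import hashlib


class DedupStore:
    """Hypothetical single-instance store: contents kept once, keyed by SHA-256."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}   # content digest -> the one stored copy
        self.catalog: dict[str, str] = {}   # user-visible path -> digest (the "pointer")

    def put(self, path: str, data: bytes) -> bool:
        """Store a file; return True if its contents were new, False if de-duped."""
        digest = hashlib.sha256(data).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = data
        self.catalog[path] = digest  # every path just points at the shared copy
        return is_new

    def get(self, path: str) -> bytes:
        return self.blobs[self.catalog[path]]

    def logical_bytes(self) -> int:
        """Bytes the users believe they have stored."""
        return sum(len(self.blobs[d]) for d in self.catalog.values())

    def physical_bytes(self) -> int:
        """Bytes actually kept on disk."""
        return sum(len(b) for b in self.blobs.values())


if __name__ == "__main__":
    deck = b"x" * 10_000_000              # stand-in for a 10MB PowerPoint file
    store = DedupStore()
    store.put("/home/sender/deck.ppt", deck)
    for i in range(8):                    # the same attachment saved by eight recipients
        store.put(f"/home/user{i}/deck.ppt", deck)
    print(store.logical_bytes(), store.physical_bytes())  # 90000000 10000000
```

Users see nine 10MB files; the disk holds one. A production system would also need to guard against hash collisions (for example by comparing bytes on a match) and to reclaim blobs no longer referenced by any path, both of which this sketch leaves out.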
Apply that to a large IT setup and you could be saving gigabytes of disk space, potentially terabytes, merely by de-duping files.
This is done in real time by the nJini software engine, which runs on an in-band appliance logically sitting between servers and storage. It has policies so that files with different content, containing different keywords, or created by different users in different groups can each be stored appropriately. The metadata it creates and uses for each file is much richer than that used by bare-bones ILM software products.
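A policy engine of this kind can be pictured as an ordered list of rules over file metadata, where the first matching rule decides where the file goes. The sketch below is hypothetical throughout: the metadata fields, rule names and storage targets are invented for illustration and do not describe nJini's actual policy language.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FileMeta:
    name: str
    owner: str
    group: str
    keywords: set[str] = field(default_factory=set)  # e.g. extracted from content


@dataclass
class Policy:
    name: str
    matches: Callable[[FileMeta], bool]
    target: str


# Hypothetical policies, evaluated in order; the first match wins.
POLICIES = [
    Policy("legal-hold", lambda f: "contract" in f.keywords, "worm-archive"),
    Policy("finance", lambda f: f.group == "finance", "tier1-replicated"),
    Policy("media", lambda f: f.name.lower().endswith((".mp3", ".avi")), "tier3-cheap"),
]
DEFAULT_TARGET = "tier2-general"


def place(f: FileMeta) -> str:
    """Return the storage target of the first policy that matches the file."""
    for p in POLICIES:
        if p.matches(f):
            return p.target
    return DEFAULT_TARGET


if __name__ == "__main__":
    print(place(FileMeta("nda.doc", "alice", "legal", {"contract"})))
    print(place(FileMeta("q3.xls", "bob", "finance")))
    print(place(FileMeta("party.MP3", "carol", "sales")))
    print(place(FileMeta("notes.txt", "dave", "sales")))
```

Ordering matters in this design: a finance contract lands in the archive because the keyword rule is checked before the group rule.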
What looks like a natural fit to me is to take nJini's software and layer it on top of EMC's InVista or IBM's SVC. Use nJini to add intelligent file metadata functions and policies, and your storage virtualisation and management stack becomes a rich ILM system. Pretty 'ngenious.