Archiving is experiencing a boom. Research firm IDC says it’s an $800 million market set to grow to $2.3 billion by 2011. But the numbers will prompt a raised eyebrow among those who don’t really understand the area. For them, archiving is still about consigning information you hope you’ll never need again to a ‘dusty vault’ somewhere. And if you do need to get something back, you assume it’s going to be painful.

But this is a false notion, because today you can archive data to anything from online disk to near-line arrays, Write Once Read Many (WORM) optical, and off-line, off-site tape. So archived data can be highly accessible on online or near-line media if required. The essence of archiving is managing your data across this range of storage platforms and devices, enabling you to optimise costs while addressing pressing compliance and security concerns. Moreover, by moving data off spinning disk to ‘powered down’ media, archiving has a fair claim to be one of the original ‘green’ technologies.

Accessibility is actually boosted because archived data is catalogued, indexed and searchable by content. And if you replace data with a stub or placeholder on the primary store, you effectively make the archive process transparent to the end user.
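The stub-and-recall idea can be illustrated with a minimal sketch. This is not any particular product’s mechanism, just an assumed scheme in which the archived file is replaced by a small JSON stub recording where the content now lives, and a read helper follows the stub transparently:

```python
import json
import shutil
from pathlib import Path

STUB_MARKER = "ARCHIVE-STUB-V1"  # hypothetical marker for this sketch

def archive_with_stub(path: Path, archive_dir: Path) -> None:
    """Move a file into the archive and leave a small stub in its place."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / path.name
    shutil.move(str(path), str(target))
    # The stub records where the real content now lives.
    path.write_text(json.dumps({"marker": STUB_MARKER, "location": str(target)}))

def open_transparently(path: Path) -> bytes:
    """Return file content, recalling it from the archive if a stub is found."""
    raw = path.read_bytes()
    try:
        stub = json.loads(raw)
    except (ValueError, UnicodeDecodeError):
        return raw  # an ordinary file, not a stub
    if isinstance(stub, dict) and stub.get("marker") == STUB_MARKER:
        return Path(stub["location"]).read_bytes()  # transparent recall
    return raw
```

In a real product the recall would be hooked into the file system driver rather than an explicit helper, which is what makes the process invisible to the end user.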

Archive confusion

In fact, the long-standing perception problem surrounding archiving is not just down to the outdated images conjured up by the word itself, but also to the way archiving terminology has been turned and twisted by a variety of interested parties, confusing the market and hiding its real benefits.

Don’t scoff, but archiving is one of the technologies that underpins the Information Lifecycle Management (ILM) dream. The vision may have become a little tarnished and the market sceptical, but only because the ILM term has been ‘hijacked’ by hardware vendors. For them it has come to mean a way of shifting data between different tiers of disk: “Our ILM solution will help you by migrating your data between our disk, our less expensive disk and our even less expensive disk. The bottom line is you need to store your data on these different layers of our disk.”

But while this type of tiered storage system certainly has a role to play, a true archiving definition of ILM would revolve around looking at the data first and foremost. How accessible does it need to be? How secure? Do you need to audit access? Does it need to be authenticated? It’s then about creating rules and using policy-based archive software to intelligently move your data between the full spectrum of storage media, including the many tiers of disk, in order to best address these data requirements.
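The rule-making step described above can be sketched in a few lines. The attributes, tier names and thresholds here are all illustrative assumptions, not any vendor’s policy model; the point is that the decision starts from the data’s requirements, not from the hardware:

```python
from dataclasses import dataclass

# Hypothetical data attributes; real archive products expose far
# richer policy engines (ownership, legal hold, retention class...).
@dataclass
class DataItem:
    name: str
    days_since_access: int
    needs_worm: bool  # compliance demands write-once storage

def choose_tier(item: DataItem) -> str:
    """Apply simple ordered rules to pick a destination tier."""
    if item.needs_worm:
        return "worm-optical"      # compliance: write once, read many
    if item.days_since_access < 30:
        return "online-disk"       # hot data stays fast to reach
    if item.days_since_access < 365:
        return "nearline-array"    # warm data, seconds to recall
    return "offsite-tape"          # cold data, cheapest per gigabyte
```

Policy-based archive software evaluates rules of this shape continuously and moves the data automatically, rather than asking an administrator to decide file by file.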


Backup is not archiving

Another source of misunderstanding is the many organisations that persist in using backup for what they call archiving. They say: ‘we’ve taken a backup, we’ve put it in a vault and now we can use it as an archive’. But with a backup, how do you know what’s on it, and will you still know in a year’s time? How can you quickly get to a file on it, and how will you know it’s the correct version of what you want? Backup lacks the granularity to answer these questions, and using it in a full audit or litigation scenario is a nightmare. Archive, on the other hand, is specifically designed for these situations.

Confusion also arises around what vendors are calling archive appliances. These are typically one or two terabytes of RAID disk on a box with a server and an allocation of secondary storage – which could be removable disk, optical or even tape. The appliance migrates your data between the internal RAID and the secondary storage in the box, offering some of the attributes you might expect from an archive such as storing multiple copies, encryption, validation or a ‘pseudo’ WORM capability to address compliance needs.

But while these appliances are an excellent ‘receiver’ of data that can play an important role in terms of where you eventually archive some of your data, what they don’t do – and this is at the heart of true archiving – is help you to proactively manage your primary storage. They do not intelligently reach out across your network, select your data and automate its movement, taking it off the primary store and onto your chosen selection of secondary media and devices according to its requirements. Neither (for the most part) do they index the data, so search is still no better than on raw disk.

Single department archiving doesn't go far enough

The emergence of archive appliances, issues such as departmental compliance concerns, and the pain surrounding email growth have all helped to drive archiving installations. But these factors have also resulted in archiving often being isolated to single departments – finance, for example – or restricted to one type of data, usually email. This fails to go far enough, because archiving will really come into its own only if IT takes control of it at an enterprise level, using a central system to archive from any data source – email, documents, SharePoint – regardless of location, platform or device, and distributing it to ‘data-appropriate’ destination media.
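The enterprise-level model described above amounts to a hub with per-source connectors and one central routing policy. As a toy sketch (the connector and routing functions here are invented for illustration, not a real product’s API):

```python
from typing import Callable, Dict, Iterable, List

def central_archive(
    connectors: Dict[str, Callable[[], Iterable[str]]],
    route: Callable[[str, str], str],
) -> Dict[str, List[str]]:
    """Pull items from every data source and group them by destination media.

    `connectors` maps a source name (email, file shares, SharePoint...)
    to a function yielding item names; `route` is the single central
    policy deciding each item's destination tier.
    """
    plan: Dict[str, List[str]] = {}
    for source, fetch in connectors.items():
        for item in fetch():
            plan.setdefault(route(source, item), []).append(f"{source}:{item}")
    return plan
```

The value of the hub is that one policy covers every source, instead of each department inventing its own rules for its own silo.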


By happy coincidence, the archiving process can make a real impact on the corporate carbon footprint. Many storage vendors have ‘hopped’ on the green bandwagon to promote this or that media as greener because it consumes less power or none at all, whether it is tape, removable disk or MAID (massive array of idle disks). But this is yet another source of confusion, because while energy-efficient media is great in itself, what organisations really need – and what archiving gives them – is a way of automating the migration of the 80 percent of infrequently accessed data sitting on replicated RAID storage onto the various levels of energy-saving media.
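Finding that infrequently accessed data is the first automated step. A minimal sketch, assuming last-access time (`atime`) is being tracked by the file system, which many systems disable for performance:

```python
import time
from pathlib import Path
from typing import Iterator

def infrequently_accessed(root: Path, days: int = 90) -> Iterator[Path]:
    """Yield files under `root` not read for `days` days, judged by atime."""
    cutoff = time.time() - days * 86400
    for p in root.rglob("*"):
        if p.is_file() and p.stat().st_atime < cutoff:
            yield p  # a candidate for migration off powered-up RAID
```

A real archive agent would combine this scan with the policy rules discussed earlier, then move the matches to energy-saving media and leave stubs behind.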

The opposite of a dusty vault

The truth is that if you get archiving right, you are able to harness all your data much more effectively. A lot of data growth arises from unstructured or semi-structured files – documents, emails, presentations, PDFs. This information may be sitting on the central server, but it’s actually under the control of users, not the organisation. The organisation may have no idea whether, sitting somewhere in a shared user folder or an inbox, there is a piece of information that could turn out to be a legal ‘smoking gun’ in future, or a hidden piece of vital business intelligence.

Putting this data into an archive is all about good corporate governance, because in addition to establishing retention and deletion schedules and keeping it secure and protected, the data is catalogued and indexed in the process. At last you are able to sift through it using content searches – you truly have the information at your fingertips and you are managing it in a sensible way. This is a far cry from having data stuck in a dusty vault.
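The content search that indexing enables is, at heart, an inverted index. A toy version (real archive products use far more sophisticated full-text engines with stemming, metadata fields and access control):

```python
from collections import defaultdict
from typing import Set

class ArchiveIndex:
    """A toy inverted index: word -> set of document names."""

    def __init__(self) -> None:
        self._index: defaultdict = defaultdict(set)

    def add(self, doc_name: str, text: str) -> None:
        """Index a document's content at archive time."""
        for word in text.lower().split():
            self._index[word].add(doc_name)

    def search(self, *terms: str) -> Set[str]:
        """Return documents containing every search term."""
        sets = [self._index.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets) if sets else set()
```

Because the index is built as data enters the archive, a litigation or audit query becomes a lookup rather than a trawl through raw disk.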

Tony Cotterill is the CEO of BridgeHead Software.