Archiving used to mean sending data to tape, where it eventually died; offline tape archives were the retirement home for data. The advent of compliance regimes has altered that, as data must now be found reasonably quickly. The repository of compliance data must be kept in an accessible way. The default is to use second-tier disk storage.
But these burgeoning data stores are also holding semi-structured and unstructured data (documents, presentations, PDFs, e-mails, etc.) which must be kept for compliance reasons, and this is putting increasing pressure on data centre space and electricity requirements. What can you do?
Bridgehead CEO Tony Cotterill has a solution: archiving, but not to offline tape. Techworld met up with him and discussed the topic.
Techworld: What is happening with archiving?
Tony Cotterill: We're seeing a greening of archives. People are doing the green thing. The motivation is a reduction of cost, putting data where it is least expensive to store. Archiving was seen as 'put it somewhere where we can't get at it'. Compliance says, though, that you can't store it and forget it. Compliance says I must have immediate access if I need it.
The green thing is in opposition to this; they're opposed to one another.
Techworld: Are these things really disparate?
Tony Cotterill: They are not disparate; they are not opposites. Nearline still plays a role in archiving. The meaning of nearline has been hijacked to mean online arrays, not offline ones.
Techworld: So nearline storage can mean online storage with offline and removable media: tape libraries, optical jukeboxes, Copan's MAID and Nexsan's AutoMAID?
Tony Cotterill: Yes. We're enamoured with optical media at Bridgehead. In the archive you need random access, write-once-read-many (WORM) and offline media. These are the three things you need for an archive and compliance environment.
Techworld: How is archiving developing?
Tony Cotterill: I think the world is re-discovering archiving. IDC says it's an $800 million market now, growing to $2.3 billion by 2011. The market has been led by an e-mail archiving boom. But archiving as we're now doing it is point archiving - e-mail - and that will not be viable in 2011. You have to do it all.
Techworld comment: Cotterill describes a generic archive needing to take in data from many sources (e-mail, database, SharePoint, documents, presentations, etc.) from anywhere on a corporate network. An archive software layer deals with it and puts it in an appropriate place on a spectrum of storage media ranging from hard drives through optical to tape, based on access urgency and security needs: "Potential destination media have all got their own features. No one is inherently bad."
The archive software layer catalogues incoming data, including content details, and is self-protecting and self-purging. It facilitates search, by content for example.
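As a rough illustration of the policy-driven placement described above, the sketch below routes incoming items to disk, optical or tape and records each one in a catalogue. It is not Bridgehead's software: the names (`Item`, `choose_tier`, `archive`), the fields `max_access_seconds` and `needs_worm`, and the thresholds are all hypothetical, chosen only to show the idea of placing data by access urgency and security needs.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    DISK = "disk"
    OPTICAL = "optical"
    TAPE = "tape"


@dataclass(frozen=True)
class Item:
    name: str
    source: str              # e.g. "e-mail", "SharePoint", "database"
    max_access_seconds: int  # hypothetical: how quickly it must be retrievable
    needs_worm: bool         # hypothetical: must be kept on write-once media


def choose_tier(item: Item) -> Tier:
    """Pick a destination tier. Thresholds are illustrative assumptions."""
    if item.needs_worm:
        # Assumption: WORM data goes to optical in this sketch.
        return Tier.OPTICAL
    if item.max_access_seconds < 10:
        return Tier.DISK
    if item.max_access_seconds < 300:
        return Tier.OPTICAL
    return Tier.TAPE


def archive(items):
    """Place each item and return a catalogue keyed by item name."""
    catalogue = {}
    for item in items:
        catalogue[item.name] = {
            "source": item.source,
            "tier": choose_tier(item),
        }
    return catalogue


if __name__ == "__main__":
    incoming = [
        Item("board-minutes.pdf", "documents", 60, True),
        Item("inbox-2007.pst", "e-mail", 5, False),
        Item("old-sales.db", "database", 3600, False),
    ]
    for name, entry in archive(incoming).items():
        print(name, entry["source"], entry["tier"].value)
```

A real archive layer would also index content and handle retention and purging; this sketch covers only placement and a minimal catalogue entry.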
Techworld: Could you describe the purpose of data?
Tony Cotterill: Data is for corporate governance, not just for compliance and regulation. Compliance is just one tail on the dog for which I need e-discovery. For governance I need e-knowledge.
Techworld: (E-knowledge-type software meaning products such as those from Autonomy.) What about search?
Tony Cotterill: The archive facility needs more than just a simple Google search. It needs a more workflow-oriented and production-class system to find the desired data. It's about e-knowledge. That's not our game. We like to provide data for an e-knowledge system to access.
Techworld: What is your view on green IT?
Tony Cotterill: Green for me is far more about the software than the hardware vendors. It's also far more about cost-savings. Just because you're saving money, though, doesn't make you green. It's mercenary. It's as if people are saying you have to give a better quality of access to my data. I'm not going to go green if I suffer by getting less than before, in terms of access to data.
Techworld: The vision that we are left with is of an archive facility that stores data which needs to be accessed fairly quickly, for compliance and similar reasons, on offline random-access media in an online device; that is Bridgehead's idea of nearline storage. The archive software layer is policy-driven to distribute multiple types of incoming data across a range of potential devices, and catalogues it in metadata and content indices used by e-knowledge and e-discovery software. Because the media in an archive store are offline, the green goal of reducing electricity use is met. Because the device itself is online, and the offline random-access media can be automatically fetched and slotted into drives, the compliance 'need to access reasonably quickly' goal is also met.