Enterprises are realising that they have to archive more and more information, from e-mails and transactions to Word docs and PowerPoint slide decks. It used to be thought that you just had to store fixed or reference content to meet compliance needs. It now seems as if, increasingly, all content will have to be archived.
That means you have got to get the stuff into a bigger and faster-performing archive silo and, once it's in, find it whenever the compliance requests come in. We covered Archivas ARC v1.5 last June, and Archivas' use by Hitachi Data Systems a week or two ago. The new v1.8 product increases archive size and searchability compared with v1.5.
The new version can store more data, of more types, and find it better. A cluster can now have up to 80 nodes - it was 50. The total archive size is 2.5 petabytes and 2 billion user files. It can be searched by a simple keyword search, by a tick-the-box visual query builder or by an XML-based query language. Lastly, up to 370 file types are supported.
The product has evolved to better meet customers' needs, and Archivas now has both HDS and CA selling it.
It looks to be an advance on EMC's Centera, which will need a deep refresh of its operating system to match the Archivas functionality.
The Google effect
Enterprises and organisations, as well as people, want to go and look in one place for information. They want information entry into that place to be as easy as possible and information finding to be simple, straightforward, powerful, and fast.
This is why Google is becoming the world's logical information archive: it finds information fast, and it identifies relevant information better than other search engines, such as AltaVista.
An organisation's digital archive is rapidly evolving. It used to be that place where transaction records, e-mails, documents and slide decks went to endure in suspended animation in a tape library or shelves of tape cartridges. Images of specialised sorts, such as medical ones, were often stored in a separate silo, an optical jukebox for example.
Then along comes compliance. Suddenly, it seems, archives have to come on-line - more so than tape libraries, probably, and much more so than shelves of tape cartridges or DVDs - and be searchable. How on earth do you search for keywords inside all the tape cartridges inside a Powderhorn library? It just can't be done.
There should ideally only be one place to search, one overall archive silo. The logic of our situation is steadily driving us towards having a disk-based archive, a silo holding everything - structured, semi-structured and unstructured information. It needs to hold everything in a searchable form.
That searchability needs to be multi-level, with simple keyword search being a start. Then we need to be able to search by meta-data such as author, subject, date and type - e-mail, document, slide deck, etc. But we also need to be able to search by information classification - relates to supplier X, subject to HIPAA, etc. We need to be able to build complex searches, either by box-ticking on a screen form or by using SQL or an equivalent query language.
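To make the idea of multi-level search concrete, here is a minimal Python sketch. It is not how Archivas or any other product does it; the ArchivedItem record, its field names, the sample items and the search function are all hypothetical, and a real archive would use an index rather than scanning every item. It simply shows keyword, meta-data and classification criteria being combined in one query.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ArchivedItem:
    """One archived object. Every field name here is an assumption."""
    name: str
    kind: str       # type meta-data: "e-mail", "document", "slide deck"
    author: str
    created: date
    text: str       # text content extracted from the file
    tags: set = field(default_factory=set)  # classification labels


def search(items, keyword=None, kind=None, author=None, tag=None):
    """Return the items that match every criterion that was supplied."""
    results = []
    for item in items:
        # Level 1: simple keyword search over the extracted text.
        if keyword and keyword.lower() not in item.text.lower():
            continue
        # Level 2: meta-data filters.
        if kind and item.kind != kind:
            continue
        if author and item.author != author:
            continue
        # Level 3: information classification.
        if tag and tag not in item.tags:
            continue
        results.append(item)
    return results


# Made-up sample data, for illustration only.
ARCHIVE = [
    ArchivedItem("q3-pricing.msg", "e-mail", "alice", date(2006, 3, 1),
                 "Revised pricing for supplier X contract", {"supplier-x"}),
    ArchivedItem("patient-intake.doc", "document", "bob", date(2006, 4, 12),
                 "Patient intake procedure and contract terms", {"hipaa"}),
    ArchivedItem("roadmap.ppt", "slide deck", "alice", date(2006, 5, 20),
                 "Product roadmap for next year"),
]

if __name__ == "__main__":
    for hit in search(ARCHIVE, keyword="contract", tag="hipaa"):
        print(hit.name)
```

A tick-the-box query builder would amount to a screen form that fills in these optional arguments; a query language would let the criteria be combined more freely.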
We need to be able to get information into this archive. That means including as many file types as we need and being able to index file contents - automatically. In practice that means indexing text-based files. Indexing videos and sound recordings automatically is beyond us in any practical sense.
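As a rough illustration of automatic content indexing, the Python sketch below builds a word-to-file index for text-based files and merely records the names of everything else. The TEXT_TYPES list, the build_index function and the sample files are assumptions made for the example, not a description of any vendor's indexing engine.

```python
import os
import re
from collections import defaultdict

# Hypothetical list of extensions whose contents can be read as text.
TEXT_TYPES = {".txt", ".eml", ".csv", ".html"}


def build_index(files):
    """Index text-based files by content.

    `files` maps a file name to its content as a string. Returns a pair:
    a mapping from each word to the set of file names containing it,
    and a list of the files that were stored but not content-indexed.
    """
    index = defaultdict(set)
    skipped = []
    for name, content in files.items():
        ext = os.path.splitext(name)[1].lower()
        if ext not in TEXT_TYPES:
            # Video, sound and other binary types: no content indexing.
            skipped.append(name)
            continue
        for word in re.findall(r"[a-z0-9]+", content.lower()):
            index[word].add(name)
    return index, skipped


# Made-up sample data, for illustration only.
SAMPLE_FILES = {
    "memo.txt": "Supplier X invoice overdue",
    "note.eml": "Invoice paid in full",
    "scan.mpg": "<binary video data>",
}

if __name__ == "__main__":
    index, skipped = build_index(SAMPLE_FILES)
    print(sorted(index["invoice"]))
    print(skipped)
```

The skipped files can still be found by their meta-data, such as name, type or date; it is only their contents that stay opaque to a keyword search.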
It seems to me that Archivas is responding to this idea of an all-inclusive archive and building out its product to better meet the developing needs.
An irony of this is that a disk-based archive could still need protecting. We might end up with a two-level archive, a searchable on-line one and a vaulted tape-based one. Is there some new law applying to storage that says storage tiers will always increase in number?