Arkivum is a young British company founded to solve a complex and unfashionable-sounding problem. Organisations are filling up with data – from business processes, research, analytics – for the most part without realising it or even knowing where much of it resides. When they do grasp the scale of their data, knowing what to do with it can be a huge barrier to action. All the while, the volume of data continues to grow at a rate that can only be described as alarming.
Anyone doubting the seriousness of data growth should study the volumes of data now being accumulated by UK universities alone, much of it generated by research activity. Techworld recently documented Arkivum’s estimates of how much data is held by these institutions – the total for the sector is now probably nudging one Exabyte.
Data is everywhere, as a by-product of activity, as a business resource and, increasingly, as an expensive headache. There will be no stopping its growth. Data has long been taken for granted but, not far into the 21st century, just as the information seers predicted, the days of data indifference and make-do are surely coming to an end.
Arkivum believes it can ease the problem of data bloat by offering a long-term, secure place to archive it in ways that meet a raft of increasingly complex regulatory and information management demands.
“Long-term storage of data is not a new requirement for many organisations, but the volumes of data that these organisations possess are increasing to the extent that new ways of managing that data for the long term are required,” says CEO and co-founder Jim Cook.
“Using existing solutions to meet the challenges of preserving data for the long term, at the bit level, is prohibitively expensive, and the Arkivum service is a straightforward and economical way to store and manage data over decade-long timescales.”
Arkivum - university spin-out
Doing this turns out to be surprisingly complicated. Founded as long ago as 2011 as a University of Southampton spin-out, Arkivum spent its early years hunting for customers that understood the problem, focussing rather fruitlessly on the parts of the UK media sector (e.g. post-processing companies) the company saw as having vast amounts of data that would need to be stored. According to Cook, after battering at the door for a while the firm realised that while companies in this sector were awash with data, they often didn’t own the problem of looking after it. The data belonged to someone else, and its loss or long-term integrity was not their liability.
Running on modest funding and their own resources, the founders turned instead to the university and life sciences sectors, which turned out to be ahead of the private sector in understanding the data archiving issue. Three surprisingly small funding rounds later and the firm has matured with an impressive list of customers that now use its data archiving services.
All young firms that survive will point to customers, but Arkivum makes a speciality of documenting them. Recent announcements have included the University of Salford, Sussex University, Loughborough University, and many others in higher education through the Janet Data Archive Framework agreement, which offers preferential pricing and low upfront costs. The interest of universities is driven in part by Medical Research Council (MRC) funding requirements that research data be held for 10 years for basic research and 20 years for clinical data, but Arkivum is also looking after two Petabytes of data for the world-famous Francis Crick Institute as it constructs an imposing new biomedical research facility near London’s Euston station. Applications for archiving seem to multiply.
Customers beyond higher education include the Tate Gallery, the Royal Botanic Gardens in Kew, North Bristol NHS Trust and New York’s Museum of Modern Art (MoMA), the latter archiving a huge digital store of artworks and audio-visual material that will eventually amount to 6.2 Petabytes.
Arkivum - technology
The technological challenge of archiving is to store data in such a way that it will still be usable at a point years or even decades into the future. This is something that the digital era has struggled with. Too often data has been fragmented into silos across numerous hardware systems that end up being stranded in time by proprietary standards. What distinguishes an archive is that the data being stored is not a backup but the only copy, the original having been deleted. That is the point of archiving: keeping a local copy as well is unsustainable because it consumes storage resources.
“Existing technologies in the world of long-term data storage are mainly based around traditional disk-based solutions that are delivered by in-house IT,” says Arkivum’s Cook. “[They] are expensive to run in terms of human and utility resources and typically require significant CapEx expenditure.”
Arkivum’s archiving alternative works as a cloud service through a gateway appliance with its own web front end and dashboard. As well as connecting the service to the customer’s network, the gateway generates checksums to ensure data integrity and encrypts data using keys that are held by the customer. Three copies of the data are created: two copies held by Arkivum on online tape libraries in mirrored ISO 27001 data centres, with a third copy stored offline with a third-party escrow provider. The third copy is an important facility because it allows customers to move to another provider without complication.
“It’s an exit plan for our customers in case they want to leave,” says Cook.
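The integrity-checking half of that gateway workflow can be illustrated with a short sketch: hash every file on ingest, keep the resulting manifest alongside each archived copy, and re-hash later to detect bit rot. This is an illustrative assumption, not Arkivum’s actual implementation – the company does not disclose which checksum algorithm it uses, and the function names here are invented for the example (SHA-256 is simply a common choice for fixity checking).

```python
import hashlib
from pathlib import Path

def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MB chunks, so large archives
    never need to be loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root; a copy of this
    manifest would travel with each replica of the archive."""
    return {str(p.relative_to(root)): file_checksum(p)
            for p in sorted(root.rglob("*")) if p.is_file()}

def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the files whose current checksum no longer matches the
    manifest, i.e. candidates for repair from another copy."""
    return [name for name, expected in manifest.items()
            if file_checksum(root / name) != expected]
```

With three copies and periodic verification, any file that fails the check on one copy can be restored from either of the other two – which is why the escrowed third copy also serves as the exit plan Cook describes.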
The purchasing model allows pay-as-you-go or an upfront sum for a specified amount of data over a defined time period.
“It is the need to insure against the loss of data, information and knowledge that has a value that is often hard to quantify and realise,” says Cook, summarising the forces driving the archiving market. “Archiving isn’t backup - it is for preserving the value of the data.”
Yesterday’s Terabytes turned into today’s Petabytes, which in a few years will turn into Exabytes, at which point the scale of this business becomes dizzying. How big this market will turn out to be is anyone’s guess, but it is certain that in the future people will find it almost incomprehensible that in the early 21st century data professionals couldn’t see they’d grabbed a dragon’s tail. Data is everywhere and is everything. Company CFOs preach the religion of data’s importance but they rarely seem to fathom the dangers of being consumed by it.
“One day a Petabyte will be a small amount of data,” muses Cook.
Arkivum - the service in three forms

Arkivum packages the technology as three offerings:

- Designed for large-scale archiving (Petabytes and up) over long periods of time; offers the three-copy archiving (including escrow) described above.
- A more cost-effective service that involves only two copies of data (including escrow) over shorter periods of time.
- A hybrid approach that allows specialised customers to utilise cloud archiving while running Arkivum/100 locally through an accessible cache. This version was deployed by the Museum of Modern Art (MoMA).