De-duplication has become popular for backup data, but not for primary storage. Now, US start-up company Ocarina Networks wants to change that, with a data reduction technology which it claims can shrink live production data too - even if the file formats are already compressed.

The technology has already been picked up by Photoways Group, which runs a British photo-sharing site, Photobox. It expects to save millions of euro in deferred storage hardware purchases as a result, according to its CTO.

In effect, the Ocarina technology disassembles stored files into their constituent parts in order to compress them, using an out-of-band hardware appliance. The compressed files are then restored when needed by a file system filter driver.

The problem with using current de-dupe schemes on primary storage is that "You're much less likely to find duplicate blocks in an online subdirectory, say," explained Carter George, Ocarina's products VP and co-founder. He pointed out that where duplicates do exist, they are often deliberate rather than redundant - on replicated storage arrays, for instance.

However, that doesn't mean there's no redundancy within the files, he added: "For example, a PowerPoint, a PDF, a Word document and a Jpeg all might contain the same picture, but it's rescaled, or pasted in a different format, or whatever, and while a human would say 'It's the same picture', on disk there's no common bytes."
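
To make George's point concrete, here is a minimal Python sketch of conventional fixed-block de-duplication. It is not Ocarina's method; the 8-byte block size, the function names and the stand-in "picture" bytes are all assumptions chosen for illustration. It shows that an exact copy shares every block with the original, while the same content embedded behind a short header, or re-encoded, shares none.

```python
import hashlib

BLOCK_SIZE = 8  # assumed, tiny for illustration; real systems use KB-sized blocks


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> set[str]:
    """Return the set of SHA-256 digests of each fixed-size block."""
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def shared_blocks(a: bytes, b: bytes) -> int:
    """Count distinct blocks that appear in both byte strings."""
    return len(block_hashes(a) & block_hashes(b))


# Hypothetical stand-in for a picture's bytes on disk.
picture = bytes(range(64))

exact_copy = bytes(picture)                    # a plain duplicate file
embedded = b"HDR" + picture                    # same bytes behind a 3-byte header
reencoded = bytes(b ^ 0xFF for b in picture)   # same information, different bytes

print(shared_blocks(picture, exact_copy))  # 8: every block matches
print(shared_blocks(picture, embedded))    # 0: the header shifts block boundaries
print(shared_blocks(picture, reencoded))   # 0: no byte survives re-encoding
```

A block-level scheme sees only the first case as a duplicate, which is why finding the other two requires looking inside the file format rather than at raw blocks.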

So, in a process the company calls ECO (for extract, correlate, optimise), Ocarina's storage optimiser appliance cracks open the file format and de-duplicates its constituent elements by looking for patterns at the information level, he claimed.

Using this method, even compressed image formats such as Jpeg can be compressed still further, George claimed. That's because a set of photos of the same event will share image elements - and therefore some of their underlying mathematical properties - and those can be de-duplicated.

"The maths to do this is really hard," George said. "Most companies concentrate on the D part of R&D. We have seven PhD mathematicians doing breakthrough mathematical research on how to find patterns."

The ECO process is extremely processor-intensive, so the optimiser box is a 16-core Linux appliance. It works out-of-band, pulling files off your NAS system, compressing them and then putting them back in Ocarina format - a size-reduced shadow format, with bit-for-bit consistency checks.
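
The out-of-band step with its consistency check can be sketched roughly as follows. This is only an illustration under stated assumptions: zlib stands in for Ocarina's proprietary compression, and the function names `optimise` and `reconstruct` are hypothetical. The point is the ordering: the size-reduced copy is accepted only after it has been shown to restore bit-for-bit.

```python
import hashlib
import zlib


def optimise(original: bytes) -> bytes:
    """Compress, then verify the result restores bit-for-bit before accepting it."""
    shadow = zlib.compress(original, 9)  # zlib is a stand-in, not Ocarina's codec
    restored = zlib.decompress(shadow)
    if hashlib.sha256(restored).digest() != hashlib.sha256(original).digest():
        # On any mismatch, the original file would be left in place.
        raise ValueError("round-trip check failed; keep the original file")
    return shadow


def reconstruct(shadow: bytes) -> bytes:
    """Reader side: restore the original bytes from the shadow format."""
    return zlib.decompress(shadow)


# Hypothetical file contents pulled off the NAS.
original = b"holiday photo metadata " * 100
shadow = optimise(original)

print(len(shadow) < len(original))
print(reconstruct(shadow) == original)
```

Keeping the expensive work in `optimise` and the cheap work in `reconstruct` mirrors the split the article describes between the optimiser box and the reader software.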

File reconstruction is much faster and is handled by reader software, also Linux-based. You can install it as a filter on a web or application server, or on a workstation, or buy a complete Ocarina Reader appliance.

The reconstruction process adds around 4ms of latency, George said, and because you can have multiple readers - Ocarina sells unlimited site licences - the reader shouldn't become a single point of failure.

He added that, as well as selling the technology in appliance form, Ocarina is working with other suppliers to develop integrated tier-2 storage subsystems.

Nevertheless, will the benefit of this kind of compression be enough to overcome users' reluctance to tamper with online data? That may depend on the market sector, suggested Forrester analyst Andrew Reichman.

"There is considerable market resistance to de-duplication of primary data, where the industry has long relied on creation of multiple copies to guarantee the protection and availability of critical data," he explained.

"If you are in the banking or financial services sector, don't expect your peers to adopt this any time soon. But, if your firm has less stringent SLAs around data loss and availability and faces staggering growth from day-to-day operations, this could be the missing link that allows you to stay ahead."

Carter George agreed, saying the technique is best suited to relatively static content such as online photos, social networking sites, oil and gas exploration data, and so on.

It certainly works for photo sharing, said Photoways CTO Graham Hobson. His company receives 1.5 million photo uploads per day and is one of around a dozen early customers for the Ocarina technology.

"The first time I heard about Ocarina, it sounded simply too good to be true," he admitted, before adding: "Based on our initial testing, we are confident that the Ocarina solution will allow us to defer a significant portion of our storage purchases this year."

He continued: "Bottom line, we believe that Ocarina will pay for itself within six months of installation, save us millions of euro over the next few years, and change the way we buy storage and the overall economics of our business in the future."