De-duplication vendor Data Domain has duplicated Isilon's IPO success by raising almost $111 million in its initial public offering (IPO) earlier this week.

The company's shares sold for $15, above the forecast $11.50 - $13.50 range, and it is now valued at $776.5 million; heady stuff and a sign of investor confidence in high-tech issues by loss-making companies such as Data Domain. But, arguably, it also says something important about data de-duplication.

Simon Robinson, the storage research director for the 451 Group, said: "The 451 Group believes that Data Domain's IPO today is significant in a number of ways. First, it indicates that disk-based data protection utilising de-duplication capabilities is increasingly becoming a 'must have' feature for small to mid-sized organisations. As well as providing speedy online local recovery, platforms such as Data Domain's are also helping these organisations implement cost-effective disaster recovery operations, sometimes for the first time."

"Secondly, the IPO establishes Data Domain as the early front-runner in this emerging market, which should provide a wake-up call to other data protection vendors that de-duplication is a real technology that users are keen to implement right now. The de-duplication market has grown from nothing three years ago to be worth an estimated $260m in 2007. At this growth rate this could easily become a $1bn market by 2009. As major vendors such as EMC, Symantec, Network Appliance and Quantum Corp. come to market, we believe that the race is on."

De-duplication report

The 451 Group is about to issue a special report on data de-duplication. We understand that its main findings include the following points:

- Over the past three years, the de-duplication market has grown from nothing to $100m in revenue for participants in 2006. We expect this market to become a $1bn market by 2009 – a target that is attainable given that this technology has broad appeal across vertical markets and among organisations of various sizes. The market for data de-duplication is already proving to be a 'real' market from a revenue perspective when it's applied to the backup infrastructure.

- De-duplication tools are rapidly evolving from point products to become features of broader offerings. As part of the first wave of evolution, de-duplication products are being applied to the backup infrastructure. The next big opportunity is applying de-duplication to other elements of the storage infrastructure, such as archiving and even primary storage. Research reveals that users are also aggressively embracing de-duplication as part of disaster-recovery implementations.

Is de-dupe on your buy list?

- End users surveyed for the report indicated a high preference for buying a de-duplication offering in the near future, regardless of the size of their organization. Although 'immature technology' was cited by users as the chief reason for not buying de-duplication products today, the 451 analyst thinks this barrier will quickly dissolve as startup products mature and as major vendors bring products to market.

- Although de-duplication can be applied to a range of different markets, there is still confusion among end users and potential consumers of this technology. Adoption so far has been strongest in the midrange market. In general, source-based de-duplication is most relevant to small and remote branch offices, NAS-based approaches have seen the most penetration in the midmarket, and VTLs (virtual tape libraries) are most relevant to large environments. VTL-based approaches are less widely deployed to date, but we expect this to change soon as more products come to market.

- While Data Domain is the initial market leader, other vendors are aiming to close the gap. Some of them will apply significant pricing pressure over the next 12-18 months as they add de-duplication functionality to their existing product lines. We expect EMC, Network Appliance, Quantum Corp and Symantec to be among those vendors applying the most pressure, although all but EMC have much to prove in terms of product capability and customer traction.

Indeed, Quantum has just announced its datacentre-class DXi7500 de-duping system, which offers de-duplication at ingest or post-ingest and can present itself as a network-attached storage (NAS) box, a VTL, or plain disk.

- Large storage vendors that have yet to commit publicly to de-duplication as a major strategic focus include Hewlett-Packard, IBM and Sun Microsystems. However, all of these vendors have potential market access via partnerships with smaller de-duplication players. These partners are also potential acquisition candidates, and they include Asigra, Diligent Technologies, ExaGrid Systems, FalconStor Software and Sepaton.

HP partners with Sepaton, and Hitachi Data Systems with Diligent. Sun partners with FalconStor, although not officially for de-duplication, as does IBM.

Hash- or byte-level de-dupe

Most de-duplication vendors fall into one of two main technology camps, offering either hash-based de-duplication or byte-level delta de-duplication. Hash-based de-duplication products place heavy demands on processor and memory resources, and tend to cater to smaller data sets. By contrast, byte-level delta technology typically provides greater backup performance, but often relies on complex shared storage architectures such as multi-node clustering.
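The hash-based approach can be illustrated with a toy sketch. This is a hypothetical example, not any vendor's implementation: fixed-size 4 KB chunks and SHA-256 are assumptions chosen for clarity, whereas real products typically use variable-size chunking and far more sophisticated indexes.

```python
import hashlib

def dedupe(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep only unique ones,
    indexed by their SHA-256 digest."""
    store = {}   # digest -> chunk: the unique-chunk store
    recipe = []  # ordered digests needed to rebuild the original data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # store the chunk only the first time it is seen
        recipe.append(digest)
    return store, recipe

def restore(store, recipe):
    """Reassemble the original data from the stored chunks."""
    return b"".join(store[d] for d in recipe)

# Ten identical 4 KB blocks de-duplicate down to a single stored chunk.
data = b"x" * 4096 * 10
store, recipe = dedupe(data)
assert restore(store, recipe) == data
print(len(recipe), "chunks referenced,", len(store), "stored")  # 10 chunks referenced, 1 stored
```

The hash index is also why this camp is memory- and processor-hungry: every chunk of every backup must be hashed and looked up in an index that grows with the amount of unique data.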

Accordingly, we believe large enterprise customers with tight backup windows will gravitate toward byte-level delta products in the near term, although hash-based products should steadily narrow the performance gap as processor power and memory capacity continue to improve.

Furthermore, hash-based de-duplication products reduce data on a 'global' scale, while byte-level delta products minimise redundancies found among multiple revisions of the same backup set. So far, neither approach can rightfully claim superiority in all use cases, although global de-duplication seems to excel in desktop/laptop and remote-user backup scenarios.

Many vendors make lofty claims about their data de-duplication ratios as points of differentiation, with some as high as 500x or more. This ratio is governed as much - perhaps more - by data type and change rate as it is by the specific form of de-duplication used. Database and email applications tend to produce high de-duplication ratios, while compressed media and unique scientific data get minimal data-reduction benefits. Likewise, daily full backup schedules exhibit higher de-duplication rates than incremental backups do. Most de-duplication users surveyed for the 451 report noted data-reduction rates between 15x and 20x, although some saw reduction of greater than 50x, while others experienced less than 5x reduction.
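The arithmetic behind these ratios is straightforward: raw bytes ingested divided by unique bytes stored. A sketch with hypothetical figures shows why repeated daily fulls of slowly changing data land in the range users report:

```python
def dedupe_ratio(raw_bytes: int, stored_bytes: int) -> float:
    """De-duplication ratio: total data ingested over unique data stored."""
    return raw_bytes / stored_bytes

# Hypothetical figures: a 1 TB data set of which 2% changes each day,
# backed up with 20 daily fulls. Raw ingest is 20 full copies; unique
# data is the original set plus 19 days' worth of changed blocks.
TB = 10**12
raw = 20 * TB
stored = TB + 19 * int(0.02 * TB)
print(f"{dedupe_ratio(raw, stored):.1f}x")  # 14.5x
```

Under these assumptions the ratio comes out near the 15x-20x band the surveyed users reported; halve the change rate or double the retention and it climbs, which is why headline figures vary so widely.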

The impression we are left with is that investors now judge that data de-duplication is ready for the mainstream and value Data Domain accordingly. Let's see if business IT purchasers think the same.