Within the next five months EMC will announce its cloud computing/web 2.0 storage products, code-named Hulk and Maui. What do we know so far about them?

For the purposes of this feature, the terms cloud computing and web 2.0 refer to storage applications characterised by potentially very rapid capacity growth to multi-petabyte levels, unstructured and semi-structured content, and the possible remote delivery of storage services in a utility style.

We should be thinking of Google-type storage scale rather than that of Isilon or NetApp's ONTAP GX. IBM's XIV purchase appears to be aimed at this area. EMC's recently announced MozyEnterprise is also a cloud computing/web 2.0 application, and EMC has indicated that the platform behind it will be used for backup and recovery and archiving services.

EMC CEO Joe Tucci first revealed Hulk and Maui at a November 2007 analysts' conference. Hulk is hardware and Maui is software. Together they form a clusterable storage system, built on commodity software and servers, for multi-petabyte capacity storage applications.

Of Maui we know that it goes beyond what a clustered file system such as Isilon's does, while including part of that functionality. According to EMC, it provides more of a global storage repository than a clustered file system and is orders of magnitude beyond what is currently available on the market.

Of Hulk we know that it will involve clusterable storage units. We can deduce more from what an EMC staffer has written.

EMC clues

That staffer is Chuck Hollis, EMC's VP for technical alliances, who has set out his (EMC's) view of cloud storage needs: "Presenting storage as blocks (e.g. LUNs) won't scale. Presenting storage as files won't scale. You'll need an object-oriented approach with rich semantics - nothing else will work at this uber-massive scale."

"It goes without saying that costs matter, but in a very different way. Take any small cost (hardware, software, energy, administration, etc) multiply it by a very large number, and you have a very large cost."
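Hollis's contrast between blocks, files and objects can be made concrete with a small sketch. The Python below is purely illustrative: the `ObjectStore` class, its `put`, `get` and `find` methods and the metadata keys are all hypothetical, and nothing here is drawn from EMC or says how Maui is actually built. It simply shows data being addressed by an object ID and queried by attached metadata, rather than by a LUN offset or a directory path.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict = field(default_factory=dict)


class ObjectStore:
    """Toy in-memory object store (hypothetical, for illustration only)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        # The object ID is derived from the content, not from a block
        # address or a position in a file tree.
        object_id = hashlib.sha256(data).hexdigest()
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id].data

    def find(self, **criteria) -> list:
        # Look objects up by their metadata: one reading of "rich semantics".
        return sorted(
            oid for oid, obj in self._objects.items()
            if all(obj.metadata.get(k) == v for k, v in criteria.items())
        )


store = ObjectStore()
photo_id = store.put(b"holiday photo bytes", owner="alice", kind="photo")
store.put(b"tax return bytes", owner="alice", kind="document")
store.put(b"band demo bytes", owner="bob", kind="audio")

# Retrieve by what the object is, not by where it lives.
matches = store.find(owner="alice", kind="photo")
print(matches == [photo_id])
```

A real system at this scale would spread objects and their metadata index across many nodes; the point of the sketch is only the shape of the interface.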

He thinks cloud storage must be autonomous: "If you can imagine many petabytes with billions of objects in hundreds of locations and millions of users, this means that management is an entirely unique proposition."

"The environment must be self-tuning, and automatically react to surges in demand. It must be self-healing and self-correcting at a massive scale -- like the internet, no single scenario of failures can bring it down."

"The idea of a bunch of administrators sitting glued to multiple consoles, watching indicators and firing off commands -- well, that just won't work here. Not only is it hard for people to react fast enough, no one can afford that much human capital to keep things running smoothly."
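What "self-healing" without console-watching administrators might mean in practice can be sketched in a few lines. This is an assumption-laden toy, not a description of Hulk or Maui: the `heal` function, the node names and the target of three copies per object are all invented for illustration. When a node disappears, the routine restores the copy count from the surviving replicas with no operator involved.

```python
REPLICAS = 3  # assumed target copy count; not an EMC figure


def heal(placement, live_nodes, replicas=REPLICAS):
    """Return a placement in which each object has `replicas` live copies.

    `placement` maps an object ID to the set of node names holding a copy.
    Hypothetical helper for illustration only.
    """
    repaired = {}
    for object_id, holders in placement.items():
        survivors = set(holders) & set(live_nodes)
        if not survivors:
            # Nothing left to copy from; a real system would raise an alert.
            raise RuntimeError(f"all copies of {object_id} lost")
        # Live nodes that do not yet hold the object, in a fixed order so
        # the result is deterministic. A real system would balance load.
        spares = sorted(set(live_nodes) - survivors)
        needed = max(0, replicas - len(survivors))
        repaired[object_id] = survivors | set(spares[:needed])
    return repaired


placement = {
    "obj-1": {"node-a", "node-b", "node-c"},
    "obj-2": {"node-b", "node-c", "node-d"},
}
live = {"node-a", "node-c", "node-d", "node-e"}  # node-b has failed

after = heal(placement, live)
for object_id in sorted(after):
    print(object_id, sorted(after[object_id]))
```

Run periodically, or triggered by a failure detector, a loop of this kind replaces the operator firing off commands. The hard parts at multi-petabyte scale, such as detecting failures reliably and moving data without swamping the network, are left out.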