The front runner in very large data centre storage architecture development is probably Google with its cloud computing concept. Sun is probably the only product vendor that has articulated the ideas behind cloud computing with CTO Greg Papadopoulos' red shift analogy.

Astronomers use the Doppler effect, or red shift, to help support the idea of an expanding universe: light from a star that is a very long distance away and receding from us at incredible speed is shifted towards the red end of the spectrum. Some organisations are expanding their IT needs so incredibly fast that Papadopoulos thought a red shift analogy applied to them.

Google and Amazon, for example, are red shift IT organisations that need to scale out their IT performance, storage and networking capabilities far, far beyond anything previously seen or envisaged. Papadopoulos says it is terascale. It is estimated that Google's server estate has passed 500,000 servers. Like Amazon and Microsoft, Google is building massive data centres around the globe to provide both the capacity it needs (servers, storage, networking) and the 24x7 access necessary.

But Google does not use a storage area network (SAN). It has no world-wide network-attached storage (NAS) infrastructure. Instead it uses thousands of Linux servers with cheap disks - direct-attached storage (DAS) - and organises their contents inside its own Google File System (GFS). Effectively much storage intelligence has passed from array controllers into the file system.

The huge, explosive ingress of data in red shift IT organisations blows away current storage infrastructures. Google, Amazon, Yahoo! and Microsoft Live need to scale up capacity and storage access performance to unheard-of levels to cope with user demand. Traditional SAN and NAS storage architectures cannot cope with multiple petabytes of data and with applications that need to capture the constantly incoming flood of data, organise it, provide sub-5-second access, and safeguard its storage.

It's said that enterprise SAN storage costs around $20/GB whereas such cloud computing storage could be as low as $1/GB. You can't get that cost by buying EMC Symmetrix or Clariion arrays, IBM DS8000 or DS4000 products, or NetApp arrays. You have to buy in commodity disks and organise them using commodity and/or open source software on commodity servers.
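To make the gap concrete, the short Python sketch below multiplies the two quoted per-gigabyte figures out to petabyte scale. Only the $20/GB and $1/GB numbers come from the text above; the decimal petabyte (1,000,000 GB), the example capacities and the function name are assumptions chosen for illustration, and the sums ignore power, staff and networking costs.

```python
# Per-GB figures quoted in the article; both are rough, hearsay estimates.
SAN_COST_PER_GB = 20.0       # US dollars, "around $20/GB"
COMMODITY_COST_PER_GB = 1.0  # US dollars, "as low as $1/GB"

# Assumption: a decimal petabyte of 1,000,000 GB.
GB_PER_PB = 1_000_000


def storage_cost(petabytes, cost_per_gb):
    """Return the raw purchase cost in dollars for the given capacity."""
    return petabytes * GB_PER_PB * cost_per_gb


if __name__ == "__main__":
    # Illustrative capacities only; the article does not give exact sizes.
    for pb in (1, 5, 10):
        san = storage_cost(pb, SAN_COST_PER_GB)
        commodity = storage_cost(pb, COMMODITY_COST_PER_GB)
        print(f"{pb:>2} PB: SAN ${san:,.0f}, commodity ${commodity:,.0f}, "
              f"difference ${san - commodity:,.0f}")
```

On these assumptions every petabyte bought as enterprise SAN storage costs twenty times as much as the commodity equivalent, which is why cost/GB dominates the design at this scale.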

SAN and NAS antithesis

Cloud computing storage is the antithesis of traditional SAN and NAS storage. The good news is that relatively few organisations will have the size needed to build out cloud computing infrastructures. The bad news for SAN and NAS storage vendors is that those few infrastructures could be so incredibly massive as to trigger a significant migration of the vendors' customers to storage-as-a-service (SAAS) on the clouds provided by Google, Amazon and others.


For example:

- Google offers hosted office-like applications.
- Amazon offers its SimpleDB, S3 storage service and Elastic Compute Cloud (EC2).

SAAS could be quite a disruption.

Cloud computing or cloud cuckoo land?

This categorisation of massive inter-linked data centres offering web-facing services by Google, Amazon, Yahoo!, Microsoft and others is a description that is inevitably crude. The details of such mega-data centre architectures are generally kept secret, for reasons of competitive advantage. The 'cloud' title is determined by the apparent scale of computing power on offer and its availability as a web-delivered service to end-users (YouTube, MySpace, Facebook) or to business (Google's hosted applications, Amazon EC2 and SimpleDB).

Cloud computing isn't defined by particular data centre architectures. Nevertheless cloud computing can be seen as a logical evolution of grid and utility computing ideas. The massive petabyte-level scale of cloud storage would bring the cost/GB of storage to the fore and rule out traditional controller-based array building blocks on cost grounds.

A rough consensus view seems to be that clustered NAS systems will be a common cloud storage architecture; the Google-type clustered server+DAS infrastructure is unique to Google and its particularly search-focused needs. A clustered NAS system is more generally applicable and needs to have a very large and global namespace for its files and an infrastructure for organising millions of files, file protection and access.

Cloud storage seems to be quite separate from current SAN and NAS storage because of this controller-less array architecture and the need for a different file system, one that can scale to the capacity required and actively manages data protection in the storage media it oversees. It has to do that; there are no controllers, meaning no RAID hardware.
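As a rough illustration of what "data protection in the file system" can mean when there is no RAID hardware, the Python sketch below keeps several copies of each chunk of data on different commodity servers. It is a toy under stated assumptions, not a description of how GFS or ZFS actually work: the placement rule is rendezvous hashing, chosen here only because it is short and deterministic, and the server names, chunk identifier, three-copy default and function names are all hypothetical.

```python
import hashlib


def place_replicas(chunk_id, servers, copies=3):
    """Choose `copies` distinct servers for one chunk (rendezvous hashing)."""
    if copies > len(servers):
        raise ValueError("need at least as many servers as copies")

    def weight(server):
        # Deterministic pseudo-random weight for this (server, chunk) pair.
        digest = hashlib.sha256(f"{server}/{chunk_id}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    # The highest-weighted servers hold the copies.
    return sorted(servers, key=weight, reverse=True)[:copies]


def readable(placement, failed_servers):
    """A chunk stays readable while at least one copy is on a live server."""
    return any(server not in failed_servers for server in placement)


if __name__ == "__main__":
    servers = [f"node-{n:02d}" for n in range(10)]  # hypothetical server names
    placement = place_replicas("chunk-0001", servers)
    print("copies on:", placement)
    print("readable after one failure:", readable(placement, {placement[0]}))
```

The point of the sketch is only that redundancy is decided in software, per chunk, across whole servers, rather than by a controller across the disks inside one array.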

Two such file systems have some public presence: GFS from Google and ZFS from Sun.

Can businesses with a need for petabyte levels of storage use cloud computing storage models? Isilon might argue that some already are. For example, its customers in the media area are using clustered NAS systems to stream billions of bytes of video files to their users involved in rendering movies or to their customers.

Other areas where exceptionally large and file-based online data stores are necessary are the pharmaceutical industry and some earth sciences applications which may well use supercomputers.

The attraction of cloud computing is that it offers parallel processing similar to that used for supercomputer applications, but at much lower cost.

At the other end of the enterprise scale, small and medium-sized enterprises (SMEs) could well find cloud-based computing and storage services attractive because it means they don't have to acquire, manage and operate their own IT infrastructure to do those things. It saves them quite a lot of time, enables them to focus on their core business activities, and saves them money.

Google and Amazon say there are thousands of SME clients for their cloud business-facing services. As these enterprises grow and need more IT services the cloud vendors are in a position to offer more and deny the traditional IT vendors this business. Mid-range and larger businesses may then be attracted to cloud-based services.

It is in this sense that cloud computing could become a quite disruptive technology over time.


The players

IT hardware vendors are not offering cloud storage products yet, although some, such as Sun, have experimented with cloud-based concepts. However, several vendors are positioning themselves to supply cloud computing hardware and software to businesses or to sell cloud computing and storage services to both business and end-users.

Amazon founder Jeff Bezos appears to want to transform the company into a 21st century IT utility. His retailing IT infrastructure uses Linux servers and Oracle Real Application Clusters with HP MSA storage arrays; at least, it did in 2004. Amazon is a cloud services player wanting to offer business infrastructure components hosted on its own terascale computing set-up.

EMC currently has no clustered NAS offering and Centera is an expensive online archive for unstructured information. But it has announced its Hulk and Maui clustered NAS hardware and software and selected customers may hear more in January.

HP has its acquired PolyServe technology. However, not much has been made, publicly, about it and HP is not expressing cloud computing concepts. It does have Amazon as a customer though and one HP Labs blogger has referred to it.

Google is the archetypal cloud computing services provider leveraging its own terascale data centre infrastructure. This is being aggressively built out and Google probably owns the largest IT infrastructure on the planet, bar none.

IBM and Google are working together to seed universities with small cloud computing facilities so that computer science students become familiar with the programming concepts involved. Here's how IBM chief Sam Palmisano described it: "This project combines IBM's historic strengths in scientific, business and secure-transaction computing with Google's complementary expertise in Web computing and massively scaled clusters. We're aiming to train tomorrow's programmers to write software that can support a tidal wave of global Web growth and trillions of secure transactions every day."

Isilon has a growing clustered NAS product line and has recently launched the world's largest NAS cluster with almost 100 nodes and a theoretical 2.3PB of capacity. It also counts NASA as a customer and is well-funded with post-IPO cash.

Network Appliance has the clustering technology in ONTAP GX, its operating system for its NAS and SAN products. However, it appears to be trailing Isilon and others at the clustered NAS leading edge as it concentrates on its mainstream enterprise customers, who have served it so well and who value its products highly. We might expect NetApp to become more active if a cloud computing enterprise market develops.

Sun in many ways appears to be the hardware and software vendor with the most cloud computing components. It has its ZFS file system and has embraced the commodity server/commodity disk/open source software route with products such as the X4500 storage server and Solaris 10 software stack. With Papadopoulos, Sun has a formidable exponent of cloud computing concepts and the company is obviously well aware of cloud computing and excited by the concept and its possibilities.

Seagate has bought its way into storage services via the EVault acquisition and is building on that to offer online backup to its own data centres. It has also bought an e-discovery firm, MetaLINCS. Why? There's pots of money to be made in disk drives yet, so why this diversification? A long shot would be to say that Seagate wants to become a major force in supplying storage-based online services to businesses and is another prospective cloud computing services supplier.