Grid Computing is a term that's being bandied about more and more in the IT industry – and, unusually for this industry, it's an extremely important thing to know about, because it might just be the future for the average computer network.
The fundamentals of parallel computing
Most of us have come across Symmetric Multiprocessing (SMP) – servers that have more than one processing unit. The idea is that a computer with multiple processors can do more work in less time by allocating different tasks to different processors. All the processors work "in parallel" – i.e. several bits of work get done at once, because each processor handles its own particular task independently of the others.
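The idea can be sketched in a few lines of Java: split a job into independent chunks, one per available processor, run the chunks at once, then collate the partial results. The class name and the choice of a summing job are illustrative only, not taken from any particular product:

```java
import java.util.*;
import java.util.concurrent.*;

// A minimal sketch of the SMP idea: the same job finishes sooner when
// independent chunks are handed to separate processors.
public class ParallelSum {
    // Sum the array by splitting it into one chunk per available
    // processor and summing the chunks in parallel.
    public static long sum(long[] data) throws Exception {
        int cpus = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cpus);
        int chunk = (data.length + cpus - 1) / cpus;
        List<Future<Long>> parts = new ArrayList<>();
        for (int i = 0; i < data.length; i += chunk) {
            final int from = i, to = Math.min(i + chunk, data.length);
            parts.add(pool.submit(() -> {
                long s = 0;
                for (int j = from; j < to; j++) s += data[j];
                return s;                    // one processor's partial result
            }));
        }
        long total = 0;
        for (Future<Long> part : parts) total += part.get();  // collate
        pool.shutdown();
        return total;
    }

    public static void main(String[] args) throws Exception {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data));       // prints 500000500000
    }
}
```

The collation step at the end is worth noting: parallelism only pays off when the chunks genuinely don't depend on each other, which is exactly the property the scheduling software in larger systems has to work out.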
The next step up is Massively Parallel Processing (MPP), where you have a sizeable collection of processors, generally not in the same box, all doing their own bit of work in parallel. The difference is that SMP generally uses a small number of processors (eight or fewer) connected by a common data bus, whereas MPP uses an arbitrary number of CPUs linked in an arbitrary way, with complex processor allocation software built on top to handle the dishing out of work and the collation of results. Although not in the same physical box, the processors tend to be in a single location – usually a single room – connected by the highest-speed interconnect available (often Gigabit Ethernet in today's installations).
Parallel computing has two main problems: cost and intercommunication. SMP systems work well because they have a small number of processors interconnected by a single, shared bus running at tens of Gbit/s. MPP systems bring more scalability (you can't realistically get above eight processors on a shared SMP bus before the bus becomes a bottleneck) but at the cost of complexity (you have to do more work to handle the intercommunication) and speed (the inter-processor links are usually only 1-2Gbit/s). There comes a point, though, where cost becomes an issue. If you're putting thousands of processors in a single location, you have to buy those processors and the appropriate interconnection fabric, as well as employing people to keep the lights on and ensure the thing is giving value by being used 24x7. Oh, and you have the task of scheduling access to the resources, and potentially of partitioning them so they can handle more than one project at once – a concept that harks back to the days of the centralised mainframe, before computing moved out to distributed desktops.
The grid approach
Grid Computing is, in its basic form, an extension of the MPP concept. The idea is similar in that you have a set of computers connected in an arbitrary way, but these systems can live in different buildings, cities or continents. An example is the North Carolina BioGrid, which currently spans five of the University of North Carolina's sites.
Grid Computing sidesteps these issues by distributing the processing capability over a number of sites. Instead of one vast parallel system, you have a number of smaller ones, possibly on different sites, which can be used in whatever combinations are appropriate. Systems can be used independently or pooled into any size of parallel system depending on demand. And generally speaking, two average-sized systems cost less to procure than one big one of the same overall power.
Where's this heading?
Let's map the multi-site concept back into the corporation. The majority of companies don't have multiple sites with multiple computer systems, but they do have multiple departments, often with dedicated systems. So it's common to have a server for the accounts system, a server for the ERP system, a server for the email system, and so on. The ERP system's usually busy during working hours, but not during the night. The email server has its busy periods and its lazy times, too. The accounts system will have some major peaks and troughs – it'll be hammered to death once a month at payroll time, for instance, and then take a further beating at year end.
So why don't we forget the concept of single-purpose systems and instead make our world into a single, heterogeneous "virtual computer"? That way, the ERP software might always run on its own system, but when it's time to do the payroll run, some of its spare capacity can be used to work out the staff's National Insurance contributions. And if there's a flurry of email as a result of a big advertising campaign, that's fine, because all that spare time on the accounts server can be used by the email function.
The obvious platform for all this jazz is Java, because you can get a Java virtual machine for everything from a handheld computer to a mainframe – and if everything's running a JVM, all you need is a clever scheduling application to dish out the appropriate program code and the data for it to work on. The slightly surprising bit is that all the underlying technology already exists: HTTP can carry the intercommunication, the Web Services people have spent years developing the encryption and authentication mechanisms, and dishing out code and data to arbitrary machines is handled quite nicely by Enterprise JavaBeans. So the main leap required to make this a reality is one of faith, not technology.
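To make the "clever scheduling application" concrete, here's a toy sketch in Java: it sends each incoming job to whichever machine currently has the most spare capacity. The machine names and the simple load-counting rule are invented for illustration – real grid middleware, and the HTTP/Web Services transport it would ride on, is far more involved than this:

```java
import java.util.*;

// A toy grid scheduler: each job goes to the least-loaded machine.
// Everything here (names, the load metric) is illustrative only.
public class SpareCapacityScheduler {
    // Machine name -> number of jobs currently assigned to it.
    private final Map<String, Integer> load = new LinkedHashMap<>();

    public SpareCapacityScheduler(String... machines) {
        for (String m : machines) load.put(m, 0);
    }

    // Pick the machine with the most spare capacity, record one unit
    // of work on it, and return its name. In a real grid, the job's
    // code and data would be shipped to that machine at this point.
    public String dispatch(String job) {
        String best = null;
        for (Map.Entry<String, Integer> e : load.entrySet())
            if (best == null || e.getValue() < load.get(best))
                best = e.getKey();
        load.put(best, load.get(best) + 1);
        return best;
    }

    public static void main(String[] args) {
        SpareCapacityScheduler s = new SpareCapacityScheduler(
                "erp-server", "mail-server", "accounts-server");
        System.out.println(s.dispatch("payroll-run"));  // erp-server
        System.out.println(s.dispatch("mailshot"));     // mail-server
        System.out.println(s.dispatch("nic-calc"));     // accounts-server
        System.out.println(s.dispatch("report"));       // back to erp-server
    }
}
```

The point of the sketch is the shape of the problem, not the policy: once every machine can report its load and accept work, swapping in a smarter placement rule is a local change to `dispatch`.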
The network is the computer
A certain hardware and software manufacturer coined this slogan a few years ago, and in fact Sun is working hard to push the concept of Grid Computing. IBM is also keen to promote this approach, and standards for grid computing such as the Open Grid Services Architecture are beginning to emerge.
Imagine your entire computing infrastructure acting as a single, intelligent "virtual computer", in which the under-used capacity of one person's machine helps solve someone else's problem a little faster. Well, soon you won't have to imagine it, because that's where the big vendors see our networks going – and the concept is a hard one to fault.