Grid Computing ("GC") is becoming more and more popular as a way of spreading the load of complex applications over a number of distributed processors in order to solve problems more quickly. As with most technologies, the concept of GC can be taken to extremes, with IBM currently leading the push to migrate the corporate network entirely toward grid computing. IBM's idea for GC is to take the assorted computer systems in an organisation and make them into a single "virtual" computer. Imagine you have a collection of computer systems in the organisation – some Intel-based file/print servers running Windows 2000, perhaps a Sun database server using Solaris, maybe even some higher-end mainframe equipment, and of course a number of desktop computers. IBM wants to bring all these systems together so that, instead of existing as discrete resources, they appear like a single entity into which you pour work and out of which you pull results. Getting the idea
Getting the idea

If this is a hard idea to grasp, think about a simple fileserver with two processors. You know there are two processors in the box, but you don't actually care – the operating system deals with allocating work to each processor in order to get jobs done as quickly as possible. Now think of your entire building as that fileserver – you know there are loads of processors in there, but as far as you're concerned it's a single computer, and you don't care how the actual work gets allocated to the processors. This is the virtual computer to which IBM is alluding. There are a couple of major hurdles to overcome with this model, though. In a server you have a number of identical processors – each unit has the same architecture and runs at the same speed. Not only that, but the processors are tightly coupled to the rest of the computer's hardware (the disks, the RAM, the peripherals). If we now step back to the virtual computer that spans our entire hardware base, we have a disparate set of dissimilar processors, each of which has no direct coupling with the majority of the memory, peripherals or disks.
Architecture abstraction

The goal with a virtual computer is to run any application we wish on any lump of hardware that happens to be available to do some work. The first thing we need to do, then, is address the issue of how we write a program that will run on (say) a Solaris-based SPARC machine one day and a Windows-based Intel machine the next. This is normally achieved by adding an abstraction layer between the platform and the program – if the program can be written to conform to a fixed API, and each computer can be made to present its system resources such that they conform to this API, the problem goes away. IBM's answer, unsurprisingly, is Java. "The whole point of Java is write once, run anywhere", said Cathcart; once you've put a Java Virtual Machine on each of your computers, you can run any Java program on any of them – the JVM provides the abstraction layer required to act as the interface between the hardware and the programs.
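As a trivial illustration of the abstraction the JVM provides, the sketch below (the class name and output are our own, not IBM's) is one compiled class file that will run unchanged on the Solaris/SPARC box and the Windows/Intel box alike, simply reporting whichever platform it happens to have landed on:

    // PlatformProbe.java -- illustrative only: the same compiled class file runs
    // unchanged on any machine that carries a JVM, and reports the platform it
    // finds itself on via standard system properties.
    public class PlatformProbe {
        public static void main(String[] args) {
            System.out.println("OS:   " + System.getProperty("os.name"));
            System.out.println("Arch: " + System.getProperty("os.arch"));
            System.out.println("CPUs: " + Runtime.getRuntime().availableProcessors());
            System.out.println("JVM:  " + System.getProperty("java.vm.name"));
        }
    }

Compile it once with javac and the resulting bytecode behaves identically on either machine; only the reported property values differ.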
Networking

Within a single server, the connections between the various hardware components are screamingly fast – often many gigabits per second between CPU and memory. With distributed machines in a GC setup (just as with any traditional parallel processing system, in fact), the speed of the links between the computers will be a significant factor in the overall performance of the system. Given today's networking technologies, though, gigabit-speed inter-computer links are an inexpensive commodity, and so although the intercommunication process is still far from trivial, we at least don't have to worry about being able to afford a fast enough physical connection.
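To put rough numbers on that, the back-of-envelope sketch below (the figures are our own and purely illustrative) works out how long it takes just to move a 500MB data set to the machine that will process it, for a few common link speeds:

    // LinkBudget.java -- an illustrative calculation, not a benchmark: how long
    // does it take to ship a job's input data across links of various speeds?
    public class LinkBudget {
        public static void main(String[] args) {
            double payloadBytes = 500 * 1024.0 * 1024.0;   // 500 MB of input data
            double[] linkMbps = { 100, 1000, 10000 };      // Fast Ethernet, GigE, 10 GigE
            for (double mbps : linkMbps) {
                double seconds = (payloadBytes * 8) / (mbps * 1000000.0);
                System.out.printf("%6.0f Mbit/s link: %6.1f seconds%n", mbps, seconds);
            }
        }
    }

Even at gigabit speeds the data movement adds a few seconds per job – tolerable for chunky batch work, but a genuine consideration for finer-grained tasks.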
Resource access

The second main issue we have to resolve is access to data and peripherals. As we've said, the processor of a PC (or mainframe, or mini) has direct access to the various bits of kit inside its own box – but for GC we need to establish a mechanism for machine A to access a resource on machine B (after all, it's all very well asking a random machine to run the payroll application, but how does it fetch the personnel data if the latter lives on a disk connected to a different computer?). There are two levels at which we can approach this problem. The more complex, but more efficient, approach is to make physical changes to the corporate infrastructure, moving from directly-connected storage media toward a shared SAN, where systems can request direct access to data over the SAN media. The simpler, but slower, approach is to abstract resource access just as we abstracted the processor architecture, and to use high-level protocols such as HTTP to ship data between processing units. Clearly the SAN approach is preferable from a performance point of view, though it is likely that organisations dipping a toe in the water will begin with the simpler approach and gradually evolve toward SANs.
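As a sketch of the simpler approach (the host name and file path are invented for illustration), the payroll job running on machine A might simply pull the personnel file from machine B over HTTP, using nothing more exotic than the standard java.net classes:

    // RemoteFetch.java -- a minimal sketch of the "simpler, slower" approach:
    // machine A pulls a file from machine B over plain HTTP rather than reading
    // a locally-attached disk. Host and path are hypothetical.
    import java.io.ByteArrayOutputStream;
    import java.io.InputStream;
    import java.net.URL;

    public class RemoteFetch {
        public static void main(String[] args) throws Exception {
            URL source = new URL("http://machine-b.example.com/data/personnel.dat");
            InputStream in = source.openStream();
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int read;
            while ((read = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, read);   // accumulate the remote file in memory
            }
            in.close();
            System.out.println("Fetched " + buffer.size() + " bytes from " + source.getHost());
        }
    }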
Authentication and access control

Associated with resource access is the issue of authentication and access control. At the higher level, IBM's answer is that the now-established Web Services concepts include authentication and access control mechanisms, and so these can be employed across our virtual computer (after all, cross-machine function calls will inevitably be implemented as Web Services). At the lower level, though, the enterprise will need an integrated directory service into which all the computers tap for their access control. Given that (a) Novell eDirectory (NDS) is an efficient directory service that has proven to be highly scalable, and (b) IBM owns a lump of Novell, it doesn't take a genius to guess what the chosen directory service is likely to be.
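As a hedged sketch of that lower level (the server address, credentials and naming are invented for illustration), any node in the grid could authenticate an identity against one shared LDAP-speaking directory – eDirectory speaks LDAP – via the standard JNDI classes, so access control decisions come out the same wherever a job happens to run:

    // DirectoryCheck.java -- a sketch, not a prescription: every grid node binds
    // to the same LDAP-speaking directory to authenticate a caller. Server
    // address, DN and password are hypothetical.
    import java.util.Hashtable;
    import javax.naming.Context;
    import javax.naming.directory.DirContext;
    import javax.naming.directory.InitialDirContext;

    public class DirectoryCheck {
        public static void main(String[] args) throws Exception {
            Hashtable<String, String> env = new Hashtable<String, String>();
            env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
            env.put(Context.PROVIDER_URL, "ldap://directory.example.com:389");
            env.put(Context.SECURITY_AUTHENTICATION, "simple");
            env.put(Context.SECURITY_PRINCIPAL, "cn=payrollApp,ou=services,o=acme");
            env.put(Context.SECURITY_CREDENTIALS, "secret");

            // A successful bind means the directory has authenticated this identity;
            // the same check works identically from any machine on the grid.
            DirContext ctx = new InitialDirContext(env);
            System.out.println("Authenticated against " + env.get(Context.PROVIDER_URL));
            ctx.close();
        }
    }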
Applications

There is only one real Achilles' heel to this argument for transforming the organisation into a virtual computing platform, and that is application support. Most of the big application companies have so far chosen not to make the leap from compiled, platform-specific software toward platform-independent Java implementations – not least because although platform-specific programs cost more to develop (if you have three platforms, you have to write three different versions, albeit with some overlap in the middle), they will always be faster than platform-independent ones. There is work to be done, then, in encouraging the application writers to develop their stuff such that it will run anywhere without modification; there's also some work to be done on the Java layer upon which the applications will sit if the application writers are to be persuaded that it's a fast enough platform to employ.
System control

The final consideration with this model of grid computing is that the task of actually scheduling the various jobs to run on the virtual computer is a very complex one. What we're effectively doing is taking a set of hardware and operating systems, building another abstraction layer on top of these (the JVMs) and then adding an operating system-like structure on top of the JVMs to handle the scheduling and resource allocation between the various computers in the network. While the complexity of this layer with regard to hardware access is lower than would be the case with a normal OS, the scheduling task is considerably greater, since instead of coping with a small number of identical processors it has to cope with an arbitrary number of dissimilar processors that may even be connected to the network using links of different speeds. While the application writers are overcoming the issues of developing efficient "write once, run anywhere" packages, the far-from-trivial task for IBM is to make this scheduling engine a reality.
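To make the shape of that problem concrete, here is a deliberately naive sketch (our own illustration, emphatically not IBM's scheduling engine) of a greedy placement policy: each job is handed to whichever node is expected to finish it first, once both the node's link speed and its processing speed are taken into account.

    // GridScheduler.java -- a naive illustration of scheduling across dissimilar
    // nodes with dissimilar links: each job goes to the node expected to finish
    // it soonest, counting both data transfer time and compute time.
    import java.util.ArrayList;
    import java.util.List;

    public class GridScheduler {

        static class Node {
            String name;
            double mips;        // relative processing speed
            double linkMbps;    // speed of this node's network link
            double busyUntil;   // time at which the node becomes free
            Node(String name, double mips, double linkMbps) {
                this.name = name; this.mips = mips; this.linkMbps = linkMbps;
            }
        }

        static class Job {
            String name;
            double workUnits;   // abstract compute cost
            double inputMbit;   // data that must be shipped to the node
            Job(String name, double workUnits, double inputMbit) {
                this.name = name; this.workUnits = workUnits; this.inputMbit = inputMbit;
            }
        }

        // Place each job on the node with the earliest estimated completion time.
        static void schedule(List<Job> jobs, List<Node> nodes) {
            for (Job job : jobs) {
                Node best = null;
                double bestFinish = Double.MAX_VALUE;
                for (Node node : nodes) {
                    double transfer = job.inputMbit / node.linkMbps;
                    double compute = job.workUnits / node.mips;
                    double finish = node.busyUntil + transfer + compute;
                    if (finish < bestFinish) { bestFinish = finish; best = node; }
                }
                best.busyUntil = bestFinish;
                System.out.printf("%s -> %s (done at t=%.1fs)%n", job.name, best.name, bestFinish);
            }
        }

        public static void main(String[] args) {
            List<Node> nodes = new ArrayList<Node>();
            nodes.add(new Node("mainframe", 400, 1000));
            nodes.add(new Node("sun-db",    150, 1000));
            nodes.add(new Node("desktop-1",  50,  100));
            List<Job> jobs = new ArrayList<Job>();
            jobs.add(new Job("payroll",   2000, 4000));
            jobs.add(new Job("reporting",  800,  800));
            jobs.add(new Job("backup-idx", 300, 2000));
            schedule(jobs, nodes);
        }
    }

A production engine has to cope with far more than this – priorities, failures, data locality, nodes joining and leaving – which is precisely why the scheduling layer is the hard part.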