To IT managers, high-stakes supercomputing may seem like the land-speed record: a freak show, amusing but hardly relevant. Oh, a car broke Mach 1? And a defence lab has a 280 TFLOPS computer? Cool. Now let's get back to work.

However, supercomputing specialists are wrestling with problems that will affect everyday IT within the next two to five years. Essentially, improvements in processors have outstripped those in data movement. For some time now, the limiting factor in high-performance computing has been the speed with which data can be moved to and from the processors. Indeed, the cylindrical shape of the iconic Cray supercomputers was an effort to shorten the distances data must travel.

Because supercomputing is the sharp end of the technology spear, these data-flow problems -- still manageable in most corporate data centres -- are fast becoming acute in the world's top research facilities. Breakthroughs are needed, and experts acknowledge that answers are elusive.

Backdrop: multi-core, clusters

There are two key factors in today's supercomputing tumult: multi-core chips and the rise of "cluster" supercomputers composed of hundreds or thousands of humble Intel-style CPUs.

Multi-core chips place more than one processor on a single integrated circuit. Dual-core PCs are already common, and experts believe this Moore's Law-driven progression will continue so that by 2010, your garden-variety chip will house 64 processors. With each of those processors running four software threads at once, 256 threads could execute simultaneously on a single chip.
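To make that arithmetic concrete, the toy sketch below simply multiplies the projected figures -- 64 cores, four hardware threads apiece -- and compares the result with the logical CPU count of whatever machine it happens to run on. The figures are the article's projection, not a measured part.

```python
import os

# Projected 2010-era chip from the article: 64 cores, 4 hardware
# threads per core. Illustrative assumptions, not a real product.
CORES_PER_CHIP = 64
THREADS_PER_CORE = 4

hardware_threads = CORES_PER_CHIP * THREADS_PER_CORE
print(f"Projected hardware threads per chip: {hardware_threads}")  # 256

# For comparison, the logical CPU count of the machine running this script.
print(f"Logical CPUs on this machine: {os.cpu_count()}")
```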

Jack Dongarra, a computer science professor at the University of Tennessee and the keeper of the list of the world's top 500 supercomputers, says these elite machines typically cluster 500 to 1,000 processors. The current top gun, the IBM Blue Gene at Lawrence Livermore National Laboratory, can process 280 trillion floating-point operations per second (TFLOPS) and clusters a staggering 131,072 processors. Those clusters must rapidly move huge quantities of information from memory to all those processors and back again. The bandwidth needed for this data flow, along with the latency caused by the sheer distances involved, effectively caps the amount of work the computer can do.
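One common way to reason about that cap is a simple "roofline"-style bound: sustained performance is the lesser of peak compute and what the memory system can deliver. The sketch below is a back-of-the-envelope illustration only -- the bandwidth figure is assumed, not Blue Gene's actual specification.

```python
# Back-of-the-envelope sketch of why data movement caps useful work.
# The bandwidth number is an assumption for illustration, not a real spec.

PEAK_FLOPS = 280e12        # 280 TFLOPS of raw processing capacity
MEMORY_BANDWIDTH = 1.0e12  # bytes per second the memory system can deliver (assumed)

def achievable_flops(flops_per_byte: float) -> float:
    """Sustained rate is the lesser of peak compute and what the
    memory system can feed the processors."""
    return min(PEAK_FLOPS, MEMORY_BANDWIDTH * flops_per_byte)

# A kernel doing only 0.5 floating-point operations per byte moved
# is limited by bandwidth, not by the processors.
for intensity in (0.5, 10, 1000):
    rate = achievable_flops(intensity)
    print(f"{intensity:>7} flop/byte -> {rate / 1e12:6.1f} TFLOPS sustained")
```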

Dongarra says that more than 60 per cent of the top 500 computers are clustered rather than relying on the traditional exotic architectures most commonly associated with Cray. "Clusters have completely changed the scientific computing landscape," he says, because they offer a price/performance ratio that exotic machines can't touch.

Moreover, as clusters have become popular, users have found "a surprisingly large number of real-world applications that do not require the extreme latency and bandwidth capabilities of the exotics," says Justin Rattner, an Intel distinguished fellow.

However, as users call for more powerful tools, Cray executives believe the supercomputing pendulum is swinging back their way -- and some research scientists agree. Indeed, they say, it's possible that before 2015, exotic supercomputers, with their benefits of low latency and high bandwidth, will join clusters, with their price advantage, in hybrid architectures suitable for a variety of applications.

According to Jan Silverman, a senior vice president of corporate strategy at Cray, the company is working on compilers that can distinguish code best suited to its vector processors -- which operate on whole arrays of numbers at once -- from code best run on a more pedestrian scalar processor. These compilers, which Silverman says will be in production by 2009, will be able to schedule vector and scalar work for the hybrid supercomputer. Otherwise, this slow and difficult scheduling work would fall to programmers.
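The sketch below is not Cray's compiler, just an illustration of the kind of distinction one has to draw: a loop whose iterations are independent, and so map naturally onto vector hardware, versus a loop whose iterations each depend on the previous result and so must proceed element by element on a scalar unit.

```python
import numpy as np

a = np.random.rand(100_000)
b = np.random.rand(100_000)

# Vector-friendly: every element can be computed independently, so the
# whole array can be handed to vector hardware in one operation.
c = 2.0 * a + b

# Scalar-leaning: each step needs the previous result (a loop-carried
# dependence), so the work proceeds one element at a time.
running = np.empty_like(a)
acc = 0.0
for i in range(a.size):
    acc = acc * 0.99 + a[i]  # depends on the value from the last iteration
    running[i] = acc
```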

To exploit the parallelism inherent in software as fully as possible, IBM Research, among others, is trying to let pieces of a program run out of order and then reassemble their results on the fly, a technique it calls speculative multithreading. Al Gara, chief architect for IBM's Blue Gene, calls it a potential breakthrough, but as the number of threads increases, the number of components that must eventually be reassembled into the final result grows exponentially. "You need a combination of hardware [and] software to guarantee correctness," he says, "with an assist from the compiler."
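The sketch below is only a loose software analogy, not IBM's hardware technique: chunks of work, assumed independent here, run out of order across a pool of workers, while their results are committed in program order, with a placeholder validity check standing in for the hardware and compiler machinery Gara describes.

```python
from concurrent.futures import ProcessPoolExecutor

def work(chunk):
    """Stand-in for a unit of program work speculatively assumed to be
    independent of the other chunks."""
    return sum(x * x for x in chunk)

def is_valid(result):
    """Placeholder correctness check; real speculative multithreading relies
    on hardware and the compiler to detect conflicts."""
    return True

def speculative_run(chunks):
    with ProcessPoolExecutor() as pool:
        # Chunks execute out of order on whatever cores are free...
        futures = [pool.submit(work, c) for c in chunks]
        results = []
        # ...but results are committed strictly in program order; a chunk
        # that failed validation would be re-executed serially.
        for chunk, future in zip(chunks, futures):
            r = future.result()
            results.append(r if is_valid(r) else work(chunk))
        return results

if __name__ == "__main__":
    data = [list(range(i, i + 1000)) for i in range(0, 8000, 1000)]
    print(speculative_run(data))
```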

The US Defense Advanced Research Projects Agency has a program called High Productivity Computing Systems aimed at doubling the productivity of scientific computers every 18 months until at least 2010. In response, several academic researchers are working on languages and compilers for parallel and high-performance computing.

For example, Ken Kennedy, director of the Center for High Performance Software Research at Rice University, is developing a system that uses a library of components to generate high-performance compilers for specific scientific domains, such as biological computing or signal processing.

He's also exploring ways to parallelise Matlab, a favourite of scientists and engineers, so that Matlab arrays can be distributed across a parallel machine. Matlab would then be far easier to use than that old standby Fortran, which must be carefully crafted for specific computer architectures, Kennedy says.
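As a rough analogy -- not Kennedy's system -- the sketch below block-distributes an array across worker processes and applies an operation to each piece, much as a distributed array would live in sections across a parallel machine while the programmer writes ordinary whole-array code.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def local_work(block):
    # Each worker applies the operation to its own block of the array.
    return np.sqrt(block) + 1.0

def distributed_apply(array, workers=4):
    # Split the array into contiguous blocks, one per worker, mirroring the
    # idea of an array distributed in pieces across a parallel machine.
    blocks = np.array_split(array, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        pieces = list(pool.map(local_work, blocks))
    return np.concatenate(pieces)

if __name__ == "__main__":
    x = np.arange(1_000_000, dtype=float)
    print(distributed_apply(x)[:5])
```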

"It's important to have big computers," he says, "but there's two parts to that: having a computer and being able to use it. High-end computing has been overly limited to people who are really expert in programming. We have to not only go for very high performance, but for very high productivity."