If you thought that Linux was only for PCs and small systems, SGI would like a word with you. Having shifted its 64-bit parallel processing servers from MIPS chips to Intel's Itanium 2 just over a year ago, it has now extended the resulting Linux-based Altix family downmarket with the Altix 350 - to a mere 16 processors.

SGI's HPC (high performance computing) solutions manager Dr Crispin Keable says we had better get used to the idea. He argues that an industry-wide move to Itanium-based parallel processing under Linux is inevitable, given the rising demands for both compute power and software neutrality.

"People are recognising that the way to get more performance is more complicated than MHz - that had its time, but now there are diminishing returns," he says. "Simply cranking up the clock cycles is not producing performance gains, and while processor cache sizes are growing, people's problem sizes are growing faster still.

"The world is going parallel and it's going parallel quickly. It won't be long before everyone has multiple cores on a single piece of silicon, then people will have multiple processors in servers, and eventually even in laptops."

The challenge for high performance computing is how to do it cheaply. A popular route is to cluster standard Linux-based PCs using open source technology such as Beowulf. But clusters are still relatively hard to implement and to scale efficiently, and they suit only some applications: in particular, homogeneous tasks that can be broken down simply into independent pieces.
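To make "broken down simply" concrete, here is a minimal sketch of the kind of job that suits a cluster: every chunk of work is the same and none needs data from any other. It is an illustration only, not anything SGI or Beowulf ships. The function names are hypothetical, and threads on one machine stand in for separate PCs.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_workers):
    """Split items into n_workers contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for i in range(n_workers):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work for one node: it needs nothing from the other chunks."""
    return sum(x * x for x in chunk)


def cluster_style_sum(items, n_workers=4):
    """Farm identical, independent tasks out to workers, then combine."""
    chunks = split_into_chunks(items, n_workers)
    # Assumption: a thread pool stands in for the cluster's nodes.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    # The only communication is collecting one number per worker.
    return sum(partials)


if __name__ == "__main__":
    print(cluster_style_sum(list(range(10)), n_workers=3))
```

The workers exchange nothing until the final sum, so the interconnect barely matters. Jobs whose pieces must keep sharing intermediate data do not split this cleanly, and that is where the shared-memory argument comes in.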

By comparison, the Altix 350 uses SGI's NUMAflex implementation of NUMA (non-uniform memory access), a technology that allows multiple processors to share memory. Adding rack-mounted processor or memory modules allows it to scale from a single or dual processor box to 16 processors and up to 192GB of shared RAM. What it lacks, compared to the Altix 3000, is the router technology that allows NUMA nodes to be tightly coupled, creating a larger system.

SGI claims that the NUMA approach is better for heterogeneous workflows, because each multiprocessor node can handle larger tasks than a single PC, and because it has already done the work needed to connect nodes together so they can cooperate.

"People quote the cost of Beowulf on the cost of a CPU, and try to brush under the carpet the cost of the system interconnect," Keable says. "Getting scalability out of Beowulf-type clusters is still very difficult."

However, SGI may have erred on the high side with its original 64-processor Altix 3000 series. The ability to scale to 64 processors, and cluster those systems for up to 512 processors, brought a price tag that put it out of the reach of many organisations. That in turn left the field open for other NUMA vendors such as HP, IBM and Sun.

"The 350 is Altix repackaged for the technical midrange," Keable says. "It means you don't need to join a consortium to get into high-performance computing."

The use of Itanium is also controversial, both because early versions provided lower-than-expected performance and because its backward compatibility is limited when compared to AMD's 32/64-bit Opteron. But like HP, SGI has nailed its flag to Intel's mast, so Keable loyally defends the chip giant.

He argues that the Itanium 2 family, which includes the McKinley, Madison and Deerfield processor designs, is free of the original problems, adding that those were in any case exaggerated by the competition.

"Intel is building chips designed for the future," he says. "Opteron has some great capabilities today, but it also has the limitations of x86, in its register set and other areas. The AMD approach is the last gasp of an architecture that isn't designed for the future - I don't believe it will yield good performance over time."