Teraflop computers on the desk came a little closer as AMD unveiled its 'teraflop-in-a-box' answer to Intel's 80-core chip. Both devices can perform one trillion floating-point operations per second. AMD's proof of concept, which is not a real product, groups four processors together: two 64-bit, dual-core Opterons and two forthcoming R600 stream processors, courtesy of its recent acquisition of ATI. This four-processor device runs Windows XP Professional and is capable of more than 1 teraflop using a general “multiply-add” (MADD) calculation. AMD says this is ten times faster than current 4-processor servers, which run at around 100 billion floating-point operations per second (100 gigaflops).
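The arithmetic behind AMD's "ten times faster" claim is simple to check. The sketch below is only an illustration, not AMD's own method. It assumes that each multiply-add (a * b + c) counts as two floating-point operations, the usual convention for peak-FLOPS figures, and the constant `FLOPS_PER_MADD` and both helper functions are hypothetical names introduced here.

```python
# Worked check of the throughput figures quoted above.
# Assumption (not stated by AMD): a multiply-add (a * b + c) counts as two
# floating-point operations, one multiply and one add.
FLOPS_PER_MADD = 2

GIGAFLOP = 1e9   # one billion floating-point operations per second
TERAFLOP = 1e12  # one trillion floating-point operations per second


def madd_rate_needed(target_flops: float) -> float:
    """Multiply-adds per second needed to reach target_flops."""
    return target_flops / FLOPS_PER_MADD


def speedup(new_flops: float, old_flops: float) -> float:
    """How many times faster new_flops is than old_flops."""
    return new_flops / old_flops


amd_box = 1 * TERAFLOP            # AMD's figure for the four-processor box
current_server = 100 * GIGAFLOP   # AMD's figure for today's 4-processor servers

print(f"MADDs per second for 1 teraflop: {madd_rate_needed(amd_box):.1e}")
print(f"Speed-up over a current server: {speedup(amd_box, current_server):.0f}x")
```

Under that counting convention, a teraflop corresponds to 500 billion multiply-adds per second, and 1 teraflop divided by 100 gigaflops gives the tenfold figure AMD quotes.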
The R600s are massively parallel processors of the kind generally used to churn through the huge number of polygon-based calculations needed for 3D graphics. Here they have been co-opted for general server use.
Applications generally need to be specially compiled to use graphics processors in this way. The suspicion is that AMD could reach the teraflop level only by using the R600s; the company didn't say how many Opterons on their own would be needed to get there. Instead it argued that the way to reach the teraflop level was to accelerate specific applications with co-opted specialised processors, such as the R600s, augmenting the general-purpose Opterons.
What would such desktop supercomputers be used for? AMD points us towards various commercial and scientific applications, including ones in the energy, financial, environmental, medical, scientific, defence and security areas. They would run compute-intensive applications ten times faster than today's top-end servers.
Keeping the processors busy will require huge amounts of RAM, possibly 100GB and beyond. It will also require avoiding a RAM-CPU bottleneck. With AMD's HyperTransport technology, co-processors can access each other and memory directly over HyperTransport. This reduces latency because processors reach central memory and each other directly, rather than through another general-purpose and potentially bottlenecking bus such as PCI Express.
Although it is called a teraflop-in-a-box device, its storage requirements suggest that a desktop form factor wouldn't be practical. Spinning disk would be needed to load up memory, and as a rule of thumb we might expect disk capacity to be 250 to 500 or more times RAM capacity. So with 100GB of RAM we might need 25TB to 50TB of disk.
That means the supercomputer itself might be desktop-sized, but its disk storage would need a rack.
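The disk-sizing rule of thumb can be written out as a short calculation. This is a rough sketch under a simple assumption: disk capacity scales linearly with RAM by a fixed multiplier, taken here at the 250 and 500 ends of the article's range. The function `disk_needed_gb` is a hypothetical helper, and decimal units (1TB = 1,000GB) are assumed, as disk vendors quote them.

```python
# Rule-of-thumb disk sizing, following the reasoning above.
# Assumption: disk capacity = RAM capacity x a fixed multiplier (250 to 500+).
GB = 1
TB = 1000 * GB  # decimal units, as disk capacities are usually quoted


def disk_needed_gb(ram_gb: float, multiplier: float) -> float:
    """Disk capacity in GB implied by a RAM size and a disk-to-RAM multiplier."""
    return ram_gb * multiplier


ram = 100 * GB
for multiplier in (250, 500):
    disk = disk_needed_gb(ram, multiplier)
    print(f"{ram} GB RAM x {multiplier} -> {disk / TB:.0f} TB of disk")
```

With 100GB of RAM this gives 25TB at the low end of the range and 50TB at the high end. A larger multiplier, or more RAM, pushes the figure further out.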
If disk storage, rather than RAM, had to hold the huge amounts of data such processors need to execute a job, then only solid-state disks would be fast enough to have a chance of keeping up with them.