Nvidia plans to integrate CPU cores alongside graphics cores in its Tesla high-performance chips, a change from current Tesla chips, which contain only graphics processors.
Tesla is aimed at supercomputing and enterprise applications. Future Tesla chips with CPUs and GPUs could resemble Nvidia's Tegra processors for smartphones and tablets, which already combine ARM CPUs with GeForce graphics processors. But Tesla chips will combine CPU cores with the kind of graphics cores typically found in a discrete graphics card.
"Tegra is going to become GPU computing capable in the not-so-distant future. Sometime this decade we are also going to start bringing integrated CPUs and GPUs together in the Tesla line," said Steve Scott, chief technology officer for the Tesla product line at Nvidia, in a recent interview.
Nvidia declined to say when Tesla chips with CPUs and GPUs would be released. However, the company is building its own chips based on ARM's new 64-bit ARMv8 architecture, which is expected to be used in servers, smartphones and tablets.
"We are working on Project Denver, which aims to make a very high-performance ARM core based on ARM64 that can be used in the range of Nvidia products," said Nvidia spokesman Hector Marinez in an email.
On Tuesday, ARM announced its first 64-bit processor cores, the Cortex-A57 and Cortex-A53, which may appear in servers starting in 2014. Both cores are based on the ARMv8 architecture.
Some of the world's fastest computers combine CPUs and GPUs for complex scientific and math calculations. The recently announced Titan supercomputer at the US Department of Energy's Oak Ridge National Laboratory pairs 18,688 Advanced Micro Devices 16-core Opteron 6274 x86 CPUs with 18,688 Nvidia Tesla K20 GPUs to deliver 20 petaflops of performance. By pairing power-efficient ARM CPUs with GPUs on a single chip, Nvidia could integrate the components more tightly and deliver higher throughput while cutting server power consumption.
Low-power ARM processors are mostly found in smartphones and tablets, but there is growing interest in ARM servers as companies look to cut energy bills. Last year Nvidia said that a prototype supercomputer with 1,000 quad-core Tegra 3 chips was being built at the Barcelona Supercomputing Center in Spain. Companies are looking at ARM servers as an energy-efficient way to handle large volumes of Web requests tied to search or social networks. Hewlett-Packard, Dell, Facebook and Amazon have expressed support for ARM servers.
AMD, however, has a jump on Nvidia in combining CPUs with GPUs in supercomputing products. In August, AMD announced its FirePro A300 high-performance processors, which combine x86 CPUs and GPUs. Much as Nvidia does with Tegra, AMD also combines x86 CPU cores with Radeon graphics cores in its A-, C- and E-series processors for PCs and its Z-60 chip for tablets.
Scaling performance on supercomputers while keeping power consumption low is a challenge, and GPUs are faster than CPUs for complex applications, Scott said.
"CPUs have a small number of cores, they are big, they are complex and they are brilliant [at] making a single task or a small number of tasks run fast. GPUs have hundreds of really tiny, power-efficient cores" that are optimized for throughput, Scott said.
To scale performance, a more distributed computing model needs to be adopted, one in which work is executed in parallel across CPUs and GPUs, Scott said. About 90 percent of Titan's processing will take place on GPUs, with the remaining serial code handled by CPUs, Scott said. The estimated energy bill for Titan will be US$9 million a year, while a CPU-only Titan delivering 20 petaflops would have had an energy bill of roughly $60 million a year, according to ORNL and Nvidia estimates.
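The energy figures above can be checked with a little arithmetic. The sketch below takes only the two dollar estimates from ORNL and Nvidia; the Amdahl's law function is an illustration added here, not something Scott or ORNL described, and it assumes (loosely) that the "90 percent on GPUs" figure can be treated as a 0.9 parallel fraction of the work.

```python
# Back-of-the-envelope figures for Titan, using the ORNL/Nvidia estimates.
# The Amdahl's law part is an illustrative assumption, not a claim from the source.

HYBRID_ANNUAL_COST_USD = 9_000_000     # estimated yearly energy bill, CPU+GPU Titan
CPU_ONLY_ANNUAL_COST_USD = 60_000_000  # estimated yearly bill, hypothetical CPU-only Titan


def annual_savings(hybrid_cost: float, cpu_only_cost: float) -> float:
    """Dollars saved per year by running the hybrid design instead of CPU-only."""
    return cpu_only_cost - hybrid_cost


def cost_ratio(hybrid_cost: float, cpu_only_cost: float) -> float:
    """How many times more the CPU-only system would cost to power."""
    return cpu_only_cost / hybrid_cost


def amdahl_speedup(parallel_fraction: float, accel_factor: float) -> float:
    """Amdahl's law: overall speedup when only `parallel_fraction` of the work
    is accelerated by `accel_factor` and the serial rest runs at original speed."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / accel_factor)


if __name__ == "__main__":
    savings = annual_savings(HYBRID_ANNUAL_COST_USD, CPU_ONLY_ANNUAL_COST_USD)
    ratio = cost_ratio(HYBRID_ANNUAL_COST_USD, CPU_ONLY_ANNUAL_COST_USD)
    print(f"Savings: ${savings:,.0f} per year")             # $51,000,000 per year
    print(f"CPU-only bill is {ratio:.1f}x the hybrid bill")  # 6.7x
    # Assumed 0.9 parallel fraction; the serial 10 percent caps speedup below 10x.
    for accel in (2, 10, 100):
        print(f"GPU speedup {accel:>3}x -> overall {amdahl_speedup(0.9, accel):.2f}x")
```

The Amdahl loop shows why the leftover serial code matters: even if the GPU portion ran infinitely fast, a 10 percent serial share would limit the overall speedup to 10x, which is one reason the CPU cores still need to be fast at single-threaded work.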
Combining Tesla with a 64-bit ARM processor is a good idea, said Jim McGregor, principal analyst at Tirias Research. ARM processors with 64-bit addressing can reach far more memory than current 32-bit ARM processors, which are limited to only 4GB, not enough for supercomputing.
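The 4GB ceiling follows directly from the address width: a flat, byte-addressed space with 32-bit addresses can name 2^32 bytes. The minimal sketch below assumes that simple model; real 64-bit chips implement fewer than 64 physical address bits, so the 64-bit figure is a theoretical upper bound rather than what any shipping processor supports.

```python
# Addressable memory for a flat, byte-addressed space of a given width.
# Simplified model: ignores extensions and implementation-specific address limits.

GIB = 2 ** 30  # bytes in one gibibyte
EIB = 2 ** 60  # bytes in one exbibyte


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct bytes reachable with `address_bits`-wide addresses."""
    return 2 ** address_bits


if __name__ == "__main__":
    print(addressable_bytes(32) // GIB, "GiB with 32-bit addresses")  # 4
    print(addressable_bytes(64) // EIB, "EiB with 64-bit addresses")  # 16
```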
"High performance on 32-bit just isn't happening," McGregor said. "It makes absolute sense if you have a 64-bit."
Future Tesla chips could be massively parallel, with a large number of CPU and GPU cores. The chips could be useful in hybrid computing models where some processing is done locally and some in the cloud, McGregor said. ARM cores offer a good balance of power and performance, handle data traffic efficiently, and could be used for data mining or financial transactions.
But faster x86 CPUs from Intel and AMD may be needed for complex scientific calculations, McGregor said.
Server architectures have changed over the years, and cores should be chosen "to meet the size of the data," McGregor said, adding that software support is also important.