When it comes to multicore chips, too many cooks spoil the algorithm. Or so a team running simulation tests at Sandia National Laboratories has declared in a recent announcement.

Working on key algorithms for analysing large data sets, the researchers discovered that going from two cores to four significantly improved performance. However, going from four to eight barely improved performance, and doubling again to 16 cores actually dropped performance back to two-core levels.
"After that, a steep decline is registered as more cores are added," the announcement said.

The bottleneck? The memory bus. Data can't flow to and from the processor fast enough for all those cores. "The difficulty is contention among modules," explains Sandia's James Peery in the announcement. "The cores are all asking for memory through the same pipe. It's like having one, two, four, or eight people all talking to you at the same time, saying, 'I want this information.'"
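The scaling pattern described above can be sketched with a toy saturation model. This is not Sandia's simulation, and the bandwidth figures and contention penalty below are purely illustrative assumptions; the sketch only shows how a shared bus can make speedup rise, flatten, and then fall as cores are added:

```python
def effective_speedup(cores, per_core_demand_gbs, bus_gbs, contention_penalty=0.05):
    """Toy model: speedup over one core when all cores share one memory bus.

    per_core_demand_gbs: bandwidth one core needs to run at full speed (assumed)
    bus_gbs: total bandwidth of the shared bus (assumed)
    contention_penalty: fraction of throughput lost per extra requester,
                        a crude stand-in for bus arbitration overhead
    """
    demanded = cores * per_core_demand_gbs
    delivered = min(demanded, bus_gbs)          # the bus caps total throughput
    # Each additional core contending on the bus costs a little more throughput.
    delivered *= max(0.0, 1.0 - contention_penalty * (cores - 1))
    return delivered / per_core_demand_gbs

for n in (1, 2, 4, 8, 16):
    print(n, "cores ->", round(effective_speedup(n, per_core_demand_gbs=4, bus_gbs=16), 2))
```

With these made-up numbers, speedup peaks at four cores (the bus saturates), gains little at eight, and falls back toward single-core territory at sixteen, echoing the shape of the Sandia result.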

"To some extent, it is pointing out the obvious - many of our applications have been memory-bandwidth-limited even on a single core," admits tester Arun Rodrigues in the announcement, although "it is not an issue to which industry has a known solution."

Intel researcher Clay Breshears wasn't surprised, and states as much in a blog post. "I mentioned this problem 2 years ago. Plus, memory contention on a single bus was the reason for shared memory computers peaking around 32 processors back in the late 80's."

"Will we continue to be plagued by a lack of memory access technology to support higher demand of multi-core chips for data to process?" asks Breshears. "It's been a perennial struggle, but now would be a good time to start looking into this."