At the International Supercomputing Conference (ISC) in Hamburg this week, Intel announced that all future products based on its Many Integrated Core (MIC) architecture would become part of its newly named Xeon Phi product family.
The move is a clear attempt by Intel to capitalise on the growth of high-performance computing (HPC) processor shipments over the last five years by establishing a recognised HPC brand. However, more than 74 percent of the supercomputers on the Top500 list already have Intel inside.
For example, the “SuperMUC” supercomputer at LRZ in Germany – which delivers 2.9 PetaFLOPs of performance, making it the most powerful in Europe – runs on Intel Xeon E5 processors. SuperMUC was ranked fourth overall in the 39th edition of the Top500 list of supercomputers.
So if Intel is already dominating the HPC industry, what is the need for Xeon Phi?
According to Raj Hazra, VP and general manager of technical computing at Intel's Data Center and Connected Systems Group, a supercomputer's position on the Top500 list is restricted more by money and power consumption than by performance capability.
“There’s no reason we can’t build a number one system today on Xeon E5s. If they added a few more racks to the LRZ system they could have gone up a few more positions, so it’s not an architectural limitation,” Hazra told Techworld.
“As the FLOPs (floating-point operations per second) grow, however, the total system power grows as well, and if you continue to do that it becomes economically untenable. It’s not just about power, it’s a question of whether you can pay for that power every year of operation.”
Hazra said that performance-per-watt is currently the leading metric for the HPC industry. In the US, a rough benchmark for the cost of power is $1 million per megawatt per year, so a data centre that draws 10MW will cost about $10 million a year to run. SuperMUC consumes 3.4MW, which LRZ decided was the maximum configuration for its needs.
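The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration: it reads the $1 million per megawatt benchmark as an annual cost (an assumption, in line with Hazra's remark about paying for power every year of operation), it applies that US benchmark to a German system purely for scale, and the function names are made up for this example.

```python
# Figures quoted in the article for SuperMUC.
SUPERMUC_PFLOPS = 2.9   # delivered performance, in petaflops
SUPERMUC_MW = 3.4       # power consumption, in megawatts

# Assumption: the rough US benchmark is an annual cost per megawatt.
COST_PER_MW_YEAR_USD = 1_000_000


def annual_power_cost_usd(power_mw, cost_per_mw_year=COST_PER_MW_YEAR_USD):
    """Yearly power bill in US dollars for a system drawing power_mw megawatts."""
    return power_mw * cost_per_mw_year


def gflops_per_watt(pflops, power_mw):
    """Performance per watt, expressed in gigaflops per watt."""
    flops = pflops * 1e15   # petaflops -> flops
    watts = power_mw * 1e6  # megawatts -> watts
    return flops / watts / 1e9


if __name__ == "__main__":
    print(f"10MW data centre: ${annual_power_cost_usd(10):,.0f} per year")
    print(f"SuperMUC at the US benchmark: ${annual_power_cost_usd(SUPERMUC_MW):,.0f} per year")
    print(f"SuperMUC efficiency: {gflops_per_watt(SUPERMUC_PFLOPS, SUPERMUC_MW):.2f} gigaflops per watt")
```

On these numbers, doubling performance by simply doubling the number of racks would double the yearly power bill as well, which is the scaling Hazra describes as economically untenable.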
However, Intel is determined that performance does not stop here. The reason the company is developing Xeon Phi is to extend performance through parallelism, rather than simply making bigger and bigger versions of today’s systems.
“What Xeon Phi does is give you much more parallelism than Xeon, and therefore you get a better performance per watt. So with Xeon and Xeon Phi, in the future we have the ability to continue to grow overall performance, but not scale power along those same lines,” he said.
The larger a system gets, the less reliable it tends to become, because there are more components that could fail. According to Hazra, another advantage of parallelism is that increasing performance density also improves the reliability of the system.
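A toy model shows why component count matters. The sketch below assumes every component fails independently with the same probability over some period; that model, the function name and the example numbers are all hypothetical and do not come from Intel or Hazra.

```python
def system_survival_probability(n_components, p_fail):
    """Probability that none of n_components fails, assuming independent,
    identical components that each fail with probability p_fail."""
    return (1.0 - p_fail) ** n_components


if __name__ == "__main__":
    p = 1e-5  # hypothetical per-component failure probability
    for n in (10_000, 20_000, 40_000):
        prob = system_survival_probability(n, p)
        print(f"{n:>6} components: {prob:.3f} chance of no failure")
```

Under these assumptions, a design that reaches the same performance with half as many components has a noticeably higher chance of running without a failure, which is the effect attributed to greater performance density.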
However, parallelism also brings challenges. For example, the more parallel resources you have, the more attention you have to pay to performance tuning. Compilers and libraries help to take that burden off the programmer, but the programmer still has a role to play.
Debugging parallel programs can also be problematic, because it is very difficult to capture the exact condition you want to debug. However, Hazra believes that as compute capacity increases, people tend to innovate on more parallel algorithms, creating a virtuous cycle.
“Algorithms today are in the infancy of parallelism, because we’ve been taught to think differently for the last several decades,” he said. “We believe that not only will we see innovations in systems software and hardware that allow better scaling, but with things like Intel Xeon Phi people are now looking at the next generation of algorithms they can deploy for the same problem.
“It’s not that today’s algorithms scale with more compute, people actually innovate on the kind of algorithm they use,” he added. “If you look at the oil and natural gas space, for many years people have been using reverse time migration. As they see a more parallel architecture like Xeon Phi becoming feasible, they’re starting to look at the next algorithmic jump to full wave inversion.”
In order to show the potential of massive parallelism in supercomputing, Intel constructed a cluster known as Discovery, which is made up of Knights Corner coprocessors. When Knights Corner launches later this year, it will become the first member of the Xeon Phi family.
Discovery came in at number 150 on the Top500 list, but was named the second most energy-efficient architecture, after IBM's Blue Gene/Q system, which also topped the list for performance.
“Our goal wasn’t to get the 150 position, or even get higher, because that’s just a question of adding more nodes. It was more an experiment on our part to make sure that our cluster tools and software were ready for when we go into production, and we wanted an internal look and a validation at what the efficiency and power efficiency would be,” said Hazra.
“We believe multiple OEMs will be choosing Xeon and Xeon Phi to build number one systems in the future.”
Hazra added that Blue Gene/Q is a vertical architecture, meaning it is built completely by IBM, so hardware, software and applications have to be ported to it. He said that Intel believes in open standards, and that could mean a trade-off between extreme energy efficiency and broad applicability.
“The best way to look at this is to ask, why are x86-based machines 78% of the new listing?”
He said that the familiarity of the x86 programming model, the efficiency of the architecture and the amount of code that is available are what make it the choice for so many companies, adding that “Xeon is the backbone of the Top500, and the broader HPC industry.”
The Intel Xeon Phi coprocessor is supported by 44 manufacturers including Bull, Cray, Dell, HP, IBM, Inspur and NEC. Cray also announced at ISC that its next-generation supercomputer, code-named Cascade, will run on Intel Xeon Phi coprocessors.