First Intel Xeon Phi-based supercomputer to launch 7 January


On 7 January 2013, the Texas Advanced Computing Center (TACC) at The University of Texas at Austin will deploy the Stampede supercomputer, the first large-scale deployment of Intel's Many Integrated Core (MIC) technology in the world.

Stampede is the result of a $51.5 million grant from the US National Science Foundation (NSF), and is intended to support the nation's scientists in addressing the most challenging scientific and engineering problems over four years.

The new system, built by TACC in partnership with Dell, Intel and Mellanox, will have a peak performance of more than 2 petaflops from the base cluster of Intel Xeon processors and more than 7 petaflops from the Intel Xeon Phi coprocessors (based on the MIC architecture).

Altogether, Stampede will have a peak performance of 10 petaflops, 272 terabytes of memory, and 14 petabytes of disk storage. It will therefore be the most powerful system in the NSF's eXtreme Digital programme and the seventh most powerful supercomputer in the world.

At the Dell World conference in Austin last week, Techworld was given a tour of Stampede as it was prepared for deployment at TACC. First impressions: it's very big, very loud and very windy, and you can almost feel the power when you walk into the room.

When completed, Stampede will comprise 6,400 Dell PowerEdge C8220X “Zeus” servers, each with dual 8-core processors from the Intel Xeon processor E5 family and at least one Intel Xeon Phi coprocessor (in some cases two).

Additionally, Stampede will offer 128 next-generation NVIDIA graphics processing units (GPUs) for remote visualisation, 16 Dell servers with 1 terabyte of shared memory and 2 GPUs each for large data analysis, and a high-performance Lustre file system for data-intensive computing.

“Computers have become the most important general purpose instrument of science. The computational techniques complement theory and observation in every field of science and the percentage of research that's using computing continues to grow,” said TACC Director Jay Boisseau.

“In fact, the percentage of research that uses supercomputing as a competitive advantage as well as a scientific capability continues to grow.”

He added that, as long as the universe is governed by fundamental equations, the IT industry will be helping scientists predict what will happen in various scenarios and various physical processes through modelling and simulation.

To find out more about Stampede, read on:


'Powerful beyond imagination'

The mission of the Texas Advanced Computing Center (TACC) is to enable discoveries that advance science and society through the application of advanced computing technologies. According to Boisseau, the importance of TACC's work can be summed up by a quote that has been attributed to Albert Einstein: “Computers are incredibly fast, accurate and stupid; humans are incredibly slow, inaccurate and brilliant; together they are powerful beyond imagination.”


Complex problems need powerful computers

The world is not just getting increasing computing power; it is also getting vastly more digital data. Some of this is generated by computers, but much of it comes from other digital devices, such as sensors and imaging equipment. Modern science and engineering are therefore as much about managing and analysing data as about modelling and simulation, according to Boisseau. Very powerful computers are needed to address the scale and speed of these problems.


Stampede is the seventh most powerful supercomputer in the world

As of 7 January, TACC will be getting a new addition to its supercomputing portfolio. Stampede packs 6,400 dual-socket nodes - over 2,000 more than Ranger (currently 50th in the world) - and is the first large-scale deployment of Intel MIC technology in the world. Boisseau said that, because this is parallel computing, you cannot simply swap out nodes when something fails. How you design your environment and how you balance your capabilities can therefore determine whether or not a user likes computing at TACC.


40 nodes per rack, 182 racks

Each rack contains 40 compute nodes, as well as some network switch gear in the middle and at the top of the rack. A blue light on a node means there is a working Xeon Phi processor inside it. Each node has at least one Xeon Phi, and between 400 and 500 will have two. There will be between 6,800 and 6,900 Xeon Phis in the entire system, as well as 128 GPUs. That leaves almost 6,000 empty slots, which provides a fair bit of room for expansion.


Weighing up reliability vs capability

The demand for these high-performance systems is much higher than TACC can accommodate, so it has to pare down the requests to the number of cycles it has available. Assuming 24/7 operation, and 4 percent downtime, the level of utilisation is around 90 percent. With regard to the 4 percent downtime, Boisseau said that if you spend all your financial resources ensuring greater reliability you shrink your compute capability, so TACC is willing to sacrifice a few percent of uptime in return for a much greater scientific capability.


Hot-aisle containment

TACC has implemented a design using hot-aisle containment and in-row cooling units. It is extremely windy and noisy inside the data centre, meaning that most of the engineers wear ear plugs. The power distribution brings 415V to the cabinet and 240V to the servers. Between the Stampede facility and the 4,000 square foot data hall housing Ranger, TACC has a power capacity of approximately 10 megawatts.


4,720 Xeon Phi processors installed so far

TACC is still installing Xeon Phi processors into the Dell C8220X servers. At the time of visiting, there were 4,720 installed, and engineers were working their way through the system. However, they will have to take a pause while the non-MIC part of the cluster goes through acceptance testing. The last Xeon Phis should be installed by early January.


More performance work than a GPU

Although it is relatively quick and easy to port code to a Xeon Phi processor, optimising performance still takes some work. Unlike with a GPU, where restructuring the code for the port typically accomplishes much of the performance work at the same time, users of Stampede will need to liaise with people at Dell, Intel and TACC to get their codes performing well.


Overhead networking, under-floor cooling

Power and most of the major networking for Stampede run overhead. This is because, although the data centre has a raised floor, the space beneath it is full of water pipes supplying the cooling systems. TACC worked with Dell to cable the systems very cleanly and allow engineers to work on them easily.


All-fibre optic networking between racks

All components – compute nodes, visualisation nodes, large shared memory nodes, and file system – will be integrated with an FDR 56Gbps InfiniBand network for extreme scalability. Stampede will be the first system to have all-fibre optic networking between racks. The in-rack networking is copper, since over those short distances copper remains a viable option.

