Among IBM's conspicuous successes is its Power processor, invented at its northern US outpost, the labs in thermally challenged Rochester, Minnesota, which is also the birthplace of the AS/400. The chip is used in many of IBM's servers and powered the Apple Mac until recently. It was designed for both parallel and uniprocessor systems, and there are 64,000 of them in Blue Gene, the most powerful supercomputer in the world.
We recently travelled to Rochester and spoke to Frank Soltis, chief scientist at IBM's iSeries server division. Soltis was a key member of the team that invented the Power chip, and he spoke to us about the chip's birth, the thinking behind its design, and something of what the future holds.
Q: What were the original design criteria for the Power chip? Was it ever envisaged that it would find a home in massively parallel supercomputers and desktop workstations?
A: The original Power design was for a single-user Unix workstation. It was introduced in 1990 as the IBM RS/6000. That original RS/6000 used a 32-bit multi-chip implementation of the Power architecture. In 1991 Apple, Motorola and IBM teamed up to create a single-chip implementation of Power. They simplified some parts of the architecture to enable a single-chip implementation and called it PowerPC. PowerPC was still intended to be used in a single-user workstation.
In 1990 I was involved with the design of a new processor for the AS/400. The AS/400 was a multi-user server, and so we were designing a processor specifically for use in a server. A server design is generally characterized as one that has multiple processors, where each processor performs many operations in parallel. Such a design must be capable of moving lots of data to support all of those parallel operations. We were designing a high-performance, 64-bit processor that could be used in the AS/400 for the next ten years.
Jack Kuehler, IBM's president at the time, heard about our new processor design for the AS/400. Kuehler was the one who had negotiated the partnership with Motorola and Apple for PowerPC. He felt that the Power design represented the future for all IBM processors. He came to Rochester to convince us to use a Power processor for our next generation of AS/400s.
After some discussion, it became clear to Mr. Kuehler that Power was a workstation design that would not work very well in a server such as the AS/400. As a result of these discussions, Mr. Kuehler asked me to lead a team from across all of IBM to redesign Power to be a server architecture. That effort began in early 1991.
We expanded Power to be a full 64-bit architecture and added a great deal of functionality that was needed for a server processor. We also added support for symmetric multiprocessing (SMP) and even for massively parallel processing. There were some on the team who didn't think massively parallel processing was something we should consider, but this was the time to incorporate those features into the architecture for future use. The result of this work was the Power architecture we use today.
Because we were able to think beyond just the immediate use for the Power processors, we were able to create a design that would be equally at home in a supercomputer, a commercial server and even a desktop workstation. We also designed the architecture to be highly expandable so that new features can continuously be added as time goes on.
Q: Can you explain more about the potential for expandability? For instance, is this focused around manufacturability, is it to do with leaving hooks of some kind into the core of the CPU, the capability to talk to more processors and become more massively parallel, or what?
A: We need to separate expandability of a particular chip from expandability of the architecture. We often build into a particular chip some capabilities that may not be used until the next release of the operating system. It may be as simple as the ability to talk to more processors, or it may be some entirely new function that is not yet supported by the current software.
For example, we began building in the hardware support for logical partitioning long before the software was in place to support this type of partitioning. Expandability on the chip level is fairly easy to do. Architecture expandability is harder. Here we are trying to build in expandability for something that we will need in the future but don't yet know what it is.
There are various techniques that can be used to provide this level of expandability. One of the most popular techniques is to leave hooks for undefined instructions and undefined data types in the architecture. Another technique is to define a standard interface that can be used by future designers to add new functions. Power uses all of these techniques to future-proof the architecture.
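To make the idea of architectural hooks concrete, here is a toy decoder sketched in Python. It is not how Power decodes instructions: the opcodes, the `ToyDecoder` class and its `register_extension` method are all hypothetical. A block of opcodes is left reserved and traps as illegal until a later design claims a slot through one fixed registration interface, loosely mirroring the two techniques Soltis describes.

```python
# Hypothetical toy ISA, NOT the Power instruction set: opcodes and names
# are invented purely to illustrate reserved-opcode hooks.

class IllegalInstruction(Exception):
    """Raised when the decoder meets an opcode with no handler (a trap)."""


class ToyDecoder:
    # Opcodes outside this range are "defined" today or simply invalid;
    # 0x10-0x1F are deliberately left as hooks for future extensions.
    RESERVED = range(0x10, 0x20)

    def __init__(self):
        self.handlers = {
            0x01: lambda regs, a, b: regs[a] + regs[b],  # add
            0x02: lambda regs, a, b: regs[a] - regs[b],  # subtract
        }

    def register_extension(self, opcode, handler):
        # The "standard interface": later designs may claim only reserved slots.
        if opcode not in self.RESERVED:
            raise ValueError(f"opcode {opcode:#x} is not a reserved hook")
        if opcode in self.handlers:
            raise ValueError(f"opcode {opcode:#x} is already assigned")
        self.handlers[opcode] = handler

    def execute(self, opcode, regs, a, b):
        handler = self.handlers.get(opcode)
        if handler is None:
            # Undefined or not-yet-assigned opcode: trap rather than guess.
            raise IllegalInstruction(f"opcode {opcode:#x}")
        return handler(regs, a, b)


if __name__ == "__main__":
    regs = [0, 7, 5]
    dec = ToyDecoder()
    print(dec.execute(0x01, regs, 1, 2))  # 12
    # A "future" multiply instruction added without touching the decoder core.
    dec.register_extension(0x10, lambda r, a, b: r[a] * r[b])
    print(dec.execute(0x10, regs, 1, 2))  # 35
```

Because old software never emits a reserved opcode, adding a handler for one later cannot change the behaviour of existing programs, which is the property that makes the hook useful.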
Q: In retrospect, what would you have done differently at Power's design stage?
A: It is always difficult to identify something that should have been done differently. One of the best things we did do was to make the architecture highly expandable. Over the years we have continued to add functionality that we could not have envisioned back in the early 1990s. This expandability has also allowed IBM to open the Power architecture to the research and education communities through efforts such as Power.org. We are already seeing many others joining with us to expand Power for uses we never even dreamed of.
Q: Which areas of the Power chip's design are you keenest to improve?
A: Heat is the biggest problem with chips today. As we continue to shrink the semiconductor technologies and add more transistors to the chips, the amount of heat produced is increasing rapidly. We have already added features to our latest Power5 and Power5+ chips to reduce the heat build-up, for example turning off parts of the chip when they are not in use. We are off to a good start, but there is more to be done. Finding new ways to reduce the heat produced by the chip is a huge focus area for all of us in this industry today.
Q: Power is set up to be a parallel chip but software development lags behind the hardware in this instance. What needs to be done to exploit the parallel nature of the chips of tomorrow -- and why haven't we already done it? Is developing parallel applications just too difficult?
A: Developing parallel applications is hard. By some estimates, it takes two to three times the effort to develop a multithreaded parallel application than it does to develop a single-threaded application. In the past a programmer could count on the speed of the processors to continually increase. If the application didn't perform as well as it should, it was a simple matter to upgrade the hardware to a faster processor.
This approach doesn't work any more. Processor speeds are no longer increasing rapidly, and all processor vendors are moving to more parallelism with multi-core chips and other techniques. In the future, writing parallel applications will be the only way to get increased performance. There are some tools that can help, but much of the effort will come back to the way in which programmers design their applications.
Q: Do you see any straws in the wind suggesting that software engineers are further along the road towards developing either tools or apps that can help resolve this issue?
A: We began about ten years ago in IBM to make tools and compilers available that enable software engineers to more easily write parallel applications. Today, most ISV-developed applications for the System i5 are multithreaded parallel applications. This situation is not true for many other types of servers. For example, most applications written for PC servers are single threaded, meaning they can only use one core in a multi-core chip. This situation came about because most PC servers run only one application at a time. System i5 and its predecessors have always been designed to run multiple applications in parallel.
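The difference between a single-threaded application and a multithreaded one can be sketched in a few lines of Python. This is an illustration, not IBM tooling: the function names and the SHA-256 workload are assumptions chosen for the example. The Python `hashlib` documentation states that the GIL is released while hashing buffers larger than 2047 bytes, so in CPython these threads can genuinely occupy several cores; how much faster the threaded version runs depends on the hardware.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def digest(block: bytes) -> str:
    # hashlib releases CPython's GIL while hashing large buffers,
    # so several of these calls can run on different cores at once.
    return hashlib.sha256(block).hexdigest()


def single_threaded(blocks):
    # One thread walks the whole work list: at most one core is busy.
    return [digest(b) for b in blocks]


def multithreaded(blocks, workers=None):
    # Independent blocks go to a pool of threads; map() preserves input order.
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, blocks))


if __name__ == "__main__":
    blocks = [bytes([i]) * (1 << 20) for i in range(16)]  # 16 x 1 MiB sample data
    assert single_threaded(blocks) == multithreaded(blocks)
    print("identical digests; threaded version may use", os.cpu_count(), "cores")
```

The extra effort Soltis mentions shows up even here: the work has to be split into independent pieces before a pool can help, and real applications rarely decompose this cleanly.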
Q: You've said that the Alpha was the best chip there was. What was it about that design which worked best -- and what worked worst?
A: Alpha was the first of the so-called "speed demons." There are fundamentally two ways to design a processor today. The speed demon approach implements a given function using lots of very small pieces. In a single processor cycle only very simple operations are performed. This means that each cycle can run very fast, but it takes lots of cycles to implement the entire function. This type of processor design is characterised as having very long pipelines.
The other approach is called a "brainiac" design. With this design each processor cycle performs more complex operations. Each cycle takes longer to execute, but there are fewer cycles required to implement an entire function. This type of processor has fairly short pipelines.
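The speed demon versus brainiac trade-off can be illustrated with a simple, idealised pipeline model in Python. The stage counts and cycle times are hypothetical numbers, not figures for Alpha or any Power chip, and the model ignores stalls, hazards and branch mispredictions. The first result takes `stages` cycles to emerge; after that, a full pipeline completes one operation per cycle.

```python
# Idealised pipeline model: no stalls, hazards or branch mispredictions.
# Stage counts and cycle times are hypothetical, chosen only to contrast
# a deep "speed demon" pipeline with a shallow "brainiac" one.

def pipeline_time_ps(stages: int, cycle_ps: int, ops: int) -> int:
    """Time in picoseconds to complete `ops` independent operations.

    The first result appears after `stages` cycles; once the pipeline is
    full, one further result completes every cycle.
    """
    if ops <= 0:
        return 0
    return (stages + ops - 1) * cycle_ps


SPEED_DEMON = dict(stages=20, cycle_ps=250)  # many simple steps: fast clock
BRAINIAC = dict(stages=8, cycle_ps=500)      # fewer, bigger steps: slower clock

if __name__ == "__main__":
    for ops in (1, 1000):
        sd = pipeline_time_ps(ops=ops, **SPEED_DEMON)
        br = pipeline_time_ps(ops=ops, **BRAINIAC)
        print(f"{ops:>5} op(s): speed demon {sd} ps, brainiac {br} ps")
```

With these numbers a single isolated operation finishes sooner on the brainiac (4,000 ps versus 5,000 ps), but with a steady stream of 1,000 independent operations the speed demon finishes in 254,750 ps against 503,500 ps, roughly twice the throughput. That is one way to see why plentiful parallel work favours long pipelines.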
IBM, HP and others chose brainiac designs for their processors in the early 1990s. Digital chose a speed demon design for Alpha. With all of the parallelism that is built into today's processor chips, nearly everyone has made the switch to a speed demon design. Our Power4, Power5 and Power5+ chips are all speed demons.
Alpha won the battle for the processor design but lost the war because of software. The big problem with Alpha was the lack of compatibility with previous processor designs. That meant that software that ran on VAX had to be partially or totally rewritten to run on Alpha. That rewrite didn't happen, and Alpha is now history. It takes more than a good hardware design to be successful.
Q: Somewhat like Alpha, Intel's Itanium has been seen as a good design but has struggled to gain market traction. Why is this, in your view?
A: Itanium has the same problem that Alpha had. It is not software compatible with previous Intel processors.
Q: How do you see Sun's future in the CPU business -- and why does the company struggle?
A: This is a question that you should be asking Sun. Many of us do believe, however, that there will be fewer computer companies in the future designing their own proprietary processor chips. The cost is just too high. That's the reason we are making Power chips widely available outside of just using them in IBM products. High volumes are extremely important if you want to stay in the processor business.
Q: IBM is involved with Sony and Toshiba in the Cell processor. To what extent is the design of that chip a harbinger of future chips -- if at all?
A: Cell is an awesome chip. Its performance is most impressive. The design uses a single main processor and eight co-processors on a single chip. Although we will not see the current Cell chips used in our commercial servers, the design approach used for the Cell chip will play a big part in our future Power processor designs. Stay tuned.
Q: We're talking about specialised co-processors, so what applications do you see as ripe for the co-processor approach?
A: There are a number of software functions such as TCP/IP stack processing and database search algorithms that may someday be supported with specialized hardware. At the moment we are not discussing too much of our future plans outside of IBM. It is still a very competitive world.
Q: The concept of RISC, where processor speed compensates for a lack of on-chip complexity -- and so flexibility -- seems to have died with the end of the megahertz wars. Will it, should it, can it return -- and if so, should we as users care?
A: The original definition of RISC dates back to 1982 when two professors from the University of California at Berkeley coined the name "reduced instruction set computer." That definition has evolved over the years to the point that no major manufacturer today, including IBM, builds a pure RISC processor. We have kept the name but changed the definition. That definition will continue to evolve as we go forward. From a user's perspective it really doesn't matter.
Q: Given that transistor gates are now, arguably, as small as they can usefully get, are we reaching a fundamental geometry limit or are there more tricks up the processor designer's sleeve?
A: Never underestimate the resourcefulness of processor designers.
Q: Can you elaborate?
A: Many of the techniques that are used to gain an advantage over competitors are treated as trade secrets. Our current Power5 and Power5+ chips contain numerous trade secrets. These are never publicly disclosed or patented.
Q: Gordon Moore has been publicly sceptical about the potential for nanotech to replace silicon-based technology in the near future. Do you agree and, if so, why?
A: Gordon Moore is correct when he says that it will be a long time before we replace silicon technology. The problem is not finding a technology that could replace silicon, whether that's nanotechnology or something else. The problem is being able to manufacture high volumes of that technology.
Manufacturing techniques are and have always been the driver of progress in the semiconductor industry. It is often easy to create a new technology in the laboratory, but mass producing that technology is what takes the time.
One of my favourite examples is Silicon on Insulator (SOI). You may recall when IBM first introduced SOI for our chips a few years ago. It was announced as a brand new technology. What wasn't said is that we knew about SOI and the benefits it delivered 30 years earlier. It took a full 30 years to get this technology from the laboratory into the manufacturing process.
Silicon technology will be replaced, and there are now several technology candidates looking to do so. The ones that can be manufactured in high volumes are the ones that will succeed. Right now nanotechnology looks fairly good. We will just have to wait and see.