At this moment of change when the trade-off between heat and power is getting harder and harder to resolve, when clock speeds have started to matter less than chip design, and when untried new technologies will be needed if Moore's Law is not to collapse, we asked Steve Pawlowski in this exclusive interview whether Intel can keep the engine running at the breakneck speeds of the last 30 years and, if so, how.
Please tell us what keeps you at Intel, and what you do there.
What keeps me at Intel? Two things: the job is fun and remains fun. I still enjoy getting up and going to work every day. Secondly, Intel is a place of opportunity. I've been given the opportunity to do board design, system design, chipset design and CPU design, and to create a wireless research group. I've been given the privilege to head our corporate micro-processor research group, where we are working on designs and technologies that are five to 10 years out, and I work with some of the best and brightest people at Intel and externally. There are few companies that can offer this breadth of opportunities, and I'm fortunate to be at one of them.
The Intel architecture has undergone considerable changes since the original 8086. How much of that design remains in Intel's latest designs?
We strongly maintain instruction compatibility between current generations and previous generations of micro-processors. Though the exact structures may not be the same, with some exceptions - such as the number of architectural registers - the functionality is the same. There are several 'legacy' features at the platform level that we evaluate for removal with each generation. Very rarely does one of these features get removed unless we are absolutely (100%) confident that there will not be any compatibility issues with its removal.
Intel has always emphasised backwards x86 compatibility. Is this as important as it was - and how long is it likely to remain so?
ISA compatibility is still very important because of the installed base of software that is written in x86 binaries. It's important to maintain that compatibility and to show performance improvement from one generation to the next. It will remain so as long as there is software that depends on this compatibility.
Intel has moved from clock speed-driven marketing to selling features such as longer pipelines, hyper-threading and multi-core processors. The move was essential according to Intel's chief technology officer Pat Gelsinger who said at IDF: "Power, memory, RC delays and other effects are going to curb the rate of frequency growth into the future."
How does this affect the chip R&D effort from the perspective of providing the marketing team with features they can take to market? Does it mean, for instance, that engineers no longer have as easily definable a target to aim for?
Actually, as far as an easily definable target to aim for, I've observed the opposite. The variables constraining the design are increasing; however, we have many features that we could add, and the challenge is determining which are of significant and marketable value. We have processes in place to provide this guidance. There are also features that we add that really aren't that visible to the user, although they improve performance, reliability, etc.
What we believe is that the computing demands for the next generation of workloads and applications, as Pat described in his IDF keynote, will only increase, and at a fairly steep rate. So, our challenge is that with the increased workloads and the effects we are seeing, we as architects are going to have to get creative, use the transistors we are given much more efficiently and increase performance in other ways. It's an exciting challenge and one that, depending on the solution, will have ramifications for the software, which includes the development and debug tools and possibly how we teach computer science to the next generation of programmers and engineers.
To what extent are multi-core processors going to be the main thrust of Intel's development effort in the foreseeable future?
It's one vector that we are pursuing but any significant change is not going to occur overnight. Having multi-cores doesn't mean that the software is going to be able to utilise all the cores right out of the chute. We still need to balance single thread performance and power efficiency, and look at the application of multi-core architectures going forward. We need to understand the workloads where we think multi-core performance will become significant. There is a lot of research to be done on the core capability, the interconnect, communication model and programming model. It will take time and a lot of work.
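The point that software won't exploit all the cores "right out of the chute" can be made concrete with Amdahl's law - a standard textbook scaling model, not a figure from the interview. Even a workload that is mostly parallel is capped by its serial fraction:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only `parallel_fraction` of the work
    can be spread across `cores` (Amdahl's law)."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# With 90% of the code parallelised, eight cores deliver
# well under a 5x speedup - the serial 10% dominates.
print(round(amdahl_speedup(0.90, 8), 2))
```

This is why the interconnect, communication model and programming model he mentions matter as much as the core count itself.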
Given the difficulties not just in designing multi-processor architectures but, in a way more critically, designing software to fully exploit such systems, to what extent does the software development effort lag behind processor architectures? If this is the case, what can be done to fix it?
Traditionally, if we look for example at the migration from 16-bit to 32-bit architectures, the lag was in the order of five years or more. A new architecture needs to be comprehended and developed with the programming issues in mind. Understanding the programming model, the debug environment, the programming language support, impact to the OS, the platform requirements (memory bandwidth/latency, event recognition/servicing requirements, synchronisation, memory coherency, etc.), all need to be in the forefront of the architecture discussion and the research/development of these capabilities has to begin much earlier than when the silicon and platform are available. I would anticipate that software support would be a large part of any multi-core architecture R&D investment.
Can you foresee a fix for the problem of memory latency, in terms of reducing it rather than using, for example, helper threads to make use of the waiting time?
The latency at the memory device may not change too drastically in the foreseeable future. One of the things that could be done would be to improve the I/O bandwidth of the device to match more closely the internal bandwidth of the DRAM core, but the raw access time of the core will probably remain roughly equivalent to what it is today.
There are other solutions to reducing the impact of latency. Helper threads using profile-guided optimisations are one example, as you cited; other pre-fetching methodologies and/or techniques for reducing branch penalties will always be studied. However, I think the energy needed to support traditional latency reduction techniques may play a significant role in how these are implemented in future architectures. What those solutions are remains to be seen.
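A rough average-access-time model illustrates why hiding latency pays off even when the raw DRAM access time stays flat, as suggested above. The latency numbers here are assumed, illustrative values, not Intel figures:

```python
def effective_access_time(hit_rate, cache_ns, mem_ns):
    """Average access time when a fraction of requests hit a
    fast cache and the rest pay the full DRAM latency."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * mem_ns

# Raising the effective hit rate (e.g. via helper threads or
# pre-fetching that warm the cache ahead of the main thread)
# cuts the average cost far more than shaving the raw DRAM
# latency would.
print(round(effective_access_time(0.95, 2, 100), 2))
print(round(effective_access_time(0.99, 2, 100), 2))
```

The energy-efficiency caveat in the answer above applies: each speculative pre-fetch that misses is wasted bandwidth and wasted power.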
How are changing datasets changing the future of chip design?
We will always have to deal with bandwidth and getting more and more effective bandwidth for moving information into and out of our devices. The increasing data set size will have an effect on the memory and I/O subsystems in a significant way and we will have to be able to move these data elements at ever faster rates.
Copper interconnects are presenting some challenges in meeting future bandwidth needs. There's still some headroom but it's moved way beyond the point where you could just lay a trace down on a motherboard and expect that you could transfer data where only one in 1024 bits is in error. The error rates are going up, and these will be compensated for by communications processing technology to get the levels of reliability that we are accustomed to, or we will understand how to live with the larger error rates. When the cost of transferring data using the dominant technology - that is, copper - becomes greater than another technology, then, at that time, you will probably see a change.
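A back-of-the-envelope calculation shows why rising error rates force the "communications processing" he describes. The bit error rate used here is an assumed, illustrative figure, not one quoted in the interview:

```python
def p_any_error(ber, bits):
    """Probability that at least one bit flips in a transfer of
    `bits` bits over a channel with the given bit error rate."""
    return 1.0 - (1.0 - ber) ** bits

# Even at an excellent raw BER of 1e-12, a 1 GB (8e9-bit)
# transfer has a non-trivial chance of containing an error,
# so link-level coding or retry becomes unavoidable.
print(round(p_any_error(1e-12, 8e9), 4))
```

As the raw channel degrades, that probability climbs quickly, which is the trade-off driving the copper-versus-optical cost comparison.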
Have you any news/updates on the issue of optical interconnects?
We have research going on in this area on several fronts. Earlier this year Intel disclosed that it had designed an optical I/O link with high bandwidth that can be used for chip-to-chip interconnect. As CPU speeds increase over time, system bus speeds, such as those between the CPU and memory, must also increase.
Optical I/O is being investigated for possible use in the first half of the next decade to interconnect chips in a cost-effective, power-efficient manner. The paper on this technology was presented at the Photonics West 2004 Conference on January 29. We also published an article in Nature this past February on a silicon-based modulator that operated at frequencies over 1GHz, and we believe that there is still plenty of frequency headroom. Intel's research work in these and possibly other areas is continuing, with the goal of being ready when the copper transition needs to happen at the box, board and chip levels.
Do you foresee the abolition of wiring inside the PC/server chassis based on its replacement by wireless interconnects and, if so, when?
By wireless, I assume that you mean RF, not copper or optical. My opinion is that wireless will never replace optical or copper for inside-the-box connectivity. The channel is very noisy and not well contained, unlike copper or fibre, and in order to get the data rates we will require, we will need a great deal of bandwidth. On top of this, the bit error rate is nearly six orders of magnitude worse than that of either copper or optical channels.
I've been asked this question by people in industry and academia from around the world and I just don't hold out a lot of hope that it will be a viable solution for internal high speed interconnects.
About 30 percent of the total power consumed by Intel's newest Pentium 4 processor is wasted as current leakage, according to Intel's manufacturing VP Joseph Schutz. Can we foresee any breakthroughs in the vexed issue of processor power consumption in the medium term? If so, can you give us any ideas what technology might be involved?
There are several design technologies for reducing leakage power during idle periods, such as substrate biasing and removing the bias on idle logic. We are investigating technologies such as these to reduce leakage power.
For active power reduction, there are several types of techniques we've used to reduce overall power consumption. Again, there are engineering solutions, but the decision to implement them will have to be made by the engineering team for each individual product and its market segment.
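For context on active power techniques, the classic CMOS switching-power model - a textbook formula, not an Intel-specific one, with made-up parameter values - shows why supply voltage is the biggest lever:

```python
def dynamic_power(c_eff, vdd, freq, activity):
    """Classic CMOS switching-power model: P = a * C * V^2 * f."""
    return activity * c_eff * vdd ** 2 * freq

# Voltage scaling attacks the quadratic term: dropping Vdd from
# 1.4 V to 1.2 V at the same frequency cuts switching power by
# roughly a quarter, before any frequency reduction at all.
p_hi = dynamic_power(1e-9, 1.4, 3e9, 0.1)
p_lo = dynamic_power(1e-9, 1.2, 3e9, 0.1)
print(round(1 - p_lo / p_hi, 2))
```

Leakage power, by contrast, flows even when nothing switches, which is why it needs the separate idle-period techniques described above.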
How closely do you watch what AMD and others are up to?
I don't have much insight into this from a product perspective, as I am in our R&D organisation. From that perspective, we constantly benchmark computing industry (including micro-processor companies) and academic research to evaluate our efforts and line of thinking. Some of these metrics have greater weight than others; it depends on the particular area of interest.
Finally, Moore's Law - some might argue it's not a law but more a prediction that has given companies such as Intel a guiding path for the development effort - a self-fulfilling prophecy, if you like. Had Moore said triple the number of transistors rather than double, could Intel have made that happen?
Moore's Law was created by observing the rate at which transistor density increased per unit time and extrapolating that trend into the future. Had the observation been tripling rather than doubling, then we would probably have seen that trend in industry.
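The difference between doubling and tripling compounds dramatically, as a quick extrapolation shows (generation counts here are illustrative, not Intel roadmap figures):

```python
def transistors(start, factor, generations):
    """Extrapolate a transistor count over process generations,
    multiplying by `factor` each generation."""
    return start * factor ** generations

# Over 10 generations, doubling gives a 1024x increase,
# while tripling would give 59049x - nearly 60 times more.
print(transistors(1, 2, 10))  # doubling: 1024
print(transistors(1, 3, 10))  # tripling: 59049
```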
Is it self-fulfilling? Not from what I've heard from our process manufacturing engineers. The fact that Moore's Law continues today, and into the foreseeable future, enables us to architect and design components that can take advantage of the increased number of transistors for much improved capability, reliability and performance. This has had a tremendous impact on society. The increase in performance and capability at a lower cost, on a regular cadence, has created new uses and applications that one could not have envisioned when Moore's Law was postulated.
What always impresses me is that this 'prediction' in device density was made several decades ago! Not too many predictions in this industry have withstood the test of time like that.