Back in January 2009, I published the first half of my groundbreaking study on multicore support under Microsoft Windows. The article featured an in-depth look at multicore/multiprocessor performance under Windows 7, Vista, and XP, including extensive benchmark data for each platform. At the time, I concluded that Windows 7, and to a lesser degree Vista, delivered better scalability moving from single-core to dual- and quad-core architectures. However, I also noted that this advantage was not yet sufficient to allow Windows 7 to overtake the leaner, more efficient XP under heavy workloads.

What a difference a year makes! After revisiting my earlier test scenarios using a newer, Nehalem-based workstation (the HP Z800 with dual quad-core Xeon 5500-series CPUs), I'm pleased to report that Windows 7 not only closes the gap with Windows XP, but blows right past it, delivering results that are 47 to 178 percent faster overall. Moreover, Windows 7 shows far superior scalability, by a factor of more than 3.5, when moving from a single quad-core CPU (Core 2 Extreme QX9300) to the dual quad-core, Hyper-Threading Xeons in our newer Z800 test bed.
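For readers who want to check figures like these against their own timings, here is a minimal Python sketch of the arithmetic. It assumes "percent faster" is derived from workload completion times, which may not be exactly how the benchmark results above were computed. The function names and the sample timings are hypothetical and are not taken from my test data.

```python
def percent_faster(baseline_seconds, candidate_seconds):
    """Percent by which the candidate beats the baseline.

    Assumed definition: ratio of completion times minus one, times 100.
    A candidate that finishes in half the time is 100 percent faster.
    """
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("timings must be positive")
    return (baseline_seconds / candidate_seconds - 1.0) * 100.0


def speedup_factor(old_hardware_seconds, new_hardware_seconds):
    """How many times faster the same workload runs on the new hardware."""
    if old_hardware_seconds <= 0 or new_hardware_seconds <= 0:
        raise ValueError("timings must be positive")
    return old_hardware_seconds / new_hardware_seconds


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    print(percent_faster(150.0, 100.0))   # one OS vs. another, same hardware
    print(speedup_factor(400.0, 100.0))   # one OS, old vs. new hardware
```

Comparing the speedup factor of one OS with that of another on the same pair of machines gives a rough measure of relative scalability.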

In my earlier article, I posited that, as multicore PCs evolve and the number of cores increases, the superior scalability of the Windows 7 kernel would eventually overcome Windows XP in terms of raw application throughput. But I had figured this inflection point to be well into the many-core future and suggested we would be lucky to see Windows 7 overtaking XP before 16- or 32-core CPUs were commonplace. It's now clear that my prediction was off by a factor of 3 or 4, and that the point where a combination of multicore hardware and kernel tuning wins out over the simpler, brute-force approach of the XP kernel has already been reached.

Simply put, Windows 7 is significantly faster than Windows XP when running heavy, multitasking workloads on advanced, multicore hardware. And when considered in light of current trends in PC hardware design and multicore road maps, this advantage should be enough to sway even the most ardent fence sitters to finally jump on the Windows 7 bandwagon.

Factor this
Several factors conspire to give Windows 7 the edge on multicore. For example, the introduction of Non-Uniform Memory Access (NUMA)-based multiprocessor systems, like the HP Z800, is allowing for greater compute engine density in a commodity form factor. By combining multiple cores per CPU with multiple CPU sockets, PC vendors can deliver levels of scalability previously reserved for high-end servers, and they can do so at price points that would have been impossible to achieve using traditional, discrete processors.

Another factor is the transition away from the Front Side Bus architecture that has been a staple of Intel-based PC and workstation designs for years. In its place, the Nehalem design pairs the QuickPath Interconnect (QPI), Intel's answer to AMD's HyperTransport, with a memory controller on the same die as the CPU, allowing the latter to directly access physical memory. The net result is much faster access to memory that is local to a particular CPU and, when combined with a Level 3 cache, improved performance for juggling workloads across multiple CPUs.
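The payoff from keeping memory accesses local can be sketched with a simple weighted average. This is a toy model rather than a measurement from the Z800: the function name is hypothetical, and the latency figures are placeholders chosen only to show the shape of the calculation.

```python
def effective_latency_ns(local_fraction, local_ns, remote_ns):
    """Average memory latency when some accesses stay on the local node.

    local_fraction: share of accesses served by the CPU's own memory (0 to 1).
    local_ns, remote_ns: assumed latencies for local and remote memory.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    # Placeholder latencies, not measured values.
    LOCAL_NS, REMOTE_NS = 60.0, 100.0
    for fraction in (0.0, 0.5, 1.0):
        print(fraction, effective_latency_ns(fraction, LOCAL_NS, REMOTE_NS))
```

In this model, the more often a thread runs on the CPU that holds its memory, the closer the average gets to the local figure.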

Together, the NUMA and QPI advancements have served to drive the Intel architecture forward. However, both would be for naught without support from the OS, which is why the extensive multicore tuning that went into the Windows 7 kernel is so important. Without it, Windows users would be unable to leverage the performance-enhancing features of the latest Intel (and AMD) CPUs. In other words, to make the most of today's smarter CPUs, you need a smarter OS.