Putting two or more processor cores on a single silicon chip has been one of the most important milestones in computing in recent years. It allows users to continue to reap the benefits of Moore's Law while sidestepping the extreme difficulty of manufacturing, powering and cooling single-core microprocessors beyond 4 GHz.

Chip multi-processors (CMPs) also offer the opportunity to significantly boost the performance of applications that are able to share them.

But the benefits of parallel processing don't come easily. Programmers have to behave differently, as do compilers, languages and operating systems. If application software is to reap the benefits of CMPs, new skills, techniques and tools for designing, coding and debugging will be needed. Fortunately, both hardware and software vendors are developing tools and methods to make the job easier.

"Multi-core chips are going to be a challenge for software developers and compiler writers," says Ken Kennedy, a computer science professor at Rice University in Houston, who specialises in software for parallel processing.

"If you look at chip makers' road maps, they are doubling cores every couple of years, sort of on a Moore's Law basis, and I'm worried we are not going to be able to keep up."

Desktop applications that traditionally have been written for one processor will increasingly be written to exploit the concurrency available in CMPs. Meanwhile, server applications that have for years been able to use multiple processors will be able to distribute their workloads more flexibly and efficiently. Virtualisation, another important trend in computing today, will be made easier by CMPs as well.

Keeping up with multi-core

Keeping up with CMPs is the focus of intense activity at a number of companies, including Microsoft. Researchers there who are developing CMP tools are focusing on two broad areas: how to find errors in code written for multiple processors, and how to make it easier to write reliable software in the first place.

"A lot of the techniques we have used with sequential code don't work as well, or at all, with parallel programs," says Jim Larus, manager of programming languages and tools at Microsoft Research. "In testing, you typically run your program with a lot of data, but with parallel programs, you could run your program 1,000 times with the same data and get the right answer, but on the 1,001st time, an error manifests itself."

This ugly trait results from "race" conditions in parallel code, in which one process is expected to finish in time to supply a result to another process -- and usually does. But because of some anomaly such as an operating system interrupt, occasionally it does not. Such bugs can be extremely hard to find because they are not readily reproducible.

The tools Larus' group is developing allow more controlled testing so a programmer can, for example, vary the timing of two threads to check for race errors. The tools will eventually be offered commercially as part of Visual Studio, Larus says, "but we have a long way to go."
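To make the race described above concrete, here is a minimal Python sketch. It is not one of the Microsoft tools; the names `run_once`, `producer` and `consumer` are invented for illustration. The consumer reads a shared value without waiting for the producer, and a flag lets a test force the rare schedule in which the producer is late, instead of waiting for it to happen by chance.

```python
import threading

def run_once(delay_producer: bool):
    """Run producer and consumer once; return the value the consumer saw."""
    shared = {"result": None}
    seen = []
    consumer_read = threading.Event()

    def producer():
        if delay_producer:
            # Stand-in for an anomaly such as an interrupt: hold the
            # producer back until the consumer has already read.
            consumer_read.wait()
        shared["result"] = 42

    def consumer():
        # Bug: reads the shared value without waiting for the producer.
        seen.append(shared["result"])
        consumer_read.set()

    p = threading.Thread(target=producer)
    p.start()
    if not delay_producer:
        p.join()  # usual case: the producer finishes in time
    c = threading.Thread(target=consumer)
    c.start()
    c.join()
    p.join()
    return seen[0]
```

With the usual schedule, `run_once(False)` returns 42. With the forced late schedule, `run_once(True)` returns `None`, which is the bug that ordinary testing might see only once in a thousand runs.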

New programming models

Microsoft Research is also trying the KISS -- or "keep it strictly sequential" -- model. KISS transforms a concurrent program into a sequential one that simulates the execution of the concurrent program. The sequential program can then be analysed and partially debugged by a conventional tool that only needs to understand the semantics of sequential execution.
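The article describes KISS only at a high level, so the following Python sketch is not the KISS transformation itself. It illustrates the general idea under a simplifying assumption: two threads each perform a non-atomic increment (a read step, then a write step), and a purely sequential program replays every possible interleaving of those steps. All names here are hypothetical.

```python
from itertools import permutations

def increment_steps(name):
    """The two steps of a non-atomic increment for one simulated thread."""
    def read(state, local):
        local[name] = state["counter"]

    def write(state, local):
        state["counter"] = local[name] + 1

    return [read, write]

def run_schedule(schedule):
    """Replay one interleaving, e.g. "ABAB", with no real threads involved."""
    state = {"counter": 0}
    local = {}
    pending = {
        "A": iter(increment_steps("A")),
        "B": iter(increment_steps("B")),
    }
    for thread in schedule:
        next(pending[thread])(state, local)
    return state["counter"]

def find_bad_schedules():
    """Return every interleaving in which one increment is lost."""
    schedules = sorted(set(permutations("AABB")))
    return ["".join(s) for s in schedules if run_schedule(s) != 2]
```

Of the six possible interleavings, `find_bad_schedules()` reports four (`ABAB`, `ABBA`, `BAAB`, `BABA`) in which the counter ends at 1 instead of 2. Because the simulation is sequential, each failing schedule can be replayed exactly in an ordinary debugger.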

Microsoft and others are also working on a new programming model called software transactional memory, or STM. It is a way to synchronise memory operations correctly without locks -- the traditional defence against timing errors -- so that lock-related problems such as deadlock are avoided.

STM treats a memory access as part of a transaction, and if a timing conflict occurs with some other operation, the transaction is simply rolled back and tried again later, similar to the way today's database systems work.
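The roll-back-and-retry behaviour can be sketched in a few lines of Python. This is a toy built on assumptions, not any vendor's STM: `VersionedCell` and `atomically` are invented names, it covers a single shared value only, and it uses a short internal lock to make the commit step atomic, which a real implementation would handle differently. The point is that the calling code never takes a lock itself.

```python
import threading

class VersionedCell:
    """A shared value with a version number that changes on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._commit_lock = threading.Lock()  # internal; held only briefly

    def read(self):
        with self._commit_lock:
            return self.value, self.version

    def try_commit(self, new_value, expected_version):
        with self._commit_lock:
            if self.version != expected_version:
                return False  # someone else committed first: conflict
            self.value = new_value
            self.version += 1
            return True

def atomically(cell, update):
    """Apply update(value) as a transaction; return the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        value, version = cell.read()
        new_value = update(value)  # computed without holding any lock
        if cell.try_commit(new_value, version):
            return attempts
        # Conflict detected: discard new_value and try again.
```

A caller would write something like `atomically(cell, lambda v: v + 1)`. If another transaction commits between the read and the commit, the update is simply recomputed from the fresh value, much as a database would retry an aborted transaction.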

"The idea is that the programmer, instead of specifying at a very low level how to do this synchronisation, basically says, 'All the code between this point in the program and this other point, I want to behave as if it were the only thing accessing data at this time. System, go make that happen,'" says Larus.

STM -- "a really hot research topic these days" -- may someday be implemented in a combination of hardware and software, says Larus. In the meantime, programmers will have to use fine-grained locking -- in which individual rows or elements of a table are locked, rather than the whole table -- to ensure correct synchronisation in parallel programs. The more parallel threads there are, the more difficult that becomes.
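As a rough illustration of fine-grained locking, the Python sketch below gives every row of a small table its own lock, so threads working on different rows do not block one another. The class and method names are hypothetical. It also shows why the approach gets harder as threads multiply: any operation touching two rows must acquire its locks in a consistent order, or two threads can deadlock.

```python
import threading

class RowLockedTable:
    """A table of numbers with one lock per row rather than one for the table."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.locks = [threading.Lock() for _ in self.rows]

    def add(self, index, amount):
        # Only this row is locked; other rows stay available.
        with self.locks[index]:
            self.rows[index] += amount

    def transfer(self, src, dst, amount):
        if src == dst:
            return  # nothing to move, and locking a row twice would hang
        # Always lock the lower-numbered row first so that two opposing
        # transfers cannot each hold one lock while waiting for the other.
        first, second = sorted((src, dst))
        with self.locks[first], self.locks[second]:
            self.rows[src] -= amount
            self.rows[dst] += amount
```

With a single table-wide lock, `transfer` would be trivial to get right but every thread would queue behind it. The per-row version allows more concurrency at the price of rules, such as the lock ordering above, that the programmer has to remember.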

Microsoft products won't require significant changes to scale from two processors (or processor cores) to four or eight processors, other than perhaps some performance tuning, according to Larus.

"But when you start getting to bigger-scale machines, the question becomes, What are the bottlenecks?" he says. "If you have more processors, you have to have increasingly fine-grained locking."

This is part 1 of a two-part article.