There's an old joke about the quality assurance department of a pipe manufacturing company.

They take a sample of pipe from a batch the've made; they X-ray it, do metallurgical tests, run pressure tests on it and have the welds inspected by Lloyd's of London. All the tests are passed with flying colours. Then just as they're stacking the pipes on the truck for delivery to the customer, the apprentice points out that the diameter of the pipe is three inches, and it should be six. (See - the joke's so old, it uses imperial units).

A former colleague of mine did something similar the other week. She was responsible for testing on the part of the project I was working on at the time, and we had a boatload of different test scripts that had to be run on processes that involved the components we were developing. Since our bit of the project was the document generation process, which used email where possible but still had to print boatloads of letters to non-email-savvy recipients, we generally ran things overnight and there'd be a rain forest waiting for us on the printer in the morning.

One morning Isabel came in, sat down, looked at us and said she thought there might be a problem. What made her think this? "Oh, it looks funny", she said, referring to one of the piles of documents (the content of the printer's output bin had by this time been sorted into their various categories).

We checked, and there was no fault; everything had done what it was meant to do. The upstream system had queued various quantities of each document for production, and the correct documents had been produced.

So nothing was wrong ... but actually she was right. What had subconsciously registered was that there seemed rather a lot (actually, an absolute shedload) of one particular type of letter.

Yes, our software had produced the right number of letters, and the upstream software had produced the right letters based on what it had been configured to do. But the diagnosis of "It looks funny", once we'd looked into it, was that the specification for the system config had been done without realising just how many of this type of letter would be printed each night. They can now look at tweaking the config so that it's a little less eager to generate quite so many (the letter in question is an automated "we're chasing you for X letter", so they can simply set the system to chase less vigorously).

The moral of the story is this: while you can test processes and accuracy using well-defined scripts and test methodologies, there's no substitute for having someone on hand who (a) understands the whole system, not just the bit of it they're working on; and (b) can inject the project with a healthy dose of common sense.