A while back I did a fairly major upgrade. Because systems had to be down, it had to be done overnight; not a big deal - it's a perfectly normal thing to do as a techie. My colleagues and I spent the thick end of 24 hours running through the process, and after nearly bankrupting the data centre company by drinking our way through their complimentary coffee, we were able to sit back and reflect on a job well done.

On the face of it, this was a successful process. We did a major change covering a lot of sites, and everything turned out as it should have.

One can't help wondering, though, whether the story would have been the same had we hit a big snag. One of the final parts of the process was to make a "go/no-go" decision - that is, to decide whether to leave the new service live or roll back to the old systems. As it was, the answer was "go". But if it had been "no-go", we'd have had to roll back to the old setup. That would have been a significantly faster process than the deployment, because we could revert relatively quickly to the saved copies of the previous settings. But it would still have taken us some hours, on top of the knackering time we'd already spent in noisy, stressful surroundings.

Would we have been OK? Yes, I'm sure we would. Was it optimal? No, probably not. Next time I do this kind of exercise I'm going to build in more contingency, with additional people on the ground - perhaps to handle the less specialist tasks while the senior staff get some shut-eye.

The moral of the story: never assume that everything will go right. Sometimes it won't, and you need to be ready when it doesn't.