All networks change frequently, yet changes are the biggest cause of the network outages we experience. Why haven't we got it right yet? Planning and assessing changes is the most important part of any change process - and the part we tend to do least rigorously. If you just throw a change in, it's not that easy to back it out when there's a problem. Here are some ideas on how to manage your changes in a way that will give you fewer headaches.
Plans and schedules
Nobody can avoid emergency changes that need doing quickly to fix a problem, but if a change is part of a project, it shouldn't be that difficult to schedule in advance. There should be an approval process that lets everyone see what changes are going to be done and when, what they entail and what they will affect - both during the change window and once complete. That lets people see whether there is any conflict with other work being carried out at the same time. A formal process that makes sure everyone involved has read and approved the change means nobody can claim they didn't know. But adding too many people to the approval process, just for the sake of it, makes the whole thing too unwieldy to be effective, and people will start to sneak in changes outside the process.
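As a rough illustration of the conflict check, here is a minimal Python sketch that flags pairs of scheduled changes whose windows overlap and which touch the same kit. The change references, device names and the `Change` record itself are made up for the example; a real change system will hold this information in its own format.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations


@dataclass(frozen=True)
class Change:
    ref: str             # hypothetical change reference
    start: datetime      # start of the change window
    end: datetime        # end of the change window
    affects: frozenset   # devices or services the change touches


def find_conflicts(changes):
    """Return (ref, ref, shared items) for changes that overlap in time
    and affect at least one item in common."""
    conflicts = []
    for a, b in combinations(changes, 2):
        windows_overlap = a.start < b.end and b.start < a.end
        shared = a.affects & b.affects
        if windows_overlap and shared:
            conflicts.append((a.ref, b.ref, sorted(shared)))
    return conflicts


# Illustrative schedule only - none of these names come from a real network.
changes = [
    Change("CHG-001", datetime(2024, 5, 7, 20, 0), datetime(2024, 5, 7, 22, 0),
           frozenset({"core-rtr-1", "wan-link-a"})),
    Change("CHG-002", datetime(2024, 5, 7, 21, 0), datetime(2024, 5, 7, 23, 0),
           frozenset({"wan-link-a"})),
    Change("CHG-003", datetime(2024, 5, 8, 20, 0), datetime(2024, 5, 8, 21, 0),
           frozenset({"core-rtr-1"})),
]

for ref_a, ref_b, shared in find_conflicts(changes):
    print(f"{ref_a} and {ref_b} overlap and both affect: {', '.join(shared)}")
```

A check like this only catches clashes that people have declared, so it supplements the approval process rather than replacing the human review.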
It's never a good idea to try to do too many changes at once, so how about setting aside certain days of the week for different types of changes? Network changes one evening, server changes another, and mainframe changes on yet another night? Apart from making it easier to track down the culprit if something doesn't work the next day, it also lets the people making the changes plan their lives a bit better. If Thursday is the day the desktop team do their stuff, the network staff know they can safely plan to go out that evening without being called on to work late on a network change. Considering the network staff, as well as the equipment, is a novel approach for some companies, but it's a nice touch.
As well as detailing what changes you're going to make, it's important to identify possible side effects or problems that might arise. If you're upgrading a router, for instance, even if it's been tested in the lab, what if the upgrade fails? If the code's corrupted? If it's on a remote router and the WAN fails half-way through? You can't think of everything, but try to cater for the most likely scenarios.
Then you need to decide the recovery actions - the last thing you want is to be trying to think of a way out of your predicament at three in the morning when you're tired and stressed.
And even if it all goes according to plan on the night, what's your backout plan if there's a problem the next day when the users get in - or if the change simply didn't work? If you've overwritten the old code with new code, where's the copy of the old version and is it ready to download? Having to back out a change isn't an uncommon occurrence, and with some preparation it needn't be a major hassle either.
Never, ever, make a change without it being documented. And that means beforehand, not six weeks after the event. Of course the documentation may need updating once the change is complete, but if you don't know enough about what the change entails to write it down before you start, you shouldn't be going anywhere near the kit. No matter how familiar you are with the equipment configuration, you shouldn't be doing it on the fly.
Detail what you are going to do and the configuration changes you need to make. Ideally, this should be included in the change approval documentation so everyone knows exactly what you're doing - you never know when someone might point out that an IP address you're about to allocate to a new router interface has actually already been used and nobody told you.
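The duplicate address problem is one that a simple script can catch before a reviewer has to. The sketch below uses Python's standard `ipaddress` module and assumes you can export the addresses already in use as a plain mapping of owner to address; the `allocated` dictionary, the interface names and the addresses are all invented for the example.

```python
import ipaddress


def check_new_address(new_interface, allocated):
    """Compare a proposed interface address (e.g. "10.1.3.1/24") with the
    addresses already recorded, and return a list of problems found."""
    new = ipaddress.ip_interface(new_interface)
    problems = []
    for owner, existing in allocated.items():
        other = ipaddress.ip_interface(existing)
        if new.ip == other.ip:
            problems.append(f"{new.ip} is already used by {owner}")
        elif new.network != other.network and new.network.overlaps(other.network):
            # Same address space carved up with a different mask.
            problems.append(f"{new.network} overlaps {other.network} ({owner})")
    return problems


# Hypothetical record of addresses already in use.
allocated = {
    "rtr-a Gi0/0": "10.1.1.1/24",
    "rtr-b Gi0/1": "10.1.2.1/24",
}

for candidate in ("10.1.2.1/24", "10.1.2.129/25", "10.1.3.1/24"):
    issues = check_new_address(candidate, allocated)
    print(candidate, "->", issues or "no clash found")
```

It is only as good as the record it checks against, which is one more reason to keep the documentation up to date.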
Apart from anything else, it lets a colleague sanity-check your config for typos - even techies make them - and when you come to make the change, you can just copy and paste the new config in, rather than typing it from scratch.
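One way to make that sanity check easier is to hand your colleague a diff of the current config against the proposed one, so the review covers exactly the lines that change. Here is a minimal sketch using Python's standard `difflib`; the config snippets are illustrative, and it assumes you have both versions available as plain text.

```python
import difflib


def config_diff(current, proposed):
    """Return a unified diff (as a list of lines) between two config texts."""
    return list(difflib.unified_diff(
        current.splitlines(),
        proposed.splitlines(),
        fromfile="running-config",
        tofile="proposed-config",
        lineterm="",
    ))


# Illustrative snippets only - not taken from a real device.
current = """interface Gi0/1
 description spare
 shutdown
"""

proposed = """interface Gi0/1
 description link to branch
 ip address 10.1.3.1 255.255.255.0
 no shutdown
"""

diff = config_diff(current, proposed)
print("\n".join(diff))
```

Lines starting with `-` are being removed and lines starting with `+` are being added, so a mistyped address or a missing `no shutdown` tends to stand out.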
Topology diagrams must be updated at the time of the change - even if the update is drawn by hand until you get the proper copy done later. There is nothing more unfair than someone making a change and expecting network ops to support it the next day when they've not been given all the right information. As a matter of course, if you made the change, you should be onsite first thing the next day, just in case of problems. That isn't always possible, so make sure you don't leave your colleagues high and dry trying to support your work - because they'll get their own back one day if you do!