This server is old, short on disk space and unstable because it's been patched to a fare-thee-well, reports a pilot fish in charge of keeping it going. It's also the home of the company's file, print, mail and groupware services.
"So we decided on setting up a new machine next to the old one and migrating the services one by one, instead of upgrading the old machine's hardware," fish says.
"I created a project plan and cost estimates, everything was approved by the big guys, and we went to work."
Everything migrates without trouble until only the groupware remains on the old server. Fish dreads dealing with the corrupted document databases, sorting through and cleaning up the mess before moving the whole works to the new hardware.
Fortunately, a new version of the groupware is released, one with big improvements in the way migrations and rebuilds work. "So upgrading, though a bit costly, would have helped us reduce downtime and migrate the system more smoothly," says fish.
"But we could have lived without the upgrade, and that's exactly what we told management: 'We can go one way or the other, but we need to know which way to go soon.'"
And the big boss of IT tells fish, "Get me the costs and I'll think about it and let you know -- soon."
Two months pass. The old groupware server starts generating strange little errors. We need to get the data out of there fast, fish tells the big boss.
"Print out that order, I'll sign it right away," big boss tells him. But when fish returns promptly with the paperwork, big boss is in a meeting. Fish leaves the paperwork with big boss's admin.
And nothing happens. Over the next three months, fish regularly catches the big boss in hallways and after meetings to ask about the status of the order. The response is always the same: "Yes, I'll look into that again as soon as I return to my office."
Finally, the old server is clearly corrupting data. Fish knows he no longer has a choice; he informs the users, shuts down the system and starts the migration process.
A few minutes in, one hard disk in the RAID array starts dying. But there's no way for the old version of the groupware to interrupt the migration. Fish starts praying.
Then the RAID controller begins throwing parity errors. Logs show that inconsistent data is being moved. "We were forced to take the whole system off-line for an array rebuild the hard way," fish says. "We were estimating a downtime of one to two weeks. That was bad enough.
"But that was just when the big boss showed up to ask why he couldn't access his calendar. He carefully listened to my explanations and thought for a moment.
"Then he asked, 'Will it help you if I sign that order right away?'"