Recently I was helping out a big UK finance house with some Data Centre network upgrades, and just generally having a look at how they’d set things up and whether it could be improved.
On paper it looked not too bad. Nice hierarchical design, access switches dual-homed to the distribution layer, some big beefy core boxes. Some of the kit was getting a bit long in the tooth, but nothing too scary. It made a nice change to see kit that’s actually still supported by the manufacturer, to be honest.
Then we went for a trip into one of the Data Centres for a look round. That’s when things started to look just a bit flaky.
All the access switches were dual-homed to two distribution switches, yes. There were actually several pairs of distribution switches, since this was a pretty big setup. But each pair of distribution switches was physically installed in the same rack, one directly above the other.
Okay, they had dual power supplies and were connected to separate feeds, so a power failure shouldn’t take out both at once. But when you have two switches acting as a redundant pair, each backing the other up, is it really a good idea to install them so close together?
I’m not so worried about some environmental problem taking out a rack—after all, if the overhead water pipes rupture, it’ll probably affect a whole row of racks anyway (cheery thought). But the scope for human error is suddenly much greater than it needs to be.
The thinking apparently was that it was easier doing the initial install and testing if everything was in one rack, and it made the cabling guys’ job easier too. But that’s worrying for a start, since it implies that they ran all the cables in together. If anything happens to one cable, it’s likely to affect the rest in the loom, which instantly isolates both switches at the same time.
And in the middle of the night, when you’re tired and stressed, which would you prefer? Constantly checking and rechecking just to make sure that, yes, it is the top switch you’re supposed to be pulling the supervisor on, not the bottom one? Or relaxing in the knowledge that the backup switch is over at the other side of the Data Centre, and anything you do to this one can’t possibly mess it up and take out the whole service?
Maybe I’m just after an easy life, but I’d go for the second option every time. Having two devices that back each other up sitting so close to each other just adds an unnecessary level of risk. We’re all paranoid about single points of failure in our network designs, and that should extend to the physical layer too.
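If you keep any sort of inventory of what’s installed where, this is the kind of thing you can check for before you ever set foot in the Data Centre. Here’s a rough sketch in Python. The `Device` record, the `redundancy_group` label and the rack names are all made up for illustration, so treat it as a starting point under those assumptions rather than something that matches any particular inventory tool.

```python
from collections import defaultdict
from typing import NamedTuple


class Device(NamedTuple):
    name: str
    rack: str
    # Hypothetical label shared by devices that back each other up.
    redundancy_group: str


def colocated_groups(devices):
    """Return {group: rack} for each redundancy group of two or more
    devices whose members all sit in the same rack."""
    racks = defaultdict(set)
    members = defaultdict(int)
    for device in devices:
        racks[device.redundancy_group].add(device.rack)
        members[device.redundancy_group] += 1
    return {
        group: next(iter(rack_set))
        for group, rack_set in racks.items()
        if members[group] > 1 and len(rack_set) == 1
    }


# Illustrative inventory: pair 1 shares a rack, pair 2 is split up.
inventory = [
    Device("dist-1a", "R12", "dist-pair-1"),
    Device("dist-1b", "R12", "dist-pair-1"),
    Device("dist-2a", "R20", "dist-pair-2"),
    Device("dist-2b", "R41", "dist-pair-2"),
]

for group, rack in colocated_groups(inventory).items():
    print(f"{group}: all members are in rack {rack}")
```

It only looks at racks, of course. It won’t tell you that both switches’ cables run through the same loom, which still needs someone to go and look.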