If you want something done well, do it yourself. That holds as true for machines as it does for other people. Just because your network equipment is theoretically capable of operating with minimal input from you doesn’t mean that you should just let it do what it wants. Unless, of course, you like troubleshooting a non-deterministic shambles of a network that takes forever to converge, for no good reason, every time the wind changes.

That was exactly the kind of network I had to deal with recently, when I was asked to go and have a look at why a large UK insurance company was suffering frequent and prolonged outages in its data centres.

The basic design was fine: core, aggregation, access. Layer 2 in the access layer, with trunks connecting the access switches up to the aggregation ones. Except that there were approximately 95 VLANs in the data centre, and every one was configured on every switch. Not only that, but since no VLAN pruning had been done on any of the uplinks, every VLAN was carried over every trunk. And they were running per-VLAN Spanning Tree (nothing wrong with that, except that it should have been Rapid PVST+), so there was a Spanning Tree instance for every VLAN on every switch, whether the switch had any active ports in that VLAN or not.
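
To make the pruning point concrete, here is a minimal Python sketch of the sum you would do before tidying up the uplinks: work out which VLANs each access switch actually has active ports in, and treat everything else as a candidate for pruning from its trunk. The switch names, VLAN IDs and the `active_vlans` inventory are all made up for illustration; only the figure of roughly 95 VLANs comes from the network described above.

```python
# Stand-in for the ~95 VLANs in the data centre (the real VLAN IDs are unknown).
ALL_VLANS = set(range(1, 96))

# Hypothetical inventory: VLANs with at least one active access port per switch.
active_vlans = {
    "access-1": {10, 20},
    "access-2": {20, 30, 40},
    "access-3": set(),  # a switch with no active ports in any VLAN
}


def allowed_on_uplink(switch):
    """Return the VLANs this switch's uplink trunk actually needs to carry."""
    return active_vlans[switch] & ALL_VLANS


def pruned_count(switch):
    """Return how many of the configured VLANs could be pruned from the uplink."""
    return len(ALL_VLANS) - len(allowed_on_uplink(switch))


for name in sorted(active_vlans):
    needed = sorted(allowed_on_uplink(name))
    print(f"{name}: carry {needed}, prune {pruned_count(name)} of {len(ALL_VLANS)}")
```

In this made-up inventory, each access switch needs three VLANs at most on its uplink, yet in the unpruned design it would carry (and run a Spanning Tree instance for) all 95.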

So every time there was a glitch in any one VLAN, every switch was involved in the Spanning Tree reconvergence for that VLAN. And if there was a glitch on a trunk link, well, that affected every VLAN, so each poor switch (and some of the access switches weren’t exactly what you might call performance-rich) had to recalculate Spanning Tree for all 95 VLANs. Just think about the CPU hit for that. And they wondered why they had problems!
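
For a rough sense of scale, the following Python sketch counts how many (switch, VLAN) Spanning Tree instances are touched when a single trunk glitches. It is a back-of-the-envelope count, not a model of Spanning Tree or of CPU load. The switch count of 20 and the pruned VLAN layout are assumptions invented for the example, since the real numbers are not given here; `instances_to_recalculate` is a hypothetical helper.

```python
def instances_to_recalculate(trunk_vlans, switch_vlans):
    """Count (switch, VLAN) STP instances touched by a glitch on one trunk.

    trunk_vlans:  set of VLANs carried on the affected trunk.
    switch_vlans: dict mapping switch name -> set of VLANs it runs STP for.
    """
    return sum(len(trunk_vlans & vlans) for vlans in switch_vlans.values())


ALL_VLANS = set(range(1, 96))                 # the ~95 VLANs
SWITCHES = [f"sw{i}" for i in range(1, 21)]   # 20 switches: an assumed figure

# As found: every VLAN on every switch, every VLAN on every trunk.
unpruned = {sw: ALL_VLANS for sw in SWITCHES}
print(instances_to_recalculate(ALL_VLANS, unpruned))   # 20 * 95 = 1900

# Hypothetical tidy-up: each switch runs only three VLANs, and the
# glitching trunk (on sw1) carries only sw1's VLANs {1, 2, 3}.
pruned = {sw: {i, i + 1, i + 2} for i, sw in enumerate(SWITCHES, start=1)}
print(instances_to_recalculate(pruned["sw1"], pruned))  # 3 + 2 + 1 = 6
```

The absolute numbers mean little, but the ratio makes the point: in the unpruned design a single trunk glitch drags every instance on every switch into the recalculation.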