Users don’t care how they connect to their data—just as long as they do. It’s up to you to make sure that regardless of individual component failures, there is always a valid link.
There are several high-level issues you have to take care of to ensure that your network design is best suited to provide maximum availability to your servers, such as power supplies, software concurrency and change management. But at a purely technical level, how do you ensure that your network will be up whenever it’s required?
If you’ve ever been subjected to a network design presentation, you’ll be familiar with the modular, hierarchical model recommended these days. And it’s a good one. It differs from some of the topologies we saw when ATM LANE was (briefly) in favour, but the building block approach lends itself well to scalability and manageability.
And the server farm is just another module. With some specific requirements, it’s true, but a good starting point is to follow the guidelines for any campus design, and build the extra features on that baseline.
In any modular campus LAN design, you should be planning a core layer, a distribution layer (which is often integrated with the core, particularly in smaller network designs) and an access layer. High-speed uplinks run from the access layer towards the core, which also has interconnections to the WAN and other services, including the server farms.
Ideally you want to aim for layer 3 routed links between the layers or modules. There is no longer any need to try to avoid inter-VLAN routing, and a layer 3 boundary between building blocks provides greater control. It allows you, if not to remove Spanning Tree, at least to keep it within access layer blocks, so that any topology changes or instability don’t affect your whole network.
Campus-wide VLANs should therefore be designed out, although this can be tricky where you have applications that can’t cope with being routed, or where you need to provide consistent IP addressing for users regardless of their location. However, there are ways round this using overlay networks and VPN connections.
Since in reality you’ll probably end up with a compromise (typically layer 2 from access switches to distribution, then layer 3 to core), make sure you map your L2 to L3 scenarios. Make the subnet addresses resemble VLAN IDs. Okay, it’s not essential, but remember that it’s people who have to manage and support these networks, not computers.
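As a rough illustration of making subnets resemble VLAN IDs, here is a minimal Python sketch. The scheme is an assumption for the sake of example, not a standard: it uses the 10.0.0.0/8 private range and splits the VLAN ID into its hundreds and its remainder, so VLAN 120 becomes 10.1.20.0/24. The function name subnet_for_vlan is hypothetical.

```python
import ipaddress


def subnet_for_vlan(vlan_id: int) -> ipaddress.IPv4Network:
    """Return a /24 whose second and third octets spell out the VLAN ID.

    Assumed scheme (illustrative only): 10.<id // 100>.<id % 100>.0/24
    """
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    hundreds, remainder = divmod(vlan_id, 100)
    return ipaddress.ip_network(f"10.{hundreds}.{remainder}.0/24")


if __name__ == "__main__":
    for vlan in (20, 120, 1234):
        print(f"VLAN {vlan:>4} -> {subnet_for_vlan(vlan)}")
```

With this convention, anyone reading a routing table can guess the VLAN from the subnet without reaching for the documentation.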
Turn on passive interfaces where you don’t need routing updates, and prune VLANs from trunks where they don’t need to go. Tune the Spanning Tree configuration you do have, always making sure you know where the root bridge will be, even under failure situations. Speed up reconvergence by applying Rapid Spanning Tree (or UplinkFast, BackboneFast etc. in Cisco parlance).
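To check where the root bridge will end up, including after a failure, you can work through the election by hand: Spanning Tree picks the bridge with the lowest bridge ID, comparing priority first and MAC address second. The Python sketch below does just that. The switch names, priorities and MAC addresses are made up for illustration, and the model assumes the surviving switches can all still reach each other, which a real failure may not guarantee.

```python
from typing import Iterable, NamedTuple


class Bridge(NamedTuple):
    name: str
    priority: int  # configured bridge priority; lower wins
    mac: str       # colon-separated hex, used as the tie-breaker


def bridge_id(bridge: Bridge) -> tuple:
    """Bridge ID as a sortable tuple: priority first, then MAC address."""
    return (bridge.priority, int(bridge.mac.replace(":", ""), 16))


def root_bridge(bridges: Iterable[Bridge], failed: Iterable[str] = ()) -> Bridge:
    """Return the bridge that wins the root election among surviving bridges."""
    down = set(failed)
    alive = [b for b in bridges if b.name not in down]
    if not alive:
        raise ValueError("no bridges left to elect a root from")
    return min(alive, key=bridge_id)


# Hypothetical switches: the two distribution switches are given low priorities
# so that the root stays in the distribution layer if one of them fails.
SWITCHES = [
    Bridge("dist-1", 4096, "00:00:0c:00:00:10"),
    Bridge("dist-2", 8192, "00:00:0c:00:00:20"),
    Bridge("access-1", 32768, "00:00:0c:00:00:01"),
]

if __name__ == "__main__":
    print("normal root:       ", root_bridge(SWITCHES).name)
    print("root if dist-1 dies:", root_bridge(SWITCHES, failed=["dist-1"]).name)
```

If the priorities were left at their defaults, the access switch with the lowest MAC address would win, which is rarely where you want the root to be.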
Server farm specifics
So what’s special about the server farm module design? Removing single points of failure is the focus. Multi-home your servers for optimal uptime. Where servers are critical, you’ll have to have multiples performing the same function, and DNS round robin is not the way to do this. Watch for the next article on server load balancing and content switching.
There are a variety of options when it comes to multi-homing. At layer two, every NIC vendor offers plain fault tolerance, where one NIC is active and one or more others are in standby, but there are some other options.
Then there’s adaptive load balancing, where all NIC ports can transmit data, but only one will receive it. Each NIC port will have its own MAC address, and only one will respond to ARP queries for the server’s IP address. If it fails, the next will take over, and it will do so using the failed port’s MAC, so that there are no ARP timeouts to wait for.
For full load balancing, you need an understanding between the switch and the NICs that the same address appears on multiple ports. Cisco’s EtherChannel and Sun’s Trunking are examples. You’ll need to test this thoroughly, and there are gotchas: if the load balancing at either end of the link uses MAC addresses to balance, it won’t really be that effective, since source and destination addresses are the same for all traffic on that logical link.
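The MAC-hashing gotcha is easy to see with a toy model. The Python sketch below is an illustration only: real switches and NIC drivers use their own vendor-specific hash functions, and the simple XOR-and-modulo used here, along with all addresses, is an assumption. It models a server talking to eight clients through a router, so every frame on the bundle carries the same pair of MAC addresses while the IP addresses vary.

```python
import ipaddress


def link_by_mac(src_mac: int, dst_mac: int, links: int) -> int:
    """Pick a member link from the MAC pair (toy XOR hash, not a vendor algorithm)."""
    return (src_mac ^ dst_mac) % links


def link_by_ip(src_ip: str, dst_ip: str, links: int) -> int:
    """Pick a member link from the IP pair (toy XOR hash, not a vendor algorithm)."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % links


LINKS = 4                      # ports in the bundle
SERVER_MAC = 0x020000000001    # made-up locally administered addresses
ROUTER_MAC = 0x0200000000FE
SERVER_IP = "10.1.20.10"
CLIENT_IPS = [f"192.168.5.{host}" for host in range(1, 9)]

# Every flow is server <-> router at layer 2, so the MAC hash never changes.
mac_links = {link_by_mac(SERVER_MAC, ROUTER_MAC, LINKS) for _ in CLIENT_IPS}
# The client IP differs per flow, so an IP-based hash spreads the traffic.
ip_links = {link_by_ip(SERVER_IP, client, LINKS) for client in CLIENT_IPS}

if __name__ == "__main__":
    print("links used with MAC hashing:", sorted(mac_links))
    print("links used with IP hashing: ", sorted(ip_links))
```

In this model the MAC-based hash puts all eight flows on a single link, while the IP-based hash uses all four. Whether your own kit can hash on layer 3 or layer 4 fields is something to confirm in its documentation and in testing.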
On another topic, be aware that the oversubscription on interswitch links is completely different. Where you can typically get away with a ratio of about 20:1 on access switch uplinks if it’s user connections you’re dealing with, on server farm switches anything from 4:1 down to 1:1 is the norm.
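The oversubscription ratio is simply the total edge-facing bandwidth divided by the total uplink bandwidth. A minimal Python sketch of the arithmetic follows; the port counts and speeds are invented examples, not recommendations.

```python
def oversubscription_ratio(
    edge_ports: int, edge_gbps: float, uplinks: int, uplink_gbps: float
) -> float:
    """Total edge bandwidth divided by total uplink bandwidth."""
    if uplinks <= 0 or uplink_gbps <= 0:
        raise ValueError("need at least one uplink with positive bandwidth")
    return (edge_ports * edge_gbps) / (uplinks * uplink_gbps)


# Hypothetical user access switch: 40 x 1 Gbit/s ports, 2 x 1 Gbit/s uplinks.
user_ratio = oversubscription_ratio(40, 1.0, 2, 1.0)
# Hypothetical server farm switch: 16 x 1 Gbit/s ports, 4 x 1 Gbit/s uplinks.
server_ratio = oversubscription_ratio(16, 1.0, 4, 1.0)

if __name__ == "__main__":
    print(f"user access switch: {user_ratio:g}:1")
    print(f"server farm switch: {server_ratio:g}:1")
```

Here the user switch comes out at 20:1 and the server switch at 4:1, matching the two ends of the range discussed above.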
You may also start looking at Quality of Service here; watch out for another article focussed on QoS soon. If you do, there is one thing to be aware of, particularly if you implement WRED to avoid tail drops when queues start to fill up: there’s more to it than setting your switch and router parameters. Properly setting the TCP window size on your servers can in some cases double throughput, and you can see improvements even without any QoS configured, so have a look at increasing your default settings.
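Why does the window size matter so much? A single TCP connection can have at most one window of unacknowledged data in flight per round trip, so its throughput ceiling is roughly the window size divided by the round-trip time. The Python sketch below works through that arithmetic. The 10 ms round-trip time and 1 Gbit/s link are illustrative assumptions, and the model ignores packet loss, slow start and protocol overhead.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on one connection's throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds


def bandwidth_delay_product_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Window size needed to keep a link of the given speed full."""
    return link_bps * rtt_seconds / 8


RTT = 0.010            # assumed 10 ms round trip
DEFAULT_WINDOW = 65535  # largest window without TCP window scaling

if __name__ == "__main__":
    for window in (DEFAULT_WINDOW, 2 * DEFAULT_WINDOW):
        mbps = max_throughput_bps(window, RTT) / 1e6
        print(f"{window:>7} byte window -> at most {mbps:.1f} Mbit/s")
    needed = bandwidth_delay_product_bytes(1e9, RTT)
    print(f"window needed to fill 1 Gbit/s at 10 ms: {needed:,.0f} bytes")
```

Under these assumptions a 65,535-byte window tops out at about 52 Mbit/s, and doubling the window doubles that ceiling, until the link itself or something else becomes the bottleneck.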