You know you're in trouble if your router fails and your users can't get to their servers, or out onto the Internet. So you install two, for redundancy. But how do you make sure the second one will take over seamlessly when it needs to, and can you load balance between them, for better performance?
Unsurprisingly, there are protocols your routers can run that let you do this but, as with most things in networking, you have choices to make and configurations to tweak to ensure everything works as you expect in your environment.
The standards-based router redundancy protocol is VRRP, the Virtual Router Redundancy Protocol. The first VRRP RFC (2338), written in 1998, has recently been superseded by RFC 3768. Interestingly, the only real change between the two is the removal of the authentication methods, as it was deemed that "these did not provide any real measure of security", according to the RFC, so it's up to you to make sure your network's secure.
VRRP is pretty much the same as Cisco's proprietary HSRP (Hot Standby Router Protocol), which isn't surprising since it's based on it, with some terminology changes. In both cases you have one master (active) router and one or more backup (standby) routers. All routers have unique IP addresses for their interfaces onto the LAN, but it's a separate, shared address that's advertised to hosts as their default gateway. The master router will reply to ARPs for that address and forward any traffic sent to it, while the others will just sit in the background monitoring what's going on, ready to assume the VRRP/HSRP address (both MAC and IP) should the master fail, so the end stations don't need to know anything about a replacement coming online.
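To make the shared MAC address concrete, here is a minimal Python sketch of how the virtual MAC is derived from the group number. It assumes the well-known formats, 00-00-5E-00-01-{VRID} for IPv4 VRRP and 00-00-0C-07-AC-{group} for HSRP version 1; the function names are made up for illustration and are not part of any router software.

```python
def vrrp_virtual_mac(vrid: int) -> str:
    """Virtual MAC for an IPv4 VRRP group: 00-00-5E-00-01-{VRID}."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be in 1..255")
    return f"00:00:5e:00:01:{vrid:02x}"


def hsrp_v1_virtual_mac(group: int) -> str:
    """Virtual MAC for an HSRP version 1 group: 00-00-0C-07-AC-{group}."""
    if not 0 <= group <= 255:
        raise ValueError("HSRP version 1 group must be in 0..255")
    return f"00:00:0c:07:ac:{group:02x}"


if __name__ == "__main__":
    # Whichever router is currently master answers ARPs with this address,
    # so hosts never see the routers' own burned-in MACs for the gateway.
    print(vrrp_virtual_mac(10))
    print(hsrp_v1_virtual_mac(10))
```

Because the address depends only on the group number, a backup that takes over uses exactly the same MAC, and the hosts' ARP caches stay valid.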
Which is all fine, but this is where you need to start configuring things to make sure it all behaves as you want. Left alone, the routers will elect their own master - unless you want the joy of an independent network, it's best you configure router priorities so that you know exactly which one will take the traffic, both in normal and failure scenarios.
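As a rough illustration of why priorities matter, the election can be sketched in a few lines of Python. This assumes the usual rule that the highest priority wins and that a tie goes to the highest interface IP address; the Router class, the router names and the addresses are hypothetical, and real implementations also involve timers and pre-emption settings that this sketch ignores.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Router:
    name: str
    priority: int         # higher wins; 100 is the usual default
    address: IPv4Address  # the router's own interface address


def elect_master(routers):
    """Return the router that should be master among those still alive."""
    if not routers:
        raise ValueError("no routers available")
    # Compare on priority first, then on IP address as the tie-breaker.
    return max(routers, key=lambda r: (r.priority, r.address))


if __name__ == "__main__":
    a = Router("router-a", 110, IPv4Address("192.0.2.2"))
    b = Router("router-b", 100, IPv4Address("192.0.2.3"))
    print(elect_master([a, b]).name)  # normal operation
    print(elect_master([b]).name)     # router-a has failed
```

Leave both routers at the default priority and the winner is decided by nothing more than which one happens to have the higher address, which is rarely what you intended.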
For anything but the smallest networks, you should also design things such that different routers are master for different groups of hosts, otherwise you'll have one router working flat out and the others idling. The typical design is similar to Spanning Tree load balancing: the odd-numbered VLANs use the gateway address of one VRRP group, and the even-numbered VLANs use the gateway address of a second group for which another router has been configured as master. Of course, this can be extended if you have more than two routers.
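The odd/even split is easy to plan programmatically. The following Python sketch is only one way to do it: the router names and the priority values 110 and 90 are arbitrary assumptions, and the output is a planning table rather than anything a router would accept as configuration.

```python
def plan_priorities(vlans, routers=("router-a", "router-b"), high=110, low=90):
    """Map each VLAN to per-router priorities, rotating the preferred master."""
    plan = {}
    for vlan in vlans:
        # With two routers, even VLANs prefer the first and odd VLANs the second.
        preferred = routers[vlan % len(routers)]
        plan[vlan] = {r: (high if r == preferred else low) for r in routers}
    return plan


if __name__ == "__main__":
    for vlan, priorities in plan_priorities([10, 11, 12, 13]).items():
        print(vlan, priorities)
```

Passing three or more names in routers spreads the masters across all of them in the same rotating fashion.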
The router redundancy protocols use hellos and timeouts to make sure the backup routers notice if a master fails. Leaving these at their default values may mean recovery takes too long for your requirements, so try tuning them and see what effect you get. You can also decide whether a recovered master should resume control or not. And there's a nifty option, only available in HSRP though, that drops the priority of a router so that a backup can replace it if, say, it loses its link to the outside world or another interface fails, which prevents sub-optimal routing.
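To get a feel for what the timers mean in practice, here is a small Python sketch. The first function assumes the VRRP formula from RFC 3768, where a backup declares the master dead after three advertisement intervals plus a skew time of (256 - priority) / 256 seconds. The second is a simplified model of HSRP-style interface tracking; the default decrement of 10 is an assumption here, as is the idea that the backup is allowed to pre-empt.

```python
def vrrp_master_down_interval(priority: int, advert_interval: float = 1.0) -> float:
    """Seconds a backup waits without hearing the master before taking over."""
    skew_time = (256 - priority) / 256  # higher-priority backups react sooner
    return 3 * advert_interval + skew_time


def effective_priority(base: int, tracked_link_up: bool, decrement: int = 10) -> int:
    """Priority a router advertises once a tracked interface is considered."""
    return base if tracked_link_up else max(base - decrement, 0)


if __name__ == "__main__":
    # Default 1 s advertisements versus a tuned 0.25 s interval.
    print(vrrp_master_down_interval(100))
    print(vrrp_master_down_interval(100, advert_interval=0.25))
    # A master at 110 loses its uplink and falls to or below a backup at 105.
    print(effective_priority(110, tracked_link_up=False))
```

The numbers show the trade-off: shorter intervals mean faster failover but more hello traffic and a greater chance of a spurious takeover on a busy link.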
But you don't have to run one of these protocols. For those of you who are devotees of open source code, meet CARP - the Common Address Redundancy Protocol. The OpenBSD folks wanted to produce code based on VRRP but, despite the fact it's supposed to be an open standard, Cisco had patented enough of its HSRP to be able to stop any free development of VRRP, which, as we've said, is based to a large extent on it. So the OpenBSD developers started from scratch and produced a protocol that in effect does the same thing, but in a different (better, they would say) way, using multicast and allowing for encryption. See OpenBSD's side of the story. A portable userland implementation, UCARP, is also available for Linux.
Cisco's moved its HSRP along with a relatively new development, called the Gateway Load Balancing Protocol (GLBP). In this instance, there's still a master router that replies to all ARP requests, but it will alternate its replies so that different hosts get different MAC addresses to use for the same IP address. This means that the load's spread over all the routers taking part without you having to configure different masters for each VLAN group. This may appear as an add-on to VRRP in time, though the IETF never exactly hurries things, so in the meantime, if it's something you think useful, you'll be stuck with Cisco kit. Oh, or you can go the OpenBSD path, because CARP has been designed with this feature in it from the start, which makes for a much cheaper option.
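The alternating-reply idea can be sketched as a simple round-robin. This is a toy Python model of the behaviour described above, not Cisco's implementation: the ArpResponder class, the gateway address and the MAC addresses are all hypothetical, and real GLBP also offers other balancing methods and handles forwarder failure.

```python
from itertools import cycle


class ArpResponder:
    """Answers ARP requests for one gateway IP with rotating forwarder MACs."""

    def __init__(self, gateway_ip, forwarder_macs):
        if not forwarder_macs:
            raise ValueError("need at least one forwarder MAC")
        self.gateway_ip = gateway_ip
        self._macs = cycle(forwarder_macs)

    def reply(self, requested_ip):
        """Return the MAC to hand out, or None if the request is not for us."""
        if requested_ip != self.gateway_ip:
            return None
        return next(self._macs)


if __name__ == "__main__":
    # Hypothetical addresses: one MAC per participating router.
    responder = ArpResponder("192.0.2.1", ["02:00:00:00:00:01", "02:00:00:00:00:02"])
    for host in ("host-1", "host-2", "host-3", "host-4"):
        print(host, responder.reply("192.0.2.1"))
```

Each host caches whichever MAC it was given, so its traffic sticks to one router while the population as a whole is spread across all of them.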