Probably the most important place to deploy QoS is over your WAN links. After all, they cost you the most, are the most heavily utilised and are the most likely to fill up or go down. It's also the place where you see the most drastic speed mismatches, as gigabit LANs are connected together via pipes that in many cases you measure in kbit/s.
This is where it's very important to prioritise your real-time traffic, so that you minimise the latency your packets incur while they wait to be transmitted out over the WAN. For a lot of your data, a few hundred milliseconds of delay is no big deal. For highly interactive traffic - and particularly voice or real-time video - it can make the difference between a usable service and one that's unacceptable. Variation in delay (jitter) must also be avoided if you want to provide a consistent service level.
So it is essential to implement a strict priority queuing mechanism on your router interfaces facing the WAN, both at central sites and in your branch or remote offices. Often termed Low Latency Queuing (LLQ), this gives you one priority queue that will be serviced ahead of all the others: as long as there are packets in that queue, they will be transmitted onto the wire first.
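To make the scheduling rule concrete, here is a minimal Python sketch of a strict priority dequeue. It is an illustration only: the class name, the queue names and the simple round-robin used for the ordinary data queues are all assumptions made for the example, and a real router would normally apply some weighted scheme to the non-priority queues.

```python
from collections import deque


class StrictPriorityScheduler:
    """Toy model: one strict priority queue plus ordinary data queues."""

    def __init__(self, data_queue_names):
        self.priority = deque()
        self.data = {name: deque() for name in data_queue_names}
        self._order = list(data_queue_names)
        self._next = 0  # round-robin pointer for the data queues (an assumption)

    def enqueue(self, packet, queue=None):
        # queue=None means "put it in the strict priority queue".
        if queue is None:
            self.priority.append(packet)
        else:
            self.data[queue].append(packet)

    def dequeue(self):
        # The priority queue is always checked first and drained first.
        if self.priority:
            return self.priority.popleft()
        # Only when it is empty do the data queues get a turn.
        for i in range(len(self._order)):
            name = self._order[(self._next + i) % len(self._order)]
            if self.data[name]:
                self._next = (self._next + i + 1) % len(self._order)
                return self.data[name].popleft()
        return None  # nothing waiting anywhere


sched = StrictPriorityScheduler(["web", "mail"])
sched.enqueue("web1", "web")
sched.enqueue("mail1", "mail")
sched.enqueue("voice1")  # arrives last, but goes to the priority queue
sent = [sched.dequeue() for _ in range(3)]
print(sent)
```

Even though the voice packet was queued last, it is the first one handed to the interface; the data queues are only serviced once the priority queue is empty.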
For this reason, there are design guidelines as to how much of your overall traffic should be classified into that queue. The limit stands at about 30 per cent, i.e. your high priority traffic should make up no more than a third of your total traffic on that link. Any more than that and you risk starving the other traffic of its turn to be serviced by the scheduler, beyond what the applications can cope with. You also don't want too many packets queued up in the strict priority queue, or again you'll get delays that can't be tolerated by the end stations.
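As a rough feel for what that guideline means in practice, the sketch below works out how many voice calls would fit inside the priority share of a link. The function name is made up for the example, the one-third ceiling is the guideline quoted above, and the 24kbit/s per call is the uncompressed G.729a figure used later in this article; treat the results as a back-of-envelope check, not a capacity plan.

```python
def max_priority_calls(link_kbps, per_call_kbps, priority_share=1 / 3):
    """Whole calls that fit within the strict priority share of a link."""
    budget_kbps = link_kbps * priority_share
    return int(budget_kbps // per_call_kbps)


for link_kbps in (64, 256, 512, 2048):
    calls = max_priority_calls(link_kbps, per_call_kbps=24)
    print(f"{link_kbps:>5} kbit/s link: {calls} call(s) in the priority queue")
```

On this arithmetic a 64kbit/s link has no room for even one uncompressed call within the guideline, which is part of the reason the bandwidth optimisation discussed further on matters for small sites.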
Even with this prioritisation, if you have slow speed WAN links, you might find that your high priority packets get hit with unacceptable delays. Say, for instance, that your scheduler checks the strict priority queue and there is nothing there. It can then move on to the next data queue and start to dequeue its packets out of the router interface onto the media.
A millisecond after it has started to transmit that packet, a high priority packet arrives in the strict queue. It will have to wait until the packet being dealt with has been transmitted - the router can't stop half-way through. This could have a bigger impact than you might initially think.
For example, a 1500-byte file transfer packet being sent over a 64kbit/s leased line will take approximately 190ms to be serialised out over the physical media. This means that a packet in the strict priority queue could potentially have to wait that long just to get out onto the wire, never mind any other delay due to processing, jitter buffers etc. If that was a voice packet, the speech would be noticeably jerky (150ms end-to-end delay being the design guideline recommended by the ITU-T).
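The arithmetic behind that figure is simply the packet size in bits divided by the line rate. A small Python sketch (the function name is just for illustration, and link-layer framing overhead is ignored) lets you try it for your own link speeds:

```python
def serialisation_delay_ms(packet_bytes, link_bps):
    """Time in ms to clock one packet onto a link, ignoring framing overhead."""
    return packet_bytes * 8 * 1000 / link_bps


for link_bps in (64_000, 256_000, 1_000_000):
    delay = serialisation_delay_ms(1500, link_bps)
    print(f"{link_bps // 1000:>5} kbit/s: {delay:.1f} ms")
```

For the 64kbit/s case this works out at 187.5ms, which is where the "approximately 190ms" comes from.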
So you need to implement something called LFI - Link Fragmentation and Interleaving - which in effect makes sure that large data packets are broken down by the router into fragments, each of which will take a specified small amount of time to transmit, usually 10ms. At the other end, the receiving router will reassemble the fragments. In this way the fragments of data packets can be interleaved with the higher priority packets, so that the latter never need to wait in their queue for too long.
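To see what a 10ms target means in bytes, the sketch below derives a fragment size from the link speed and splits a packet accordingly. Both function names are invented for the example, and the per-fragment header that a real fragmentation scheme adds is deliberately left out, so take the numbers as approximate.

```python
def fragment_size_bytes(link_bps, target_ms=10):
    """Largest fragment that serialises within target_ms on this link."""
    return link_bps * target_ms // 8000  # bits/s * ms / (8 bits * 1000 ms)


def fragment(packet_bytes, frag_bytes):
    """Split a packet length into a list of fragment lengths."""
    full, rest = divmod(packet_bytes, frag_bytes)
    return [frag_bytes] * full + ([rest] if rest else [])


frag = fragment_size_bytes(64_000)
pieces = fragment(1500, frag)
print(f"fragment size: {frag} bytes")
print(f"a 1500-byte packet becomes {len(pieces)} fragments, the last of {pieces[-1]} bytes")
```

On a 64kbit/s link that gives 80-byte fragments, so a priority packet arriving mid-transfer waits for at most one fragment (about 10ms) instead of the whole 1500-byte packet.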
If your WAN links are 1Mbit/s or over, you don't need to worry about this configuration, as even the largest packets can be transmitted in roughly 10ms anyway. But for those of us less able to justify high bandwidth links to our smaller offices, this is something you should be configuring on your WAN routers.
Because your WAN bandwidth is precious, you may want to run a level of optimisation. This tends to relate to voice over IP, since it has predominantly small payload packets, although that could include video too, and it can also apply to a lesser extent to smaller data packets.
When you take a voice packet and encapsulate it in IP, you are adding three headers: IP, UDP and RTP (Real-time Transport Protocol). Typically one voice packet will include 20 bytes of voice bearer traffic and 40 bytes of these headers, which doesn't make for a very efficient use of your network.
The 40 bytes of header information can be compressed into between 2 and 4 bytes using cRTP (compressed RTP), since there is actually very little that changes within these headers for each flow. This obviously increases the efficiency of the data transfer and decreases the bandwidth required for a G.729a call (8kbit/s voice) from 24kbit/s to around 12-14kbit/s. Even on a larger (e.g. 256-byte) non-voice packet, you can save in excess of 10 per cent of the bandwidth.
Two things to note here: cRTP is pretty CPU-intensive, so while it's fine for a handful of flows, if you have tens of calls over a relatively high-speed WAN, it isn't a good option, as chances are your router won't cope. Also, cRTP is done on a hop-by-hop basis, so it may not be suitable for all WAN environments.
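You can reproduce the per-call figures with a few lines of Python. This is a sketch under stated assumptions: the function names are made up, the packet rate is derived from 20 bytes of payload at 8kbit/s (50 packets a second), and the `l2_bytes` parameter for link-layer framing is an assumption added for the example. With it left at zero the compressed result comes out lower than the 12-14kbit/s quoted above, which presumably allows for some framing overhead.

```python
def voice_bandwidth_kbps(payload_bytes=20, header_bytes=40,
                         codec_bps=8000, l2_bytes=0):
    """Per-call bandwidth in kbit/s for a constant-rate voice stream."""
    packets_per_second = codec_bps / 8 / payload_bytes  # 50 for the defaults
    packet_bits = (payload_bytes + header_bytes + l2_bytes) * 8
    return packet_bits * packets_per_second / 1000


def header_saving_fraction(packet_bytes, header_before=40, header_after=4):
    """Fraction of a packet saved by shrinking its headers."""
    return (header_before - header_after) / packet_bytes


print(f"uncompressed:    {voice_bandwidth_kbps():.1f} kbit/s")
print(f"cRTP, 2-byte hdr: {voice_bandwidth_kbps(header_bytes=2):.1f} kbit/s")
print(f"cRTP, 4-byte hdr: {voice_bandwidth_kbps(header_bytes=4):.1f} kbit/s")
print(f"saving on a 256-byte packet: {header_saving_fraction(256):.1%}")
```

The uncompressed case gives the 24kbit/s quoted above, and the 256-byte example (taking 256 bytes as the total packet size, headers included) shows a saving of about 14 per cent.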
In summary, to make the best use of your WAN bandwidth, you will need routers that can support a strict prioritisation scheme for your interactive traffic. For WAN connections of less than about 1Mbit/s, you must also be able to configure LFI, and potentially cRTP, for link efficiency.