Your software developers are telling you that your WAN link isn't giving their application the throughput they expect. There must be a network problem somewhere.

Before rushing off to call your service provider and complain, let's just have a quick look at what the software guys are seeing and think about how this ties into network bits and bytes.

Do the sums
First off, just what are your developers/software gurus/users actually measuring? Chances are they have some program that's telling them how much application data is passing between boxes in a set time. They'll be looking at that, comparing it to what you've told them the bandwidth is, and seeing a discrepancy.

Don't panic. That's to be expected. Remember that this data has to be encapsulated in various network protocol data units. If this is a TCP/IP stream running over a serial link using PPP, for example, every packet will have a 20-byte TCP header, a 20-byte IP header and 6 bytes of PPP encapsulation: 46 bytes in all. If the application is using small packet sizes, this overhead can be a substantial chunk of the data throughput. With only 92 bytes of application data in each packet, for instance, your router interface may show near 100 percent utilisation while a third of that is network protocol overhead. Up at the application layer, they'll only see the two-thirds that is user data getting through.
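To put rough numbers on that, here is a minimal sketch of the sum, using the header sizes quoted above. The 64kbit/s link speed and the payload sizes are made-up values for illustration, and the function names are hypothetical. It also assumes no header compression, no TCP or IP options and no retransmissions.

```python
# Per-packet header sizes quoted in the text, in bytes.
TCP_HEADER = 20
IP_HEADER = 20
PPP_ENCAPSULATION = 6
OVERHEAD = TCP_HEADER + IP_HEADER + PPP_ENCAPSULATION  # 46 bytes per packet


def user_data_fraction(payload_bytes, overhead_bytes=OVERHEAD):
    """Fraction of the bytes on the wire that are application data."""
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    return payload_bytes / (payload_bytes + overhead_bytes)


def goodput_bps(link_bps, payload_bytes, overhead_bytes=OVERHEAD):
    """Best-case application throughput on a fully utilised link."""
    return link_bps * user_data_fraction(payload_bytes, overhead_bytes)


# Hypothetical 64kbit/s serial link and a few illustrative payload sizes.
LINK_BPS = 64_000
for payload in (46, 92, 512, 1454):
    fraction = user_data_fraction(payload)
    print(f"{payload:5d} bytes of data per packet: "
          f"{fraction:6.1%} user data, "
          f"{goodput_bps(LINK_BPS, payload):8.0f} bit/s at the application")
```

The 92-byte case is the "a third is overhead" situation: the link is full, but the application can never see more than two-thirds of the line rate.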

The numbers and percentages will change depending on media and configuration, but you get the idea. If you have a LAN connection between two sites - all the rage these days with LAN emulation services and metro Ethernet offerings - a 64-byte Ethernet frame could have 58 bytes of overhead (Ethernet + IP + TCP, less if it's UDP instead of TCP, more if it's an 802.3 frame with SNAP headers), leaving just 6 bytes for user data. Anyone monitoring the 'real' user data throughput will see numbers considerably different from what you're seeing on the network devices. Remember, big packets are more efficient. Be especially aware of ATM links, since the fixed cell format can result in an awful lot of padded cells carrying very little useful data.
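The same sum for Ethernet and ATM can be sketched as follows. The Ethernet figure takes the 58 bytes above as a 14-byte header plus 4-byte frame check sequence, plus 20 bytes each for IP and TCP, and ignores the preamble and inter-frame gap. The ATM part assumes AAL5 (an 8-byte trailer, then padding up to a whole number of 48-byte cell payloads, each carried in a 53-byte cell) with no LLC/SNAP encapsulation; your link may well differ. The function names are hypothetical.

```python
import math

# Ethernet: 14-byte header + 4-byte FCS, then IP and TCP headers.
ETHERNET_OVERHEAD = 14 + 4
IP_HEADER = 20
TCP_HEADER = 20
FRAME_OVERHEAD = ETHERNET_OVERHEAD + IP_HEADER + TCP_HEADER  # 58 bytes

# ATM with AAL5 (assumed): fixed 53-byte cells carrying 48 bytes each.
ATM_CELL_BYTES = 53
ATM_CELL_PAYLOAD = 48
AAL5_TRAILER = 8


def ethernet_user_fraction(frame_bytes):
    """Fraction of an Ethernet frame that is TCP user data."""
    if frame_bytes < FRAME_OVERHEAD:
        raise ValueError("frame is smaller than its own headers")
    return (frame_bytes - FRAME_OVERHEAD) / frame_bytes


def atm_wire_bytes(packet_bytes):
    """Bytes sent on an ATM link to carry one packet over AAL5."""
    cells = math.ceil((packet_bytes + AAL5_TRAILER) / ATM_CELL_PAYLOAD)
    return cells * ATM_CELL_BYTES


for frame in (64, 512, 1518):
    print(f"{frame:5d}-byte frame: {ethernet_user_fraction(frame):6.1%} user data")

# One extra byte can tip a packet into a second, mostly padded, cell.
for packet in (40, 41):
    print(f"{packet}-byte IP packet: {atm_wire_bytes(packet)} bytes on the ATM link")
```

Note how a 40-byte IP packet fits exactly into one cell under these assumptions, while a 41-byte packet needs two.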

Waiting for responses
Is your WAN utilisation running at 100 percent all the time? Or are there great gaps where we're waiting for something in the chain to actually do something? In a client/server relationship, as well as the network-related delays, you need to be aware of the effect of the client and the server.

A server will take a finite time to reply to a request. Processing, disk access, memory usage and I/O capabilities will all affect how long it takes, and the response time will have a direct bearing on the throughput recorded. You should be able to monitor requests going into a server and replies coming out, independently of any network latency, to determine if any less-than-optimal behaviour is due to end-station delays.
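As a rough illustration of how much waiting matters, the sketch below models a strictly one-at-a-time request/response exchange: each reply has to cross the link, but the client also waits one round trip plus the server's think time before it arrives. The model, the function name and all the figures are assumptions for illustration, not measurements.

```python
def effective_bps(reply_bytes, link_bps, rtt_seconds, server_seconds):
    """Throughput seen by a client that issues one request at a time.

    Time per transaction = time to clock the reply onto the link
                         + one network round trip
                         + server processing time.
    """
    transfer_seconds = reply_bytes * 8 / link_bps
    total_seconds = transfer_seconds + rtt_seconds + server_seconds
    return reply_bytes * 8 / total_seconds


# Hypothetical figures: 8000-byte replies over a 2Mbit/s link, 50ms round trip.
LINK_BPS = 2_000_000
for server_ms in (0, 10, 100):
    bps = effective_bps(8000, LINK_BPS, 0.050, server_ms / 1000)
    print(f"server takes {server_ms:3d} ms: {bps / 1000:7.1f} kbit/s "
          f"({bps / LINK_BPS:5.1%} of line rate)")
```

Even with an instant server, the link in this model sits idle for most of each transaction, which is exactly the sort of gap to look for in your utilisation figures.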

Similarly, TCP window sizes - how much data can be sent without receiving an acknowledgement - will affect raw throughput figures. Any form of acknowledgement will limit to some extent how fast you can throw data at the far end: a sender can have at most one window of data outstanding per round trip, so the window size divided by the round-trip time puts a ceiling on throughput whatever the line speed. The frequency of those acknowledgements, and the amount of data that can be sent between them, could mean that theoretical maximum data rates can't be met.
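The window limit is easy to estimate: at most one window of data can be in flight per round trip, so throughput can't exceed the window size divided by the round-trip time. A minimal sketch, with hypothetical link speeds, window sizes and delays, and ignoring slow start, packet loss and header overhead:

```python
def window_limit_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput imposed by the window: one window per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


def achievable_bps(link_bps, window_bytes, rtt_seconds):
    """The lower of the line rate and the window limit."""
    return min(link_bps, window_limit_bps(window_bytes, rtt_seconds))


# Hypothetical 2Mbit/s link with a 150ms round trip.
LINK_BPS = 2_000_000
RTT_SECONDS = 0.150
for window in (8_192, 16_384, 65_535):
    bps = achievable_bps(LINK_BPS, window, RTT_SECONDS)
    limited_by = "window" if bps < LINK_BPS else "link"
    print(f"{window:6d}-byte window: {bps / 1000:7.1f} kbit/s (limited by the {limited_by})")
```

If the window is the limit, no amount of extra bandwidth will help; only a bigger window or a shorter round trip will.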

Other stuff on the network
In an ideal world, testing should be done with no competing traffic, in a lab environment. If there's other user data, it stands to reason that the application under scrutiny will be sharing resources (Quality of Service configurations can come into play here, but that's a topic for another day). Even when there should be no other traffic about, though, remember to take account of background network noise: routing updates, SNMP traffic, collisions on a shared LAN (or one with mismatched duplex settings). These may seem minimal but, especially on slow-speed lines, they may be using up more bandwidth than you expect.

Summary
So if you get reports of network throughput being less than expected, first make sure you know what it's actually realistic to expect. Check out the following:

  • What's being measured: bps on the wire does not equate to user data throughput.
  • Measurement time-period: averaging over a period hides bursts and idle gaps, so the figures can mislead, especially if wait times aren't considered.
  • Response times: if a device takes time to process a request, or is waiting for an acknowledgement from the other end, this will affect data throughput.
  • Background traffic: other traffic will use its share of network bandwidth, and potentially leave less for the application under test.
  • Keep an open mind: it could actually be a network problem.