One of the most common complaints you're going to hear from your users is that the 'network is slow'. But how can you relate a subjective viewpoint to an objective measurement that you can monitor and report on? And shouldn't you be able to have a pretty good idea of what response times are likely to be for an application before it's rolled out to your users, instead of after?
There are various management packages that offer application response monitoring, but this capability tends to be part of a comprehensive, complex and expensive systems management portfolio. Before you decide whether you want - and have the budget for - a full-scale commercial offering, though, have a look at what you can do with some home-grown monitoring processes.
What should you monitor?
Before you start, though, you need to decide what it is you're monitoring. Application response time monitoring tends to fall into a couple of broad categories, and each type will tell you something quite different about your network and the experience your users get.
Do you plan to carry out server-side or client-side monitoring? If you monitor transaction response times right at the user end, what you'll record will be closest to what the user sees. But that won't tell you where any delays are occurring, and it will necessitate the deployment of monitoring equipment out on all your remote sites, if that's where your users are. Server-side monitoring, on the other hand, can still give you transaction round trip times, and you can consolidate any probes you may need in your data centre. Monitoring traffic directly as it enters and leaves your servers also lets you find out how much of any delay is due to server processing time.
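As a rough illustration of how a single probe beside the server can split a transaction into network and server components, here is a Python sketch. It assumes you can pick five events out of a capture taken on the server's link (the field names are made up for the example), that the path is roughly symmetrical, and that client-side processing is negligible. The TCP handshake is used as a stand-in for the network round-trip time.

```python
from dataclasses import dataclass


@dataclass
class ServerSideCapture:
    """Timestamps in seconds, all taken by one probe next to the server.

    Hypothetical field names: each stands for an event you would pick
    out of a packet capture on the server's link.
    """
    synack_sent: float          # server sends its SYN-ACK
    handshake_ack_seen: float   # client's ACK completing the handshake arrives
    request_seen: float         # last byte of the client's request arrives
    response_first_sent: float  # server starts sending its response
    response_last_sent: float   # last byte of the response leaves the server


def estimate_delays(c: ServerSideCapture) -> dict:
    # Network round trip: the SYN-ACK has to reach the client and the ACK
    # has to come back, with (almost) no processing in between.
    network_rtt = c.handshake_ack_seen - c.synack_sent
    # Server think time: request fully received until the reply begins.
    server_time = c.response_first_sent - c.request_seen
    # Time the server spends pushing the response onto the wire.
    transfer_time = c.response_last_sent - c.response_first_sent
    # Rough user-perceived time: half an RTT for the request to arrive,
    # server and transfer time, then half an RTT for the last byte to land.
    user_estimate = network_rtt + server_time + transfer_time
    return {
        "network_rtt": network_rtt,
        "server_time": server_time,
        "transfer_time": transfer_time,
        "user_estimate": user_estimate,
    }
```

If `server_time` dominates `user_estimate`, adding bandwidth won't help much; if `network_rtt` does, the server is probably not the problem.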
You'll also need to decide whether to carry out active or passive monitoring. In other words, do you generate traffic specifically for monitoring, ideally a fairly accurate simulation of real user traffic patterns, or do you just watch existing user data? The latter will obviously give you a closer representation of the user experience, but it is subject to the vagaries of users: there may be no traffic to measure if a whole department's out of the office, and delays caused by user interaction are difficult to take into account. Simulated traffic is deterministic and reliable; the trick is in getting it to be a realistic approximation of your application traffic flows. There's no point reporting on the response times of a 128-byte ping when your users are consistently having problems with an application that uses 1,400-byte HTTP packets.
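A minimal active probe along these lines, sketched in Python: it times a round trip through a TCP echo service using a payload you can size to match the application (1,400 bytes by default here, rather than a ping-sized packet). It assumes you have an echo responder listening at the far end; the `start_local_echo_server` helper exists only so the probe can be tried on a single machine, and all the names are illustrative.

```python
import socket
import socketserver
import threading
import time


def tcp_echo_probe(host: str, port: int, payload_size: int = 1400,
                   timeout: float = 5.0) -> float:
    """Send payload_size bytes to an echo service and time the round trip.

    Returns seconds from the first byte sent until the whole echo has been
    read back; TCP connection set-up is deliberately excluded. Keep payloads
    to a few kilobytes: the probe writes everything before it reads, so very
    large payloads could stall once the socket buffers fill.
    """
    payload = b"x" * payload_size
    with socket.create_connection((host, port), timeout=timeout) as sock:
        start = time.perf_counter()
        sock.sendall(payload)
        received = 0
        while received < payload_size:
            chunk = sock.recv(65536)
            if not chunk:
                raise ConnectionError("echo service closed the connection early")
            received += len(chunk)
        return time.perf_counter() - start


class _EchoHandler(socketserver.BaseRequestHandler):
    """Echo back whatever arrives until the client closes the connection."""

    def handle(self) -> None:
        while True:
            data = self.request.recv(65536)
            if not data:
                break
            self.request.sendall(data)


def start_local_echo_server() -> socketserver.ThreadingTCPServer:
    """Local stand-in for a real echo responder, for trying the probe out."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), _EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Running the probe at both 128 and 1,400 bytes against the same responder is a quick way to see whether packet size, and not just latency, is what hurts on a given link.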
In terms of pre-deployment measurements, you should really get stuck in with a network analyser and do some application profiling on real user traffic. Doing it as part of a pre-acceptance test means there's a reasonable chance that your test users won't wander off to the coffee machine halfway through and mess up your measurements.
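Once you have a trace, one common way to turn it into a pre-deployment estimate is to count application turns (request/response exchanges) and bytes, then apply a first-order model: turns multiplied by the round-trip time, plus the time to serialise the bytes onto the link, plus any processing time. The Python sketch below assumes the analyser can export each packet as a timestamp, a direction and a payload size; the `Packet` record and the model are illustrative, not a feature of any particular analyser.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Packet:
    """One row exported from an analyser trace (hypothetical format)."""
    time: float      # seconds since the start of the capture
    to_server: bool  # True for client->server, False for server->client
    payload: int     # application payload bytes (0 for bare ACKs)


def profile(packets: Iterable[Packet]) -> dict:
    pkts = [p for p in packets if p.payload > 0]  # ignore bare ACKs
    turns = 0
    prev_to_server = None
    for p in pkts:
        # A new turn starts whenever the client sends data after the
        # server has replied (or with the very first request).
        if p.to_server and prev_to_server is not True:
            turns += 1
        prev_to_server = p.to_server
    return {
        "turns": turns,
        "bytes_to_server": sum(p.payload for p in pkts if p.to_server),
        "bytes_to_client": sum(p.payload for p in pkts if not p.to_server),
        "duration": (pkts[-1].time - pkts[0].time) if pkts else 0.0,
    }


def predict_response_time(prof: dict, rtt: float, bandwidth_bps: float,
                          processing: float = 0.0) -> float:
    """First-order estimate for another link:
    turns x RTT + serialisation time + processing time (all in seconds)."""
    total_bits = 8 * (prof["bytes_to_server"] + prof["bytes_to_client"])
    return prof["turns"] * rtt + total_bits / bandwidth_bps + processing
```

Profiling on the LAN and then feeding in the RTT and bandwidth of a remote site gives a first guess at what users there will see; a chatty application with many turns will suffer on a high-latency WAN link however much bandwidth it has.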
For ongoing monitoring of network behaviour, though (and this is meant to include the PCs and servers at either end, not just the piece of wire in the middle), simulating traffic and monitoring that is a relatively cheap approach, and it is more likely to give you an indication of when things start to slow down. There is an Application Response Measurement API, initially developed by IBM (Tivoli) and HP back in 1996, but that's really only applicable to application developers: they have access to the application source code and can call the API at the right places within the application's subtransactions, so that measurements can be made and reported back to a management platform. That is how some of these commercial tools work.
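The Application Response Measurement (ARM) standard has its own published language bindings, which are not reproduced here. To show the general idea of bracketing subtransactions with start and stop calls inside the application code, here is a hypothetical Python timer. `TransactionTimer` and the subtransaction names are invented for illustration, and a real deployment would forward the figures to a management platform rather than just averaging them in memory.

```python
import time
from contextlib import contextmanager
from typing import Dict, List


class TransactionTimer:
    """Hypothetical in-process recorder in the spirit of ARM-style
    start/stop instrumentation. Not the ARM API itself."""

    def __init__(self) -> None:
        self.samples: Dict[str, List[float]] = {}

    @contextmanager
    def measure(self, name: str):
        start = time.perf_counter()  # the "start" call
        try:
            yield
        finally:
            # The "stop" call: record elapsed time even if the step raised.
            elapsed = time.perf_counter() - start
            self.samples.setdefault(name, []).append(elapsed)

    def report(self) -> Dict[str, float]:
        """Mean duration in seconds per subtransaction name."""
        return {name: sum(v) / len(v) for name, v in self.samples.items()}


if __name__ == "__main__":
    timer = TransactionTimer()
    with timer.measure("order.lookup"):
        time.sleep(0.01)  # stand-in for a back-end database call
    with timer.measure("order.render"):
        time.sleep(0.005)  # stand-in for building the reply
    print(timer.report())
```

This only works where you can change the application, which is exactly the limitation the article points out.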
Synthetic polls across your network, using packets that can be tuned for size, QoS setting, port number, source and destination, let you pretend you're running your users' application and see what sort of results they're likely to get. Remember to test DNS queries, application calls to back-end servers and anything else that makes up the transaction. Ideally you want to monitor at various points along the path, so you can tell whether a slow-down is due to an overloaded server, a WAN bottleneck or the primary DNS server failing. Response Time Reporter (RTR) agents in Cisco routers (now called the Service Assurance Agent functionality) can do this for you, but you can also use the likes of MRTG (see Network Management for free?), with a simple Perl script to generate HTTP GETs, for example.
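A sketch of that kind of polling script follows, with Python swapped in for the Perl the article mentions. It times the DNS lookup, the TCP connect and the HTTP GET separately, so a slow result can be pinned on name resolution, the network or the web server, and prints the four lines MRTG reads from an external script: two values to graph, an uptime string and a target name. It is IPv4-only for simplicity, the script and host names are hypothetical, and the figures are whole milliseconds.

```python
#!/usr/bin/env python3
import http.client
import socket
import sys
import time


def time_http_get(host: str, port: int = 80, path: str = "/",
                  timeout: float = 10.0) -> dict:
    """Time DNS lookup, TCP connect and the HTTP exchange separately (ms)."""
    t0 = time.perf_counter()
    # 1. Name resolution (IPv4 only, to keep the sketch simple).
    ip = socket.getaddrinfo(host, port, family=socket.AF_INET,
                            type=socket.SOCK_STREAM)[0][4][0]
    t1 = time.perf_counter()
    # 2. TCP handshake to the address just resolved, so DNS isn't timed twice.
    conn = http.client.HTTPConnection(ip, port, timeout=timeout)
    try:
        conn.connect()
        t2 = time.perf_counter()
        # 3. Request out, whole response body back. An explicit Host header
        #    keeps name-based virtual hosts working although we dialled an IP.
        host_header = host if port == 80 else f"{host}:{port}"
        conn.request("GET", path, headers={"Host": host_header})
        resp = conn.getresponse()
        resp.read()
        t3 = time.perf_counter()
    finally:
        conn.close()
    return {
        "dns_ms": (t1 - t0) * 1000,
        "connect_ms": (t2 - t1) * 1000,
        "http_ms": (t3 - t2) * 1000,
        "status": resp.status,
    }


def mrtg_lines(timings: dict, target: str) -> str:
    """Format results as MRTG's four external-script lines:
    total time, HTTP time, an uptime string and the target name."""
    total_ms = timings["dns_ms"] + timings["connect_ms"] + timings["http_ms"]
    return "\n".join([str(round(total_ms)), str(round(timings["http_ms"])),
                      "unknown", target])


if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]  # e.g. a hypothetical intranet.example.com
    print(mrtg_lines(time_http_get(target), target))
```

To hook it into MRTG, point a target at the script in backticks, for example Target[web_rt]: `/usr/local/bin/http_rt.py intranet.example.com`, and set Options[web_rt]: gauge so the values are graphed as they are rather than treated as counters; MRTG also expects a MaxBytes[web_rt] value. Running copies of the script from different points along the path gives the per-segment view described above.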
If you have a complex network and a large network and systems management set-up in place, it's probably easier to add an application response time module that watches what's going on in different parts of your network, although you should be prepared for the time needed to set it up and tune it to give you the data you need. Otherwise, have a look at what you can do with some analysers and polling tools. In either case, make sure you know what you want to look at, and what the information you'll get will really tell you about what your users are experiencing, before you start.