There are a lot of terms floating around to describe how to set up metrics for evaluating service performance over the network. The best established is Quality of Service, or QoS, which has generally taken on a fairly technical, bandwidth-centric definition. It remains valuable as a metric, but it is far from summing up what really counts in the eyes of the end user.

Other terms, such as RUM or "real user monitoring", are also technical, but they at least focus on monitoring technologies truly targeted at the "real user" or "end user."

And then there's QoE, or Quality of Experience, which is my personal favourite because it is not centred in technology, but in the flesh-and-blood experience of the user consuming your services.

This focus is a lot like the original Mean Opinion Score (MOS), as it applied to telecommunications services. Like it or not, how your customers "feel" about your services is going to be how they're going to vote with their budget approvals.

This isn't to say that technical metrics don't count. You absolutely need them. Building towards QoE with a good combination of technical metrics and a healthy dose of customer dialogue is a fine art.

In this column we're going to look briefly at some of the metrics and technologies that most often apply. But before we do, I suppose I should answer the obvious question for many network managers: "Why me? Why should I care about QoE? Isn't that the job of the applications manager or the helpdesk?"

The answer is that you're partly right: QoE isn't your job alone. But like it or not, the network is the delivery system for almost all application services, including VoIP and unified communications, and above all web-based applications. Many of these depend heavily on network-centric monitoring tools to ensure their performance.

Data from EMA shows that web-based applications for internal use dominate in terms of what's actually being deployed over the network. They are followed by client-server applications, then web-based applications for external use and web services, and all those are well ahead of VoIP.

EMA research also reinforces the importance of the network in the delivery of application and business services - a healthy 72 percent of respondents from a wide range of enterprises and service providers had more than 20 remote branch offices. In a parallel EMA survey, 34.1 percent had more than 100 remote locations. Networked applications are enabling new business models across verticals - that's true today with Web 2.0, and even truer tomorrow with the advent of globally dispersed service-oriented architectures.

So, QoE is important even if it isn't your job alone. Moreover, if you're interested in optimising the network, what better yardstick to use than the one that truly counts from the customer's or consumer's perspective? And that's QoE.

QoE is also the natural place to trigger problem isolation and root-cause diagnostics because it's about meaningful service parameters, not component-centric metrics that are primarily useful for diagnostics.

Some QoE metrics

The first thing to keep in mind is that QoE metrics are not in themselves designed to be diagnostic metrics. For instance, configuration information can be hugely valuable in isolating a root cause and remedying a problem, but it says little about QoE. The same can be said of flow-based traffic volumes, or of packet analysis and network forensics.

EMA research indicates that QoE is most often focused on metrics such as availability, Mean Time to Repair (MTTR) and Mean Time Between Failures (MTBF). But many studies have shown that end users care more about degraded response time than about intermittent availability problems. This has more to do with human psychology than with network engineering. End users typically believe that a complete loss of availability will soon be remedied, whereas they remain sceptical that effective action will be taken when response time degrades. Moreover, degraded response time tends to persist far longer than most availability issues, so in this case their perception is reality.
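
Availability, MTBF and MTTR are tied together by the standard steady-state relation availability = MTBF / (MTBF + MTTR). The Python sketch below assumes that simple model (each failure is followed by a repair, with no planned maintenance windows), and the figures are illustrative only, not EMA data. It also shows what these metrics cannot express: a service can score well here while its response time is badly degraded.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability under a simple alternating up/down model.

    Each cycle is assumed to be MTBF hours of uptime followed by MTTR hours
    of repair, so availability is uptime divided by total cycle time.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative figures: one failure every 500 hours on average,
# with progressively slower repairs.
for mttr in (0.5, 2.0, 8.0):
    print(f"MTTR {mttr:4.1f} h -> availability {availability(500.0, mttr):.2%}")
```

With these inputs availability falls from roughly 99.9% to roughly 98.4%, yet none of the three numbers says anything about how responsive the service felt while it was up.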

Response time has its own pitfalls, though. An average taken over a day, a week or a month may not mean much on its own. Inconsistent response times, even with a faster overall average, can disrupt working rhythms far more than slower but more consistent service delivery.
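
To make the point concrete, the Python sketch below compares two sets of hypothetical response-time samples. Service A has the lower average but occasional multi-second stalls; service B is slower on average but steady. The 1,000 ms "slow" threshold and the sample values are assumptions chosen for illustration.

```python
import statistics


def summarise(samples_ms, slow_threshold_ms=1000):
    """Summarise response-time samples given in milliseconds.

    Besides the mean, report the spread (population standard deviation),
    the worst case and the share of requests slower than a threshold,
    since an average alone hides how erratic a service feels.
    """
    slow = sum(1 for s in samples_ms if s > slow_threshold_ms)
    return {
        "mean": statistics.mean(samples_ms),
        "stdev": statistics.pstdev(samples_ms),
        "worst": max(samples_ms),
        "share_slow": slow / len(samples_ms),
    }


# Hypothetical samples: service A is usually fast but erratic,
# service B is slower but steady.
service_a = [200, 210, 190, 205, 2400, 195, 200, 1800, 210, 190]
service_b = [650, 640, 660, 655, 645, 650, 660, 640, 655, 645]

for name, samples in (("A", service_a), ("B", service_b)):
    print(name, summarise(samples))
```

Service A averages 580 ms against B's 650 ms, yet one in five of A's requests takes well over a second, which is exactly the kind of inconsistency an average conceals.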

And the severe spikes that alienate users can occur within a single minute, or even within a few seconds. Catching them not only shows when users are being alienated but can also provide insight into where the problems lie.
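
The Python sketch below illustrates this with hypothetical data: it buckets (timestamp, response time) samples into fixed one-minute windows and flags any window whose worst sample exceeds a threshold. The window length, the 2,000 ms threshold and the sample values are all assumptions for illustration.

```python
import statistics
from collections import defaultdict


def spike_windows(samples, window_s=60, threshold_ms=2000):
    """Return start times of windows whose worst response exceeds threshold_ms.

    `samples` is an iterable of (timestamp_seconds, response_ms) pairs; each
    sample is assigned to a fixed, non-overlapping window of window_s seconds.
    """
    worst = defaultdict(float)
    for ts, ms in samples:
        start = int(ts // window_s) * window_s
        worst[start] = max(worst[start], ms)
    return sorted(start for start, ms in worst.items() if ms > threshold_ms)


# Hypothetical day of samples every 10 s at 300 ms, with a 20-second stall
# of 5,000 ms starting at noon (43,200 s into the day).
day = [(t, 5000.0 if 43200 <= t < 43220 else 300.0) for t in range(0, 86400, 10)]

print("daily mean (ms):", round(statistics.mean(ms for _, ms in day), 1))
print("spike windows:", spike_windows(day))
```

Here the daily mean comes out at about 301 ms, which looks healthy, while the per-minute view pinpoints the window starting at 43,200 s (noon) as the moment users were stalled.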

There are other metrics that have various degrees of relevance, too. These include flexibility and choice of service - something that network planning plays a role in. Data security is another core value that people may not think of in QoE, but for certain applications, and certain information, it can be a prime customer concern. Cost effectiveness, visibility into usage, and cost justification are increasingly of interest to business clients. Mobility is another QoE attribute, more important for some applications than others, and the list goes on.

Technologies

Probably the biggest debate over measuring QoE in terms of response time concerns the relative value and role of synthetic versus observed transactions. The truth is that both are valuable. Synthetic tests are proactive, can give you more consistent data suitable for SLA requirements, and can let you know if availability is lost, which observed transactions typically cannot.
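
A synthetic test is, at heart, a scripted transaction run on a schedule and timed. The Python sketch below assumes the transaction is supplied as a plain function; the names `run_synthetic_probe`, `healthy_login` and `failed_login` are hypothetical, and a real tool would script HTTP or application calls instead. Because the probe runs whether or not anyone is using the service, a failure is recorded as a loss of availability even when there is no real traffic to observe.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    ok: bool
    elapsed_ms: float
    error: Optional[str] = None


def run_synthetic_probe(transaction: Callable[[], None]) -> ProbeResult:
    """Run one scripted transaction, timing it and recording failures.

    `transaction` is a hypothetical stand-in for a scripted user action,
    such as logging in and loading a page. Any exception it raises is
    recorded as a failed probe, i.e. a loss of availability.
    """
    start = time.perf_counter()
    try:
        transaction()
    except Exception as exc:
        elapsed = (time.perf_counter() - start) * 1000
        return ProbeResult(ok=False, elapsed_ms=elapsed, error=repr(exc))
    return ProbeResult(ok=True, elapsed_ms=(time.perf_counter() - start) * 1000)


# Hypothetical transactions standing in for a healthy and a failed service.
def healthy_login() -> None:
    time.sleep(0.02)  # pretend the round trip takes about 20 ms


def failed_login() -> None:
    raise ConnectionError("connection refused")


print(run_synthetic_probe(healthy_login))
print(run_synthetic_probe(failed_login))
```

Running such a probe at a fixed interval yields the consistent, comparable series that SLA reporting needs, since every measurement exercises the same scripted path.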

Many synthetic tests also offer diagnostic value, especially when the scripts are optimised to look at certain types of transactional behaviour that occur on an ad hoc basis in the real world. On the other hand, synthetic tests run at specified intervals and may therefore miss any number of real problems that are confined to short time windows between test runs.
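
The sampling gap is easy to quantify. The Python sketch below assumes probes fire at a fixed interval and models an outage as a single time window; the intervals and the outage timing are hypothetical.

```python
def probes_during_outage(interval_s, outage_start_s, outage_len_s,
                         horizon_s=3600, first_probe_s=0):
    """Return the times at which a fixed-interval synthetic probe runs
    while an outage is in progress.

    Probes fire at first_probe_s, first_probe_s + interval_s, ... up to
    horizon_s; the outage covers [outage_start_s, outage_start_s + outage_len_s).
    An empty result means the synthetic test never saw the outage.
    """
    outage_end_s = outage_start_s + outage_len_s
    return [t for t in range(first_probe_s, horizon_s, interval_s)
            if outage_start_s <= t < outage_end_s]


# Hypothetical 90-second outage starting 610 s into the hour.
print("5-minute probes saw it at:", probes_during_outage(300, 610, 90))
print("1-minute probes saw it at:", probes_during_outage(60, 610, 90))
```

With five-minute probes the 90-second outage falls entirely between the runs at 600 s and 900 s and is never recorded; one-minute probes catch it once, at 660 s. Any problem shorter than the probe interval can slip through in this way, which is one reason observed transactions remain a necessary complement.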

Moreover, many observed-transaction capabilities have become increasingly rich in function and are beginning to offer much of the granular insight once available only from synthetic tests. So, the truth is that both synthetic and observed monitoring should be in place - if you really care about QoE.

Placement is also important. Transactional monitoring in the data centre can provide back-office detail that is quite useful in diagnostics, but it can also provide rich insight into issues surrounding QoE - in some cases replaying actual transactions in cinematic fashion.

But capturing data at the end station, through synthetic and/or observed transaction capabilities, is really at the heart of QoE. Many of the more network-centric solutions for QoE benchmarking sit at the edge of the data centre and calculate end-user experience, in some cases combining this with insight into the back-office transaction.

Most of these are not "heavy hitting" in the true QoE sense. Still, their insights can help you diagnose the cause of a problem far more quickly, as well as anticipate performance degradations in remote locations.

Dennis Drogseth is research VP at analyst firm Enterprise Management Associates (EMA).