I am having a bit of a nightmare with a Service Provider, which, for its own protection, shall remain nameless, over a so-called QoS-enabled MPLS WAN. A fairly sizeable UK public sector organisation has been having problems with an IPT deployment, and asked me in to help. Apart from the obvious issues they were seeing, such as phones and remote gateways reregistering for no apparent reason, the IPT maintainer was reporting that QoS DSCP settings from handsets and gateways at some remote sites were being seen back at the core site reset to zero, and that heartbeats between gateways and the call server were being lost.
It looked as if the WAN was to blame. It had apparently been provisioned with four QoS queues, and was supposedly honouring all QoS markings and sending voice traffic into the priority Premium queue. But the Service Provider (which owns and manages the CEs for the customer as well as running the MPLS core) had already been asked to check its side of things and had reported that everything was configured correctly.
So out came the laptop and analyser. When we started looking, it was obvious that the QoS settings were fine as they left the LAN on one site, and not fine when the traffic arrived at the LAN at the other end. The TOS byte was being reset in the WAN.
Now that we had the proof, the provider had a closer look and found that, yes, some of their devices did seem to be missing the commands to trust the DSCP settings they received. So that was changed, and things looked a bit better. But we still seemed to be dropping voice packets from some sites. So they looked again, and found a QoS policy hadn’t been configured for the correct amount of priority traffic on one of their PEs. So that was changed. And, yes, things looked a bit better. But there were still issues.
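For anyone wanting to repeat the before-and-after check on the TOS byte, here is a minimal sketch of the arithmetic involved. It is not the analyser we used. The function names are made up for illustration, and it assumes you have already pulled the raw IPv4 header bytes out of a capture taken at each end. The only fixed facts it relies on are that the TOS byte is the second byte of the IPv4 header and that DSCP is its upper six bits. EF (46) is used as the example voice marking, which is an assumption about what the handsets send.

```python
# Sketch only: helper names are hypothetical, not from any capture tool.
# DSCP is the upper six bits of the IPv4 TOS byte; the lower two bits are ECN.

EF = 46  # Expedited Forwarding, assumed here as the voice marking


def dscp_from_tos(tos: int) -> int:
    """Return the 6-bit DSCP value carried in a TOS byte."""
    return (tos & 0xFF) >> 2


def tos_from_ipv4_header(header: bytes) -> int:
    """Return the TOS byte (second byte) of a raw IPv4 header."""
    if len(header) < 20 or header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    return header[1]


def was_remarked(sent_header: bytes, received_header: bytes) -> bool:
    """True if the DSCP seen at the far end differs from the DSCP sent."""
    sent = dscp_from_tos(tos_from_ipv4_header(sent_header))
    received = dscp_from_tos(tos_from_ipv4_header(received_header))
    return sent != received


if __name__ == "__main__":
    # Two made-up 20-byte headers: marked EF on the way out, zeroed on arrival.
    leaving_lan = bytes([0x45, EF << 2]) + bytes(18)
    arriving_lan = bytes([0x45, 0x00]) + bytes(18)
    print(dscp_from_tos(tos_from_ipv4_header(leaving_lan)))
    print(dscp_from_tos(tos_from_ipv4_header(arriving_lan)))
    print(was_remarked(leaving_lan, arriving_lan))
```

A packet that leaves one LAN with a TOS byte of 0xB8 and arrives at the other with 0x00 is exactly the symptom described above: the marking is being reset somewhere in the WAN.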
Some of the customer’s sites act as hub sites for smaller offices that are close by, so the provider had installed managed aggregation switches at these sites. These sat in the path between the individual CEs and the PE and, it then turned out, weren’t configured for QoS. So much for an end-to-end QoS infrastructure. That was changed, and things continued to improve.
It’s been a long process getting the provider to let me see the CE configs. Their view is that it’s their responsibility and we, as the customer, don’t need that level of information. Fair point, if the network actually worked as we expected, but since that was pretty far from the case, I finally managed to get sight of at least the QoS-related parts of several CE router configs. Only to find that some of the CEs were running QoS policies that were totally different from what we had been told. The policies had been changed from the original design without the customer being made aware, and were now actively remarking our DSCP settings to suit the provider’s own internal classification structure. And I still don’t think they understand why I’m not best pleased.
The organisation I’m doing this work for has been charged for over two years now for a QoS-enabled network that they obviously have not had, and the lack of which has caused them major headaches. Yet whenever they have asked the provider to check things over in the past, they have been told everything was fine. Only when presented with proof, and with someone who has dug into every technical detail, has the provider investigated and admitted to the problems with its configurations.
Companies deploy managed WAN services because they don’t have the time or expertise to manage the WAN themselves. Telcos should not exploit this by providing poor services and ignoring their customers’ problems.