Sometimes, when end users experience application trouble, response times degrade slowly. Other times, applications seem to turn off as if a power switch has been flipped. In many cases, the causes of end user application woes can be identified and preempted before there is perceivable degradation; other times, they cannot. But in all cases of poor end user application experience, the cause needs to be identified and remedied swiftly. That is easier said than done.
The ability to deliver a consistent, quality end user experience, and to fix problems when they arise, is growing both more important and more complex. It is more important because workers today are more sophisticated, expect a superior experience and are less patient, while dependence on IT has never been higher. It is more complex because the move to virtualisation, cloud computing and SOA makes for more intricate application and infrastructure dependencies.
That in turn makes troubleshooting much trickier. In fact, business applications are growing more complex by the day. Seemingly simple transactions encompass the end user client, web and application servers, databases, mainframes, message buses and, increasingly, services from web service providers, such as public and private clouds.
When managing application performance through the eyes of the user, IT teams need light shed on the actual trouble spots. That is the only way to ensure that SLAs are met, that help desk calls about application degradation are minimised and that web sites are not abandoned due to poor response times. All this requires a solid understanding of how the health and capacity of every system and device, across all tiers of the infrastructure, support the application, from the network through the servers and databases to the end user. Unfortunately, the way this information is typically gathered is no longer effective, or efficient, for today's rapidly changing and dynamic environments.
Most organisations have been relying on point products instead of gaining a view of the entire application transaction lifecycle. They are monitoring the end user experience, but in isolation. The same is true for database performance, the latency of the network layer, and the performance of servers, mainframes and other systems. These tools provide slivers of insight, like a flashlight in a dark field, when what IT teams need is a spotlight. The result is siloed information: the data necessary to fix the problem is scattered among a variety of monitoring and troubleshooting tools.
To determine the causes of problems, members of each IT group need to assemble with their respective reports, compare their numbers and try to figure out what went wrong. This is manual correlation, and it should have gone extinct years ago. While these specialised tools are necessary, and often provide deep insight into the areas on which they focus, they don't help manage the entire business-technology infrastructure in real time. Problems can start to reveal themselves as shifts in performance measured in milliseconds, or in fractions of a percentage of CPU utilisation, and then impact the end user minutes or hours later. That makes it clear that organisations need the ability to detect end user issues before they occur.
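As a rough illustration of how a small shift might be caught before users notice it, the sketch below compares each new measurement with a rolling baseline and flags values that stray too far from it. The class name, window size and threshold are assumptions made for the example; this is one simple statistical approach, not a description of how any particular monitoring product works.

```python
from collections import deque
from statistics import mean, stdev


class BaselineMonitor:
    """Flags measurements that drift away from a rolling baseline (illustrative only)."""

    def __init__(self, window=30, threshold=3.0, min_samples=5):
        # window, threshold and min_samples are assumed values, not recommendations.
        self.samples = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def observe(self, value):
        """Record a measurement; return True if it deviates from the baseline
        by more than `threshold` standard deviations."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            baseline = mean(self.samples)
            spread = stdev(self.samples)
            # A perfectly flat baseline (spread == 0) is never flagged in this sketch.
            if spread > 0 and abs(value - baseline) > self.threshold * spread:
                anomalous = True
        # Every value, flagged or not, joins the baseline for later comparisons.
        self.samples.append(value)
        return anomalous
```

Fed a stream of response times in milliseconds, or CPU utilisation readings, observe() returns True on the first value that departs sharply from recent history, which is the point at which an alert could be raised.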
Gaining that ability means not merely monitoring the application, but also tracking the entire transaction flow and monitoring each step of every transaction: measuring response times, latencies, protocol and application errors, and all of the associated dependencies, on every tier from the end user through the data centre.
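A minimal sketch of what measuring each step of a transaction could look like follows. It records how long each tier takes and whether it raised an error, so that the slowest or failing tier can be picked out afterwards. The names TransactionTrace and StepRecord are hypothetical, and a real deployment would gather these figures from instrumentation on each tier rather than from a single process.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StepRecord:
    tier: str
    duration_ms: float
    error: Optional[str] = None


@dataclass
class TransactionTrace:
    """Collects per-tier timings and errors for one transaction (illustrative only)."""
    name: str
    clock: Callable[[], float] = time.perf_counter  # injectable so it can be faked
    steps: List[StepRecord] = field(default_factory=list)

    @contextmanager
    def step(self, tier):
        """Time the enclosed block and record any exception it raises."""
        start = self.clock()
        error = None
        try:
            yield
        except Exception as exc:
            error = f"{type(exc).__name__}: {exc}"
            raise  # the caller still sees the failure
        finally:
            elapsed_ms = (self.clock() - start) * 1000.0
            self.steps.append(StepRecord(tier, elapsed_ms, error))

    def total_ms(self):
        return sum(s.duration_ms for s in self.steps)

    def slowest(self):
        return max(self.steps, key=lambda s: s.duration_ms)

    def errors(self):
        return [s for s in self.steps if s.error is not None]
```

Each tier's work would be wrapped in a block such as `with trace.step("database"):`, after which trace.slowest() and trace.errors() show where the transaction spent its time and where it failed.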
Consider the behind-the-scenes complexity of a typical online purchase. The buyer adds items to the shopping cart, enters his or her billing information and clicks “submit.” If the user gets an error, the business is most likely lost. This transaction will likely have touched dozens of systems: the underlying infrastructure, applications, databases, a credit card authentication system and other tiers. Had the capabilities been in place to monitor all of those “pieces” of the transaction, the error may very well have been avoided altogether. For instance, a sluggish database, or partner application, could have been spotted - and remedied - before ever impacting the online shopper. And those problems that cannot be spotted in advance of a failure can at least be fixed much more swiftly.
Consider this capability as it applies to shipping physical packages. In the dark ages of shipping (barely a decade ago), packages were shipped and the shipper and receiver knew little more than when the package left its starting point and when it arrived at its destination. Today, packages are tagged and customers can track them in near real time as they pass each waypoint in their journey. Still, there is no easy way to determine, while the package is en route or before it is shipped, whether it will miss its deadline. It would be useful to have more detail: enough to predict that a package will not arrive on time because of difficulties along the way.
That data can be culled from slowdowns at the loading dock, the health of the truck's engine and tires, and real-time traffic information for the truck's route. Similarly, end user experience monitoring today provides package-tracking levels of visibility into application response time, and alerts when response times have degraded. Unfortunately, many tools lack the more detailed information necessary to predict and rectify potential performance issues before they arise.
To manage the end user's application experience properly, organisations need the same capabilities when it comes to tracking application performance. They need to tag the transaction at its starting point and be able to monitor it as it makes its way from the end user's system all the way through the data centre and back again. That kind of capability won't be found with conventional point solutions that individually measure the performance of networks, databases, servers or applications.
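One common way to tag a transaction at its starting point is to attach a unique identifier to the first request and carry it along on every downstream call, so that events reported by different tiers can be stitched back into a single path. The sketch below assumes this approach; the header name X-Transaction-Id and the EventLog class are hypothetical and are not taken from any specific product.

```python
import uuid
from collections import defaultdict

# Hypothetical header name, chosen for illustration only.
TRANSACTION_HEADER = "X-Transaction-Id"


def tag_request(headers):
    """Return a copy of the headers carrying a transaction ID.

    An ID that is already present is kept, so downstream tiers
    propagate the original tag instead of creating a new one.
    """
    tagged = dict(headers)
    tagged.setdefault(TRANSACTION_HEADER, uuid.uuid4().hex)
    return tagged


class EventLog:
    """Gathers events from every tier, grouped by transaction ID."""

    def __init__(self):
        self._events = defaultdict(list)

    def record(self, headers, tier, elapsed_ms):
        """Store one tier's timing under the transaction ID found in the headers."""
        self._events[headers[TRANSACTION_HEADER]].append((tier, elapsed_ms))

    def path(self, transaction_id):
        """Return the tiers the transaction touched, in the order they reported."""
        return list(self._events.get(transaction_id, []))
```

With the same ID present on every hop, a single lookup by transaction ID returns the whole journey, much as a tracking number does for a package.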
That capability will be found only by monitoring application performance from the end user's perspective, and by understanding how that experience is affected by the real-time health of all of the devices and systems on which the application depends. Ultimately, that means application failures, slow response times and unmet SLAs must not be discovered at the help desk or from upset customers, when the damage is already done and problems are at their most difficult to fix.