Many of today’s cloud services are one-size-fits-all, offering little or no customisation. In fact, this standardisation is part of the secret sauce: offer a few services and do them really well. But “standard” does not mean that customers need to settle for “commodity.” Instead, as the cloud evolves, consider a model where providers accommodate a wider range of configurability, allowing customers to choose among multiple flavours of the same service, either through differing levels of quality of service (QoS) or through different types of resource allocation. This model accommodates the customer’s needs and at the same time helps the provider drive efficiency.

Take a web server as an example: even with the same application, different workloads stress different resources. A web server is typically memory constrained, but CPU becomes the key resource when the workload is dominated by SSL encryption, and the network may be the bottleneck when serving large downloads. Whether the service is offered as infrastructure-, platform-, or software-as-a-service, the customer’s QoS improves if the customer lets the provider know which resources are key. This allows a provider that offers a “vanilla” web-server service to tune that service’s resource allocation in a way that better satisfies each customer’s specific needs. The customer gets what they want; the provider delivers it efficiently.
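To make this concrete, here is a minimal Python sketch of how a provider might tune one standard web-server flavour according to the bottleneck resource a customer declares. The tuning table and numbers are purely illustrative, not any real provider’s configuration:

    # Hypothetical tunings for a single "vanilla" web-server service, biased
    # toward whichever resource the customer says drives their QoS.
    TUNINGS = {
        "cpu":     {"vcpus": 8, "memory_gb": 4,  "network_gbps": 1},   # SSL-heavy
        "memory":  {"vcpus": 2, "memory_gb": 16, "network_gbps": 1},   # big caches
        "network": {"vcpus": 2, "memory_gb": 4,  "network_gbps": 10},  # downloads
    }

    def provision_web_server(key_resource):
        """Return an allocation for the standard web-server service."""
        # Fall back to a balanced default when no key resource is signalled.
        return TUNINGS.get(key_resource, {"vcpus": 4, "memory_gb": 8,
                                          "network_gbps": 1})

    print(provision_web_server("cpu"))  # the SSL-heavy flavour

The point is not the specific numbers but that a single standardised service can still be tuned per customer once the key resource is known.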

How do we implement such a model? The cloud’s provider-customer split means that each party lacks information the other needs to make the best decision. Customers cannot see the resource status on the provider’s side, and they cannot control where their workloads are deployed, even though they have better knowledge of how resource allocation affects their application’s QoS. On the other side, the provider’s deployment and scheduling strategy would be more efficient with knowledge of which resources matter for each customer’s workloads, and which workloads matter more than others.

Consider a scenario where a customer submits a workload that is sometimes disk-intensive. Without knowing this, the provider deploys the workload alongside another that is also heavily disk-loaded. The result is performance degradation during periods of disk contention; both customers’ workloads suffer. Today’s one-size-fits-all solution either accepts this degradation as a consequence of the cloud, or requires over-provisioning, which leads to inefficiency for the provider.

However, with added insight, the provider can avoid co-locating workloads whose resource usage peaks at the same time. Understanding which resources drive QoS allows the provider to act: throttling resources for less-sensitive workloads in order to guarantee access for others when resource requirements conflict, or scheduling and setting affinity rules that keep contending workloads apart based on the learned characteristics of past behaviour. With awareness of workloads’ relative importance, the provider can protect critical workloads and degrade the less critical first. The result is a more efficient assignment of workloads to provider resources, and a higher level of customer QoS.
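As an illustration, the following toy placement routine separates workloads that peak on the same resource and places higher-importance workloads first. It is a sketch, not a production scheduler; the workload tags, importance values, and host count are invented:

    from collections import defaultdict

    def place(workloads, num_hosts):
        """Greedy anti-affinity: prefer hosts with no workload that peaks on
        the same resource; otherwise fall back to the least-loaded host."""
        hosts = defaultdict(list)  # host id -> list of (name, peak_resource)
        # Place the most important workloads first so they get the best slots.
        for name, peak, importance in sorted(workloads, key=lambda w: -w[2]):
            candidates = [h for h in range(num_hosts)
                          if all(p != peak for _, p in hosts[h])]
            host = min(candidates or range(num_hosts),
                       key=lambda h: len(hosts[h]))
            hosts[host].append((name, peak))
        return dict(hosts)

    workloads = [("analytics", "disk", 2), ("backup", "disk", 1),
                 ("web", "cpu", 3), ("cache", "memory", 2)]
    for host, placed in place(workloads, num_hosts=2).items():
        # The two disk-peaking workloads land on different hosts.
        print(host, placed)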

Implementing instrumentation that lets the customer and provider signal workload characteristics and importance would benefit both parties. In practice, the customer signals 1) the key resources driving QoS and 2) the workload’s relative importance.
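Here is a sketch of what those two signals might look like, and one way a provider could act on them under contention. The Signal type and the throttling rule are hypothetical illustrations, not a real provider interface:

    from dataclasses import dataclass

    @dataclass
    class Signal:
        workload: str
        key_resource: str   # signal 1: which resource drives QoS
        importance: int     # signal 2: relative importance (higher = protect)

    def choose_victim(signals, contended_resource):
        """Under contention, throttle the least important workload that
        depends on the contended resource."""
        contenders = [s for s in signals
                      if s.key_resource == contended_resource]
        return min(contenders, key=lambda s: s.importance, default=None)

    signals = [Signal("orders-db", "disk", importance=9),
               Signal("nightly-etl", "disk", importance=2),
               Signal("web-frontend", "cpu", importance=7)]

    victim = choose_victim(signals, "disk")
    print(f"throttle {victim.workload} first")  # nightly-etl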

Consider an approach strictly from the provider’s perspective. The provider profiles the resource usage of customer workloads (e.g., CPU-, memory-, or disk-intensive) and uses that profile to limit performance degradation due to resource contention. Understandably, this profiling is easier for some workloads, like batch jobs, and harder for unpredictable ones, like web applications. However, even knowing how unpredictable a workload is helps: the provider can tune the over-provisioning required to absorb that variability, or isolate the unpredictable workloads to limit their impact on others. With more information, the provider avoids under- or over-provisioning resources while improving the customer’s experience.
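For instance, a provider might profile each workload’s usage trace and size the allocation to its variability, along these lines. The two-standard-deviation headroom rule below is an illustrative heuristic, not a prescription:

    import statistics

    def profile(samples):
        """Summarise a resource-usage trace (values in percent of capacity)."""
        mean = statistics.fmean(samples)
        spread = statistics.pstdev(samples)
        return {
            "mean": mean,
            "variability": spread / mean if mean else 0.0,
            # Provision for the mean plus a margin for variability,
            # capped at the whole machine.
            "allocate": min(mean + 2 * spread, 100.0),
        }

    batch = [70, 72, 69, 71, 70, 73]   # steady: easy to profile tightly
    web   = [10, 80, 15, 95, 20, 60]   # bursty: needs far more headroom

    for name, trace in [("batch", batch), ("web", web)]:
        p = profile(trace)
        print(f"{name}: mean={p['mean']:.0f}% allocate={p['allocate']:.0f}%")

A trace whose suggested allocation hits the cap is a natural candidate for isolation rather than co-location.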

In a more direct approach, the customer explicitly reveals key insights about the application’s QoS drivers. Some examples:

  • A batch job needs to finish by Friday at 8PM. Whether it finishes one day or one minute early delivers the same utility for the customer. The customer reveals the progress of the job so that the provider can track the impact of resources on the time to complete the job and provision just enough to hit the deadline; no more, no less (see the sketch after this list).

  • In high-frequency trading, suppose more resources lead to faster or more accurate results. The customer reveals the fraction of the computation processed at any time so the provider can add key resources when they are available.

  • A web application whose QoS improves sharply as resources grow from zero to 10 percent, but shows diminishing returns beyond a certain amount. The customer reveals the number of sessions and the service response time so the provider can infer the impact of resources on turnaround time.
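The batch-deadline case shows how little the provider actually needs: given periodic progress reports, it can project the finish time and add resources only when the job falls behind. A minimal sketch, with illustrative names and numbers:

    def needs_more_resources(progress, elapsed_hours, deadline_hours):
        """progress is the fraction of the job complete, between 0 and 1."""
        if progress <= 0:
            return True  # no signal yet; conservatively assume it is behind
        projected_total = elapsed_hours / progress
        return projected_total > deadline_hours

    # 30% done after 20 hours, 72 to the deadline: 66.7h projected, on track.
    print(needs_more_resources(0.30, 20, 72))   # False -> no extra resources
    # 30% done after 30 hours: 100h projected, behind schedule.
    print(needs_more_resources(0.30, 30, 72))   # True  -> add key resources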

Signalling gives insight into a workload’s sensitivity to getting more or less of a particular resource. Using this insight (in the examples above, the time to complete the job, the progress of the computation, and the number of sessions and their service delays), the provider can better profile workloads and prioritise resource provisioning.

Whether explicitly signalled by the customer or inferred by the provider, insight into the customer’s demand allows the provider to better fulfil the service requirement. The result is better performance for customer workloads and more efficient provisioning for providers. It’s a win-win situation.

Posted by Teresa Tung, Manager at Accenture Technology Labs and Qian Zhu, Research Scientist at Accenture Technology Labs