After Dammions Darden arrived as the new senior systems administrator for the city of Garland, Texas, he knew that the 50 to 60 physical hosts for this 234,000-person city outside of Dallas were not running nearly as efficiently as they could be. Some had excess capacity, others were running way too hot.
Traditionally, if apps are slow and virtual machines need more memory, the easy answer is an unfortunate one: throw more hardware at the problem. But Darden wasn't satisfied with that. While roaming the expo floor at VMworld two years ago, he stumbled across VMTurbo, a company that specialises in analysing virtual environments.
Using VMTurbo to gain insight into what was happening in the virtual realm, the city of Garland found it could ratchet up the VM load on some machines dramatically, going from 20 to 25 VMs per host to 40 to 45 on some servers. That consolidation freed up hosts that could be used to support other initiatives. The city, for example, was considering a virtual desktop environment but was worried about hardware costs, and suddenly Darden had servers to host the deployment.
It's a lot easier to just keep adding hardware, but there's a better way, Darden says.
Matt Eastwood, general manager of IDC's enterprise platform group, agrees. He estimates the typical enterprise server runs 10 to 12 virtual machines today at about 30% to 40% capacity. An optimal server utilisation rate is usually around 60% to 70%, meaning many servers could easily handle twice the VM load. With an explosion of VMs on the horizon -- IDC predicts the number of VMs will increase 130% in the next four years -- some shops will buy more hardware to increase capacity. But experts say smart organisations will first optimise their existing environments.
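Eastwood's figures can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes utilisation scales linearly with the number of VMs, which real workloads rarely do exactly, and the function name is hypothetical.

```python
def vms_at_target(current_vms: int, current_util_pct: int, target_util_pct: int) -> int:
    """Estimate how many VMs a host could run at a target utilisation.

    Assumption (not from the article): every VM adds the same load, so
    utilisation grows in proportion to the VM count. Percentages are
    whole numbers to keep the arithmetic exact.
    """
    if current_vms <= 0 or current_util_pct <= 0 or target_util_pct <= 0:
        raise ValueError("inputs must be positive")
    return current_vms * target_util_pct // current_util_pct


# Low end of the quoted ranges: 10 VMs at 30%, pushed to 60%.
low_case = vms_at_target(10, 30, 60)
# High end of the quoted ranges: 12 VMs at 40%, pushed to 70%.
high_case = vms_at_target(12, 40, 70)
print(low_case, high_case)
```

Under that linear assumption, a host at the low end of the range has room for roughly double its current VM count, which matches the "twice the VM load" estimate; a host that is already busier has less room.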
It's a people problem
How do organisations end up with less than optimal systems in the first place? "The things that cause inefficiency in servers, systems administration and cloud management generally have to do with manual, disconnected and fragmented processes more than hardware," says Mary Johnston Turner, an IDC analyst who specialises in management software. "The real way to improve IT operations is to adopt more integrated, standardised and automated management processes that cover the life cycle of services offered."
Doing so is not easy. Johnston Turner says going from an environment where resources are requested and delivered ad hoc to a fully automated, self-service system where users can request and consume what they need is a transformational shift. It can take a lot of time and effort to set up, but the payoff is a more well-oiled operation.
Automating a server lifecycle environment could save 10% to 15% in both hard and soft costs, meaning actual dollars saved and time freed up, she says. Increasing server utilisation is great, but if it takes weeks for a business unit to get access to a VM it has requested, then it doesn't matter how efficiently the server is running.
Improving IT operations -- be it through more automated management of services or through software tools that get more bang for your hardware buck -- is a goal every virtual machine administrator has. The problem is finding time to do it. "People know what they need to do," says Brian Kirsch, an IT architectural instructor and board member of the VMware User Group. "But the priority is keeping the lights on. The top priority today is to keep everything up and running."
You can't optimise what you can't see
Johnston Turner says using management tools to build a private cloud, or configuration tools like Chef, Puppet and Ansible to automate VM configurations, helps free up time to focus on gaining efficiencies. But another key is getting a good view of exactly what's going on inside. Bernd Harzog, capacity management analyst at consultancy The Virtualisation Practice, says "the single biggest reason [for inefficiency] is the lack of information to comfortably make more aggressive decisions." Virtualisation managers generally don't have enough information and visibility into their environments and therefore are afraid of overprovisioning servers and degrading performance.
A whole new segment of vendors has sprung up to help with this issue. VMTurbo, which Darden used in Garland, is one option. Darden installed the software, and within hours it had recommended improvements. Two years after the initial install, Darden still uses it daily to monitor his operations, run reports and automate fixes.
Cirba is another company focused on the issue, but it takes a somewhat different approach, using an efficiency index to assess workloads and show where improvements can be made. CTO and co-founder Andrew Hillier says the perfect index rating in their system is 1.0 and it is common to find environments running at 0.5 to 0.7.
Why? "The way workloads fit together typically looks like a badly played game of Tetris," Hillier says. "Nothing in VMware or other virtualisation tools looks at how workloads work together or tries to figure out how to optimally balance them out."
He notes, however, that the optimum utilisation rates will vary by workload. For some workloads you want to be conservative, so a 1.0 rating will mean far less utilisation than you would push for in less demanding situations.
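Hillier's Tetris comparison describes what is, at heart, a bin-packing problem. The sketch below uses first-fit decreasing, a generic textbook heuristic, to show the idea; it is not Cirba's method, and the function name, the single-resource model and the `max_fill_pct` parameter are assumptions made for illustration. Lowering `max_fill_pct` mimics the more conservative target Hillier describes for demanding workloads.

```python
def pack_vms(vm_demands, host_capacity, max_fill_pct=100):
    """Greedy first-fit-decreasing placement of VMs onto hosts.

    vm_demands:    resource demand per VM (for example, GB of memory)
    host_capacity: capacity of each identical host, in the same unit
    max_fill_pct:  how full a host may get; lower values are more conservative
    Returns a list of hosts, each a list of the demands placed on it.
    """
    usable = host_capacity * max_fill_pct // 100
    hosts = []
    # Place the largest VMs first so the small ones can fill the gaps.
    for demand in sorted(vm_demands, reverse=True):
        if demand > usable:
            raise ValueError(f"VM needing {demand} cannot fit on any host")
        for host in hosts:
            if sum(host) + demand <= usable:
                host.append(demand)
                break
        else:
            # No existing host had room, so bring another host into use.
            hosts.append([demand])
    return hosts


demands = [8, 4, 4, 2, 6, 2, 6]
aggressive = pack_vms(demands, host_capacity=16)
conservative = pack_vms(demands, host_capacity=16, max_fill_pct=75)
print(len(aggressive), len(conservative))
```

With these made-up demands, the same seven VMs fit on two hosts when the hosts may be filled completely and need three when each host is capped at 75%, which is the trade-off between density and safety margin that the index is meant to capture.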
Another optimisation problem is VM sprawl, where there are more VMs provisioned than required. Capacity management tools can help solve that issue too. Harzog of The Virtualisation Practice says one key to look for when evaluating tools is to ensure they can be configured to automatically make changes in the environment, instead of just alerting you about the changes that should be made. Both VMTurbo and Cirba do that.
While many server utilisation tools look primarily at compute resources, John Blumenthal, co-founder of startup CloudPhysics, says it's important to look at the entire IT environment. His company's product is a cloud-based SaaS offering that analyses usage across everything from the CPU to memory, networking and storage.
He says the biggest cause of inefficiency is what he calls death by a thousand cuts. What impact will adding another server have on the environment? Why are system response times slowing? How will this change impact the broader environment? "The nature of the problem is insidious," he says. "It's not one giant boogey-man staring you in the face. It's not being able to see the consequences of an action, and not being able to figure out what the right course of action is to take." CloudPhysics says many customers see up to 3.5TB of storage freed up when initially deploying its tool.
Reaching peak efficiency may mean spilling workloads over to public cloud resources, an alternative being examined by companies such as Autotrader.com.
To the cloud
Some of Autotrader's 200 ESX hosts in its development zone are running as many as 140 VMs; production hosts run far fewer. But like many organisations, Autotrader.com is exploring how it can use public cloud resources to supplement what it hosts internally.
Chris Nakagaki, senior systems engineer in the cloud infrastructure team, likes the idea of being able to migrate workloads into VMware's vCloud Air public cloud with minimal changes, and the ability to federate across multiple VMware public cloud partner vendors if needed. But moving to the public cloud has its own set of efficiency challenges too; the public cloud can be a complicated place.
Vendors like Amazon Web Services, Microsoft Azure and Google Cloud Platform have dozens of types of virtual machines to choose from, and resources change on the fly and are paid for by the minute or hour. You can save significantly, for example, if you spin down resources when they're no longer required.
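The saving from spinning resources down is easy to estimate when billing is by the hour. The sketch below uses a made-up rate of 10 cents per hour and a hypothetical schedule of 12 hours a day on weekdays; neither figure comes from any vendor's price list.

```python
HOURS_PER_WEEK = 24 * 7


def weekly_cost_cents(hourly_rate_cents: int, hours_running: int) -> int:
    """Cost of one VM for a week under simple per-hour billing."""
    if not 0 <= hours_running <= HOURS_PER_WEEK:
        raise ValueError("hours_running must fit within one week")
    return hourly_rate_cents * hours_running


RATE_CENTS = 10  # hypothetical hourly price, for illustration only

always_on = weekly_cost_cents(RATE_CENTS, HOURS_PER_WEEK)
weekdays_only = weekly_cost_cents(RATE_CENTS, 12 * 5)  # 12 hours a day, Monday to Friday
saving_pct = 100 * (always_on - weekdays_only) // always_on
print(always_on, weekdays_only, saving_pct)
```

Because the hourly rate cancels out of the percentage, the saving depends only on how many hours the VM can be switched off; under this schedule it comes to roughly two-thirds of the always-on bill.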
Gartner analyst Lydia Leong says once a certain threshold of usage is reached in the public cloud, it's worth exploring tools to manage cloud usage and optimise spending, noting that the threshold will be different depending on company size. Cloudyn and Cloud Cruiser each have tools that can help organisations determine when to use on-demand versus reserved instance pricing in Amazon Web Services' cloud and what the right VM instance size is for a workload. A tool like Cloud Cruiser will monitor a hybrid environment, recommending when to run a workload in a private cloud compared to using a public cloud. Cloudyn says that it can help AWS customers who spend $10,000 recoup up to one-third of their public cloud spend by optimising their usage.
Those tools are similar to the on-premises ones from Cirba and VMTurbo. Harzog estimates that any customer who is managing more than 50 VMs can likely find efficiencies of between one-third and one-half by using those.