The list of all-too-familiar names - Nachi, Klez, Lovsan, SoBig, BugBear, Swen, Blaster and Yaha - represents only a sampling of the most prevalent worms and viruses that slithered into corporate networks this fall. But they all have one thing in common: Patches were readily available before most damage had been done. So why do these intruders continue to wreak such havoc? Because patch management is tough.
This feature is continued from Part I, published on Monday, December 8th.
Patching in action
Those words ring true for Williams at Centura Bank, whose organized process includes assigning a value of critical, high, medium or low to each vulnerability.
"If it is critical, each manager on our [computer security incident response] team has to respond [with their course of action] in 24 hours," Williams says. The vulnerabilities are compared against an inventory of everything on the network, including 250 servers and 1,800 desktops. The inventory is updated weekly.
Once the team managers decide a patch is needed, Centura follows a five-step program it calls release management. The first step is to develop the change process, which is then logged and audited as part of Step 2. A series of tests is run at Step 3, and if the results are inadequate, the process starts all over again. If the tests are successful, Step 4 covers distribution, from a pilot to full-scale production deployment. And Step 5 mandates follow-up and validation that everything is complete and working.
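The five steps described above can be read as a small loop in which a failed test sends the patch back to Step 1. The Python sketch below is only an illustration of that flow, not Centura's actual tooling: the function name `run_release`, the `test_passes` callback and the `max_attempts` limit are all hypothetical.

```python
from typing import Callable, List, Tuple

# Step names paraphrased from the article's description of release management.
STEPS = [
    "develop change process",               # Step 1
    "log and audit change",                 # Step 2
    "test",                                 # Step 3
    "distribute (pilot, then production)",  # Step 4
    "follow up and validate",               # Step 5
]


def run_release(
    patch_id: str,
    test_passes: Callable[[int], bool],
    max_attempts: int = 3,  # hypothetical cap; the article gives no limit
) -> Tuple[List[str], bool]:
    """Walk a patch through the five steps, restarting when testing fails.

    Returns the recorded history and whether the patch was deployed.
    """
    history: List[str] = []
    for attempt in range(1, max_attempts + 1):
        # Steps 1-3 are repeated on every attempt.
        for step in STEPS[:3]:
            history.append(f"{patch_id} attempt {attempt}: {step}")
        if not test_passes(attempt):
            # Inadequate test results: the process starts over.
            continue
        # Steps 4-5 run only after a successful test.
        for step in STEPS[3:]:
            history.append(f"{patch_id} attempt {attempt}: {step}")
        return history, True
    return history, False


if __name__ == "__main__":
    # Pretend the first round of testing fails and the second succeeds.
    log, deployed = run_release("PATCH-1", lambda attempt: attempt == 2)
    for line in log:
        print(line)
    print("deployed:", deployed)
```

Keeping the history as a plain list mirrors the logging and auditing requirement of Step 2; a real system would presumably write to a change-management database instead.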
"It's not the tools or the people, it's not having the time," Williams says of why such a regimented process is needed.
It's the same for John Engates, CTO for Rackspace Managed Hosting, which has data centers in San Antonio, Texas; Herndon, Va., and London. The company has 4,000 Windows servers, 4,000 servers running either Linux or Unix, 50 routers and 500 firewalls it maintains for customers.
"Software will never be perfect and will always require diligence and good security practices to maintain it," Engates says.
He says patching routers and firewalls is more like updating versions of software, but there is still a formal process that begins with network engineers who monitor discussion boards and security sites. "They eat and breathe this," Engates says.
After a new patch is identified, a lead engineer is notified. If the patch is for a critical flaw, notification is sent straight to the vice president of engineering who decides if the patch is needed and structures the process toward deployment, if necessary.
If the patch is for a router, the lead engineer carries out the patching plan, from calling in the right people to building automated deployment scripts.
The patch is tested in Rackspace's lab, a scaled replica of its network. "The testing length depends on how big a patch it is," Engates says. The patch is rolled out within a pre-scheduled maintenance window, and the engineering team does a postmortem, gathering documented changes and evaluating the process.
"When we feel like we are in danger of being exploited, then we will open an emergency [maintenance] window and do the patching," he says.
On the server side, Engates says the process is a bit different because customers are responsible for some patching chores. He says Linux also is a unique platform because it doesn't have as many user-friendly tools as Windows, although Microsoft's tools have their own consistency issues.
"We have no formal [Linux] configuration management tool. There is more human interaction with these servers than on the Windows side," says Engates, who notes the Windows platform sees a larger percentage of exploit code.
When Rackspace identifies a vulnerability on its Windows servers, a process similar to that for routers and firewalls is followed. Testing is done for a minimum of 48 hours to make sure there are no problems. If problems arise, the patch is put on hold and Microsoft premier support is called in.
"We pay for this service, and it is very important we maintain this relationship," Engates says.
The operating system team is ultimately responsible for giving the go-ahead to install the patch, and Microsoft's SMS is used to roll it out to the live network.
"We maintain an internal knowledgebase, which documents the changes, processes and procedures so we don't make mistakes," he says. "Mistakes are bad."
Open season on clients
David Giambruno, director of strategic infrastructure and security for Pitney Bowes, says the big patching challenge now is scale.
"In the past [four months] there have been new types of attacks that go after the clients," he says. "It's not just the servers anymore, and it's increased the scale of the problem." He says Pitney Bowes has thousands of servers and clients to go along with hundreds of routers and switches. Giambruno says patching clients used to be a natural result of the client upgrade cycle. That no longer works.
"The problem is the speed and the propagation of the worms. We can't just shut off Port 135 or other networking ports because you shut off your client networking," he says. Early in the Blaster attack in August, Microsoft advised shutting off Port 135 to stop the spread of the worm. "If I turn off the port, it's a denial-of-service attack either way," he says.
Giambruno says the company's processes for automatically patching servers have been extended to clients.
In the wake of Blaster, the company deployed software from BigFix that provides a holistic view of the entire network, which stretches across 18 countries.
"If someone turns off anti-virus software on their desktop, BigFix turns it back on. If it's not installed, BigFix installs it," says Giambruno, who says automating processes is the only way to make patch management economical.
Pitney Bowes categorizes all its network assets and their relevance to the company. Client desktops are given a risk profile from 1 to 5, with 5 being the clients that must be the most secure. "Everything we report on has to be actionable," he says. For instance, desktops rated a 5 must be patched in less than 24 hours.
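One way to make a risk profile "actionable" in the sense Giambruno describes is to tie each rating to a patch deadline. The sketch below assumes a simple lookup table. Only the 24-hour deadline for desktops rated 5 comes from the article; the deadlines for ratings 1 through 4 are not stated, so they are left unset here, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional

# Hours allowed to patch, keyed by risk profile (5 = must be most secure).
# Only the value for profile 5 is taken from the article; the rest are unknown.
PATCH_DEADLINE_HOURS: Dict[int, Optional[int]] = {
    5: 24,
    4: None,
    3: None,
    2: None,
    1: None,
}


def patch_due(risk_profile: int, identified_at: datetime) -> Optional[datetime]:
    """Return the patch deadline, or None if no deadline is defined."""
    if risk_profile not in PATCH_DEADLINE_HOURS:
        raise ValueError(f"risk profile must be 1-5, got {risk_profile}")
    hours = PATCH_DEADLINE_HOURS[risk_profile]
    if hours is None:
        return None
    return identified_at + timedelta(hours=hours)


def is_overdue(risk_profile: int, identified_at: datetime, now: datetime) -> bool:
    """True once the deadline has been reached without a patch."""
    due = patch_due(risk_profile, identified_at)
    # "Less than 24 hours" means the deadline instant itself is already late.
    return due is not None and now >= due


if __name__ == "__main__":
    found = datetime(2003, 12, 1, 9, 0)
    print(patch_due(5, found))
    print(is_overdue(5, found, found + timedelta(hours=25)))
```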
"Inventory is immensely critical. We built a network-detection tool, and we know everything plugged into our network. Network creep is the enemy," he says.
Pitney Bowes has a hierarchy to its patch process that includes global and regional patch delivery teams. The global team consists of representatives from the regional teams. When a vulnerability is identified, Pitney Bowes assesses the potential impact by using its data catalog to identify vulnerable systems, where they are and what they support. After the assessment, the global team or a regional team will take responsibility for the patch, depending on the systems it affects. Then the process of testing, deployment and documenting begins.
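The assessment step described above amounts to a catalog lookup followed by a routing decision. The Python sketch below illustrates that idea under stated assumptions: the `Asset` fields, the sample data and the rule that a vulnerability touching more than one region goes to the global team are all hypothetical, since the article says only that ownership depends on the systems affected.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Asset:
    """One entry in a hypothetical data catalog."""
    name: str
    region: str    # where the system is
    software: str  # what it runs
    supports: str  # what business function it supports


def assess(catalog: List[Asset], vulnerable_software: str) -> Tuple[List[Asset], str]:
    """Find affected assets and pick an owning patch delivery team.

    Assumed routing rule: one region affected -> that regional team,
    several regions affected -> the global team.
    """
    affected = [a for a in catalog if a.software == vulnerable_software]
    regions = sorted({a.region for a in affected})
    if not affected:
        owner = "none"
    elif len(regions) == 1:
        owner = f"regional:{regions[0]}"
    else:
        owner = "global"
    return affected, owner


if __name__ == "__main__":
    # Made-up catalog entries, for illustration only.
    catalog = [
        Asset("mail-01", "EMEA", "Windows 2000", "e-mail"),
        Asset("web-07", "Americas", "Windows 2000", "customer portal"),
        Asset("db-03", "Americas", "Solaris 8", "billing"),
    ]
    hits, owner = assess(catalog, "Windows 2000")
    print([a.name for a in hits], owner)
```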
"We are getting really good at this," Giambruno says. He says the worst security incidents have taken from 1,000 to 1,500 man-hours to correct. That time is now down to 75, with a goal of ultimately reducing it to 20.
He says success comes from many fronts but includes senior management acceptance, maturation of the delivery teams and the fact that people have bought into the philosophy.
"Viruses don't care who you are. They will infect you and take down your entire network," Giambruno says. "You'll make some errors, but you have to develop some processes. Otherwise, you can't afford the manpower for [patching], no one can."