Cisco bet big on its UCS products for data centres – and now it's going "all in" with a massive, resilient and green data centre built on that integrated blade architecture.
In fact, the company as a whole is migrating to the year-old Unified Computing System – Cisco's bold entry into the world of computing - as fast as possible. Plans call for 90% of Cisco's total IT load to be serviced by UCS within 12 to 18 months.
The strategy - what Cisco calls "drinking its own champagne" instead of the industry's more commonly used "eating your own dog food" - is most evident in the new data centre the company is just now completing in the Dallas/Fort Worth area (exact location masked for security) to complement a data centre already in the area.
Texas DC2, as Cisco calls it, is ambitious in its reliance on UCS, but it is also forward-leaning in that it will use a highly virtualised and highly resilient design, act as a private cloud, and boast many green features. Oh, and it's very cool.
But first, a little background.
John Manville, vice president of the IT Network and Data Services team, says the need for the new data centre stemmed from a review of Cisco's internal infrastructure three years ago. Wondering if they were properly positioned for growth, he put together a cross-functional team to analyse where they were and where they needed to go.
The result: a 200-page document that spelled out a wide-ranging, long-term IT strategy that Manville says lays the groundwork for five to 10 years.
"It was taken up to the investment committee of Cisco's board because there was a request for a fairly substantial amount of investment in data centres to make sure we had sufficient capacity, resiliency, and could transform ourselves to make sure we could help Cisco grow and make our customers successful," Manville says. (Manville talks data centre strategy, the migration to UCS, cloud TCO and describes a new IT organisation structure in this Q&A.)
The board gave the green light and Manville's team of 450 (Cisco all told has 3,100 people in IT) is now two and a half years into bringing the vision to reality.
"Part of the strategy was to build data centres or partner with companies that have data centres, and we bundled the investment decisions into phases," Manville says.
The company had just recently retrofitted an office building in the Dallas area – what Cisco calls "Texas DC1" - to create a data centre with 28,000 square feet of raised floor in four data halls. The first phase of new investments called for complementing Texas DC1 with a sister data centre in the area that would be configured in an active/active mode – both centres shouldering the processing load for critical applications - as well as enhancements to a data centre in California and the company's primary backup facility in North Carolina.
The second investment round, which the company is in the middle of, "involves building a data centre and getting a partner site in Amsterdam so we can have an Active/Active capability there as well," Manville says.
A third round would involve investment in the Asia-Pacific region "if the business requirements and latency requirements require that we have something there," he says.
Excluding the latter, Cisco will end up with six Tier 3 data centres (meaning n+1 redundancy throughout), consisting of a metro pair in Texas, another pair in the Netherlands, and the sites in North Carolina and California. The company today has 51 data centres, but of that only seven are production centres while the rest are smaller development sites, says IT Team Leader James Cribari. So while there is some consolidation here, this overhaul is more about system consolidation using virtualisation and migration to new platforms, in this case UCS.
Cisco today has more than 16,000 server operating system instances, dedicated and virtual, production and development. Of that, 6,000 are virtual and 3,000 of those VMs are already on UCS (Cisco has about 2,500 UCS blades deployed globally). The plan is to get 80% of production operating system instances virtualised and have 90% of the total IT workload serviced by UCS within 12 to 18 months, Manville says.
While job one is about capacity and resiliency, there is a significant TCO story, Manville says.
The cost of having a physical server inside a data centre is about $3,600 per server per quarter, including operations costs, space, power, people, the SAN, and so forth, Manville says.
Adopting virtualisation drives the average TCO down 37%, he says. "We think once we implement UCS and the cloud technology we can get that down to around $1,600 on average per operating system instance per quarter. Where we are right now is somewhere in the middle because we're still moving into the new data centre and still have a lot of legacy data centres that we haven't totally retrofitted with UCS or our cloud."
But he thinks they can achieve more: "If we get a little bit more aggressive about virtualisation and squeezing applications down a bit more, we think we can get the TCO down to about $1,200 per operating system instance per quarter."
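As a back-of-the-envelope check, the quarterly figures Manville quotes line up as follows. This is just a sketch; only the $3,600 baseline, the 37% reduction, and the two targets come from Cisco.

```python
# Per-OS-instance TCO figures from the article, in dollars per quarter.
BASELINE_PHYSICAL = 3600          # dedicated physical server, all-in cost
VIRTUALISATION_SAVINGS = 0.37     # average TCO reduction from virtualisation

virtualised = BASELINE_PHYSICAL * (1 - VIRTUALISATION_SAVINGS)
ucs_cloud_target = 1600           # projected with UCS plus cloud tooling
aggressive_target = 1200          # with denser application consolidation

print(f"virtualised:       ${virtualised:,.0f}/quarter")   # ~$2,268
print(f"UCS + cloud:       ${ucs_cloud_target:,}/quarter")
print(f"aggressive target: ${aggressive_target:,}/quarter")
```

In other words, virtualisation alone gets Cisco a little over a third of the way; the rest of the projected drop rides on UCS and the cloud tooling.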
The current anchor site for the grand IT plan is the relatively new DC1 in the Dallas area.
The 5-megawatt facility is already outfitted with 1,400 UCS blades, 1,200 of which are in production, and 800 legacy HP blades. HP was, in fact, Cisco's primary computer supplier, although it also uses Sun equipment in development circles. The goal is to get off the HP stuff as quickly as possible, Manville says. (Tit for tat, HP just announced it has eradicated Cisco WAN routers and switches from its six core data centres.)
While Cisco had initially thought it would need to keep its HP Superdomes for some time – essentially these are mini-mainframes – Manville says tests show a 32-core UCS is an adequate replacement. It also looks like Cisco can migrate off the Sun platforms as well.
Of Cisco's 1,350 production applications, 30% to 40% have been migrated to DC1 and eventually will be migrated to DC2 as well. DC2 will be the crown jewel of the new global strategy, a purpose-built data centre that will be UCS from the ground up and showcase Cisco's vision and data centre muscle. It will also work hand-in-hand with DC1 to support critical applications.
Cisco broke ground on DC2 in October 2009; the 160,000-square-foot building has 27,000 square feet of "raised floor" in two data halls. Actually the data centre doesn't have raised floors, because an air-side economiser cooling design (more on that later) pre-empts the need, but many insiders still refer to the data halls using the old lingo. Another twist: the UPS room in this 10 megawatt facility doesn't have any batteries; it uses flywheels instead.
IT Team Leader Cribari, who has built data centres for Perot Systems and others, says it normally takes 18 to 20 months to build a Tier 3 data centre, while the plan here is to turn the keys over to the implementation folks in early December and bring the centre online in March or April.
"This is very aggressive," agrees Tony Fazackarley, the Cisco IT project manager overseeing the build.
While the outside of the centre is innocuous enough – it looks like a two-story office building – more observant passers-by might recognise some telltales that hint at the valuable contents. Besides the general lack of windows, the building is surrounded by an earthen berm designed to shroud the facility, deflect explosions and help tornadoes hop the building (which is hardened to withstand winds up to 175 mph). And if they know anything about security, they might recognise the fence as a K8 system that can stop a 15,000-pound truck going 40 mph in one metre.
Another thing that stands out from outside: the gigantic power towers next door, one of the main high voltage lines spanning Texas, Fazackarley says. Those lines service a local substation that delivers a 10 megawatt underground feed to the data centre, but Cisco also has a second 10 megawatt feed coming in above ground from a separate substation. The lines are configured in an A/B split, with each line supplying 5 megawatts of power but capable of delivering the full 10 megawatts if needed.
Network connections to the facility are also redundant. There are two 1Gbps ISP circuits delivered over diversely routed, vendor-managed DWDM access rings, both of which are scheduled to be upgraded to 10Gbps. And there are two 10Gbps connections on DWDM links to the North Carolina and California data centres, with local access provided by the company's own DWDM access ring. As a backup, Cisco has two OC-48 circuits to those same remote locations, both of which are scheduled to be upgraded to 10Gbps in March.
The lobby of Texas DC2 looks ordinary, although the receptionist is behind a bulletproof glass wall and Fazackarley says the rest of the drywall is backed by steel plate.
Once inside you'll find space devoted to the usual mix of computing and networking, power and cooling, but there's innovation in each sector.
Take the UPS rooms. There are two, and each houses four immense assemblies of flywheels, generators and diesel engines, which together can generate 15 megawatts of power.
The flywheels are spun at all times by electric motors, and the sound in the rooms is deafening even when the diesel engines are at rest, so earplugs are required.
In the event of a power hiccup, the flywheels spinning the generators keep delivering power for 10 to 15 seconds while the diesel engines are started (each diesel has four car-like batteries for starting, but if the batteries are dead the flywheels can be used to turn over the diesels). Once the diesels are spun up, clutches connect them to the generators.
All the generators are started at once and then dropped out sequentially until the supply matches the load required at the moment, Fazackarley says. But the transfer is fast because the whole data centre is powered by AC current and, because there are no batteries, there is no need to step the current up and down and resynch it as is required when DC battery power is used.
The facility has 96,000 gallons of diesel on premise that can power the generators for 96 hours at full load. If more is needed, there is a remote refueling station and Cisco has service-level agreements with suppliers that dictate how fast the facility has to be resupplied in the event of an emergency.
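Those fuel figures imply a simple burn rate; the quick check below uses only the 96,000-gallon and 96-hour numbers quoted above.

```python
# Diesel reserve figures from the article.
fuel_gallons = 96_000
runtime_hours_full_load = 96

# Implied consumption across all generators running at full load.
burn_rate = fuel_gallons / runtime_hours_full_load
print(burn_rate)  # 1000.0 gallons per hour at full load
```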
To cool the data centre Cisco uses an air-side economiser design that reduces the need for mechanical chilling by simply ducting filtered, fresh air through the centre when the outside temperature is low enough. The design saves energy and money and of course is very green.
To understand how that works you need to have a handle on the main components of the cooling system, the pre-chilling external towers, the internal chillers and the air handlers.
The first stage includes three 1,000-ton cooling towers on the roof of the facility, where water is cooled by dripping it down over a series of louvres in an open air environment and then collected and fed to the chillers in a closed loop.
That pre-cooled water is circulated through five chillers (three 1,000-ton and two 500-ton machines), reducing the amount of refrigeration required to cool water in a second closed loop that circulates from the chillers to the air handlers. (The chillers don't use CFC coolant, another green aspect of the facility.)
A series of valves activated by cranks spun by chains makes it possible to connect any tower to any chiller via any pump, a redundancy precaution. And on the green side, the chillers have variable frequency drives, meaning they can operate at lower speeds when demand is lower, reducing power consumption.
The chillers feed coils in the big boxy air handlers, which pull in hot air from the data halls and route conditioned air back to the computing rooms. So far, nothing too outlandish for a large, modern data centre. But here is where the air-side economiser design comes into play, a significant piece of the green story.
When it is below 78 degrees Fahrenheit outside, the chillers are turned off and louvres on the back of the air handlers are opened to let fresh air in, which gets filtered, humidified or dehumidified as needed, and passed through the data halls and out another set of vents on the far side.
Fazackarley says they estimate that, even in hot Texas, they will be able to operate in so-called free-air mode 51% of the time, while chillers will be required 47% of the time and 2% of the time they will use a mix of the two.
Savings in cooling costs are expected to be $600,000 per year, a huge win on the balance sheet and in the green column.
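The temperature-driven mode switch described above can be sketched as a simple decision rule. Only the 78-degree threshold comes from Cisco; the function itself is an illustrative simplification (real economiser controls also weigh humidity and enthalpy, hence the mixed mode Fazackarley mentions).

```python
# Simplified sketch of the economiser mode decision.
FREE_AIR_MAX_F = 78.0   # Cisco's stated cut-over temperature

def cooling_mode(outside_temp_f: float) -> str:
    """Return which cooling mode the facility would run in."""
    if outside_temp_f < FREE_AIR_MAX_F:
        return "free-air"     # chillers off, air-handler louvres open
    return "mechanical"       # chillers carry the load

for temp in (65.0, 78.0, 95.0):
    print(temp, cooling_mode(temp))
```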
When online, DC2 should boast a Power Usage Effectiveness (PUE) rating of 1.25. PUE is the ratio of total facility power to the power that actually reaches the computing gear; the lower the number, the less power is lost to cooling and other overhead.
How good is a PUE of 1.25? "Very good, as it requires a very high level of IT and physical infrastructure optimization in tandem," says Bruce Taylor, vice president of Uptime Institute Symposia. "But keep in mind a new data centre usually has a 'bad' utilisation effectiveness ratio because of the standard practice of building the physical facility, including the power and cooling systems, prior to its actually being needed, to allow for capacity demand growth. Leaders like Intel are able to design facilities that tightly couple the IT hardware and the electrical and mechanical systems that power and cool it."
And Taylor is a fan of the air-side economiser design: "Wherever it is feasible to use 'free' outside air in the management of thermals, that increases effectiveness and energy efficiency."
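To put the rating in concrete terms, here is the PUE arithmetic with a hypothetical load split. Only the 1.25 figure is Cisco's; the kilowatt numbers below are chosen purely to illustrate what that rating implies.

```python
# PUE = total facility power / power delivered to IT equipment.
def pue(total_kw: float, it_kw: float) -> float:
    return total_kw / it_kw

it_load_kw = 8_000    # hypothetical IT load
overhead_kw = 2_000   # cooling, UPS losses, lighting, etc. (hypothetical)

rating = pue(it_load_kw + overhead_kw, it_load_kw)
print(rating)  # 1.25 -> 80% of incoming power reaches the IT gear
```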
Other green aspects of the facility:
- Solar cells on the roof generate 100 kilowatts of power for the office spaces in the building.
- A heat pump provides heating/cooling for the office spaces.
- A lagoon captures grey water from lavatory wash basins and the like, which is then used for landscape irrigation.
- Indigenous, drought-resistant plants on the property reduce irrigation needs.
The data halls, of course, haven't yet been filled with computing gear, just the empty racks that will accept the UCS chassis. While there is no raised floor, the concrete slab has been tiled to mimic the standard raised floor layout to help the teams properly position equipment.
Air can't be circulated through the floor, but Cisco uses a standard hot/cold aisle configuration, with cold air pumped down from above and hot air sucked up out of the top of the racks through chimneys that extend part way to the high ceiling above the cold air supply. The idea, Cribari says, is to keep the air stratified to avoid mixing. The rising hot air either gets sucked out in free-air mode or is directed back to the air handlers for chilling.
Power bus ducts run down each aisle and can be reconfigured as necessary to accommodate different needs. As currently designed, each rack gets a three-phase, 240-volt feed.
All told, this facility can accommodate 240 UCS clusters (120 in each hall). A cluster is a rack with five UCS chassis in it, each chassis holding eight server blades and up to 96GB of memory. That's a total of 9,600 blades, but the standard blade has two sockets, each of which can support up to eight processor cores, and each core can support multiple virtual machines, so the scale is robust. The initial install will be 10 UCS clusters, Cribari says.
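The capacity arithmetic from the paragraph above, spelled out (all figures from the article):

```python
# Texas DC2 capacity figures.
clusters = 240               # 120 in each data hall
chassis_per_cluster = 5
blades_per_chassis = 8

total_blades = clusters * chassis_per_cluster * blades_per_chassis
print(total_blades)          # 9600 blades at full build-out

sockets_per_blade = 2
cores_per_socket = 8         # up to eight cores per socket

total_cores = total_blades * sockets_per_blade * cores_per_socket
print(total_cores)           # 153600 physical cores, before any VM multiplier
```

With multiple virtual machines per core on top of that, the "robust scale" claim is easy to see.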
Network-attached storage will be interspersed with the servers in each aisle, creating what Cribari calls virtual blocks or Vblocks. The Vblocks become a series of clouds, each with compute, network and storage.
The UCS architecture reduces cable plant needs by 40%, Cribari says. Each chassis in a cluster is connected to a top-of-rack access switch using a 10Gbps Fibre Channel over Ethernet (FCoE) twinax cable that supports storage and network traffic.
From that switch, storage traffic is sent over a 16Gbps connection to a Cisco MDS SAN switch, while network traffic is forwarded via a 40Gbps LAN connection to a Cisco Nexus 7000 switch. In the future, it will be possible to use FCoE to carry integrated storage/LAN traffic to the Nexus and just hang the storage off of that device.
The cable reduction not only saves on upfront costs – the company estimates it will save more than a million dollars on cabling in this facility alone – but it also simplifies implementation, eases maintenance and takes up less space in the cabinet. The latter increases air circulation so things run cooler and more efficiently.
That air circulation, in fact, is what enables Cisco to put up to five chassis in one rack, Cribari says. That's a total of about 13 kilowatts per rack, "but we can get away with it because the machines run cooler without all that cabling and air flow is better."
Put to use
When all is said and done and Texas DC2 comes online, it will be married to Texas DC1 in an active/active configuration - creating what Cisco calls a Metro Virtual Data Center (MVDC) - that will enable critical applications to live in both places at once for resiliency, Cribari says.
With MVDC, which will be emulated in a pair of data centres in the Netherlands as well, traffic arrives at and data is stored in two locations, Cribari says. Applications that will implement MVDC include critical customer-facing programs, such as Cisco.com to safeguard order handling, and apps that are central to operations, such as the company's demand production program.
Cisco is currently trialling MVDC using applications in DC1 and a local colocation facility.
DC2 will otherwise serve as a private internal cloud, supporting what the company calls Cisco IT Elastic Infrastructure Services, or CITEIS. "It's basically targeted at the infrastructure-as-a-service layer, combining compute, storage, and networking," Manville says. "CITEIS should be able to service 80% of our x86 requirements, but we think there are still going to be some real high-end production databases we'll have to serve with dedicated environments, and maybe not even virtualised, so using UCS as a bare-metal platform."
The virtualisation technology of choice for CITEIS is VMware supporting a mix of Linux and Windows. Regarding the operating system choice, Manville says "there is no religion about that. We'll use whatever is needed, whatever works."
While Manville says cloud tech will account for half of his TCO expectations, the other half will stem from capabilities baked into UCS, many of which improve operational efficiencies.
When you plug a blade into a UCS chassis, for example, the UCS Manager residing in the top-of-rack switch delivers a service profile that configures everything from the IP address to the BIOS, the type of network and storage connections to be used, the security policies and even the bandwidth QOS levels.
"We call it a service profile instead of a server profile because we look more at what the apps that will be supported on the blade will require," says Jackie Ross, vice president of Cisco's Server Access and Virtualization Group.
Once configured, service profiles can be applied to any blade, and storage and network connections can be changed as needed without having to physically touch the machine; any blade can access Ethernet, Fibre Channel, FCoE, etc., Ross says.
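As an illustration of the idea, a service profile can be thought of as a portable bundle of settings that binds to any blade. The sketch below is hypothetical; the field names are illustrative, not the actual UCS Manager schema.

```python
# Illustrative model of a UCS-style service profile (field names are
# invented for this sketch; they are not the real UCS Manager schema).
from dataclasses import dataclass, field

@dataclass
class ServiceProfile:
    name: str
    ip_address: str
    network_fabric: str            # e.g. "ethernet", "fcoe", "fibre-channel"
    storage_target: str
    qos_bandwidth_mbps: int
    security_policy: str
    bios_settings: dict = field(default_factory=dict)

def apply_to_blade(profile: ServiceProfile, blade_id: str) -> str:
    """A profile binds to any blade; no physical reconfiguration needed."""
    return f"blade {blade_id} configured from profile '{profile.name}'"

# Hypothetical profile for a web-tier workload.
web_tier = ServiceProfile(
    name="web-tier",
    ip_address="10.0.1.25",
    network_fabric="fcoe",
    storage_target="san-array-01",
    qos_bandwidth_mbps=4000,
    security_policy="dmz-default",
    bios_settings={"hyperthreading": True},
)
print(apply_to_blade(web_tier, "chassis3/blade5"))
```

The point of the abstraction is the one Ross makes: because the profile describes what the application needs rather than what one server is, the same profile can be re-applied to a replacement blade in minutes.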
That speeds provisioning, aiding agility, Manville says. The goal is to get to 15-minute self-service provisioning. "We have this running but haven't turned it over to the application developers for various charge-back and other authorization issues. But our sys admins are seeing significant productivity gains by being able to provision virtual machines in an automated fashion."
Taken all together, the broad new IT strategy – including the build-out of Texas DC2 and the shift to a highly virtualised cloud environment driven by the company's new computing tools – is quite ambitious and, if they pull it all off, will be quite an accomplishment.
Cisco is definitely taking the long view. The DC2 complex has enough real estate, and the core infrastructure has been designed, to accommodate a doubling of the "raised floor" space in coming years.
Read more about LANs and routers in Network World's LANs & Routers section.