Last month we reported that Stratus Technologies, the company known for its fault-tolerant servers, was to allow the operating system to be upgraded while the server remained in operation. The technology is included in the company's ftServer W Series 4000 Windows-based server line.
The background is that hardware and software are generally pretty reliable, so the biggest cause of server downtime is not equipment failure but planned downtime, for instance for software upgrades and critical operating system hot fixes.
So how can the system be updated while remaining in operation? In normal operation, the two servers that make up the system are connected by a redundant backplane and lock-stepped. This means that all instructions are duplicated across both servers. If a fault occurs, the system can identify which component generated the error and take it offline, which generates an alert.
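The lock-step behaviour described above can be illustrated with a toy model. This is only a sketch, not Stratus code: the `Unit` class, the `run_lockstep` function, the injected fault and the per-unit self-check flag are all assumptions made for illustration. Two units execute the same instruction stream; when their results diverge, the unit whose self-check failed is taken offline and an alert is recorded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    """One half of a lock-stepped pair (hypothetical model)."""
    name: str
    faulty_at: Optional[int] = None  # step at which a fault is injected
    online: bool = True
    acc: int = 0

    def execute(self, step, value):
        """Apply one instruction; return (result, self_check_ok)."""
        self.acc += value
        ok = self.faulty_at != step
        if not ok:
            self.acc ^= 1  # simulate a corrupted result
        return self.acc, ok


def run_lockstep(instructions, a, b):
    """Feed every instruction to both units and isolate a faulty one."""
    alerts = []
    for step, value in enumerate(instructions):
        outcomes = [(u, *u.execute(step, value)) for u in (a, b) if u.online]
        # With both units online, differing results indicate a fault.
        if len(outcomes) == 2 and outcomes[0][1] != outcomes[1][1]:
            for unit, _result, ok in outcomes:
                if not ok:  # assumed self-check identifies the culprit
                    unit.online = False
                    alerts.append(f"step {step}: {unit.name} taken offline")
    return alerts


side_a = Unit("A")
side_b = Unit("B", faulty_at=2)
alerts = run_lockstep([1, 2, 3, 4], side_a, side_b)
```

In this run, unit B is removed at step 2 while unit A carries on processing the remaining instructions alone.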
Stratus' Active Upgrade technology works by splitting the system into two independently running servers: let's call them side A and side B. When an active upgrade is initiated, the two servers are separated, a bit like separating conjoined twins.
Side A continues as normal. The assumption is that the system partition of the shared storage will remain unchanged, while the data on which side A is working will continue to change. Side B, on the other hand, is now ready to be upgraded.
After the upgrade is done in the normal way, the two systems need to be merged or re-synchronised. At this point, side A has live data but an old system while side B's system is updated but its data is stale. Before the re-synchronisation, applications may have to quit and restart in order to ensure that both sides are working with fresh data. According to Stratus, this should be almost instantaneous.
First the system on side A is updated, then the data is duplexed, copying deltas only. This uses the company's rapid disk resync technology, which ensures that, if a disk or server unit is removed for a brief time, only the changed blocks are re-mirrored. Both processes are reversible, according to Stratus.
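The idea of re-mirroring only the changed blocks can be sketched as follows. Stratus has not published how its rapid disk resync works, so this is a generic dirty-block tracking scheme offered under that assumption; the `SplitMirror` class and its method names are hypothetical.

```python
class SplitMirror:
    """Toy two-way mirror that remembers which blocks changed while split."""

    def __init__(self, blocks):
        self.primary = list(blocks)
        self.secondary = list(blocks)
        self.split = False
        self.dirty = set()  # indices written while the mirror was split

    def detach(self):
        """Stop mirroring, e.g. while one side is being upgraded."""
        self.split = True

    def write(self, index, data):
        self.primary[index] = data
        if self.split:
            self.dirty.add(index)  # remember the block for later resync
        else:
            self.secondary[index] = data  # normal mirrored write

    def resync(self):
        """Copy only the dirty blocks, then resume mirroring."""
        copied = sorted(self.dirty)
        for index in copied:
            self.secondary[index] = self.primary[index]
        self.dirty.clear()
        self.split = False
        return copied


mirror = SplitMirror(["a", "b", "c", "d"])
mirror.detach()
mirror.write(1, "B")
mirror.write(3, "D")
mirror.write(1, "B2")  # rewriting a block does not add extra resync work
copied = mirror.resync()
```

Here only blocks 1 and 3 are copied during the resync, even though three writes took place while the mirror was split.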
Finally, the processes re-establish lockstep, the system storage re-engages mirroring, and the applications continue to run in the meantime.
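To summarise the sequence, the following sketch tracks which side holds the current system and the current data at each phase. It is a simplified model of the steps as described in this article, not of Stratus' implementation; the `active_upgrade` function, the phase names and the version labels are illustrative assumptions.

```python
def active_upgrade(old="v1", new="v2"):
    """Return a list of (phase, side_a, side_b) snapshots.

    Each side is a (system_version, data_state) tuple.
    """
    a = (old, "live")
    b = (old, "live")
    steps = [("lockstep", a, b)]

    # Split: side A keeps serving, so side B's copy of the data goes stale.
    b = (old, "stale")
    steps.append(("split", a, b))

    # Side B is upgraded in the normal way while side A carries on.
    b = (new, "stale")
    steps.append(("upgrade B", a, b))

    # Merge: side A's system is updated, then the data is resynchronised.
    a = (new, "live")
    b = (new, "live")
    steps.append(("merge", a, b))

    steps.append(("lockstep", a, b))
    return steps


steps = active_upgrade()
for phase, side_a_state, side_b_state in steps:
    print(f"{phase:10} A={side_a_state} B={side_b_state}")
```

The "upgrade B" snapshot corresponds to the point at which side A has live data but an old system, while side B has the new system but stale data.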
The company said that the heavy lifting is done by its own chipset, which manages data flows across the memory and I/O buses - but Stratus wasn't keen to talk about the patented process in any more detail.