A growing trend in disaster recovery among cloud providers is the use of load-balanced data centres instead of hot-cold data centres. Companies are deploying private clouds that are load balanced between their data centres to meet their disaster recovery needs. If one data centre suffers a disaster, the other keeps operating, albeit at reduced capacity.

But there are still challenges. Tracking the various infrastructure configurations of an application is tricky. Deploying each application involves creating server names, selecting open IP addresses, setting up DNS mappings, defining physical and virtual servers, creating firewall rules, defining SAN and NAS configurations, implementing load balancer rules, and defining database clusters.

All of these elements exist for an application in each environment, such as development, test and production. Many of these configurations are maintained through multiple web-based applications. These maintenance applications are not integrated, so the configuration metadata is not centralised. Worse yet, urgent administrative changes are sometimes made directly to products, such as a SAN subsystem, at implementation time and are never captured in the change management system. Hence the metadata is often out of date as well.

It would be great to have a tool that clones the configuration in one data centre to the other data centre it is load balanced with. The cloned configuration would need unique server names and new IP addresses. It would mirror the application in the other data centre while still providing the necessary infrastructure if that data centre fails. But creating such a tool or wizard would be difficult, considering all of the valid permutations of products that could be configured.
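To make the renaming and re-addressing step concrete, here is a minimal sketch in Python. It covers only server names and IP addresses, not the full range of products mentioned above. The `ServerConfig` record, the `clone_for_peer` function, the site prefixes and the subnets are all hypothetical names chosen for illustration, and the sketch assumes each site uses a naming prefix and a subnet of the same size.

```python
from dataclasses import dataclass, replace
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class ServerConfig:
    """Hypothetical record for one server in a site's configuration."""
    name: str  # e.g. "dc1-web-01"
    ip: str    # address inside the site's subnet
    role: str  # e.g. "web" or "db"


def clone_for_peer(servers, src_prefix, dst_prefix, src_subnet, dst_subnet):
    """Return copies of `servers` renamed and re-addressed for the peer site.

    Each address keeps its host offset within the subnet, so the peer
    site mirrors the layout of the source site.
    """
    src_net = ip_network(src_subnet)
    dst_net = ip_network(dst_subnet)
    if src_net.num_addresses != dst_net.num_addresses:
        raise ValueError("source and destination subnets differ in size")

    cloned = []
    for server in servers:
        if not server.name.startswith(src_prefix):
            raise ValueError(f"{server.name} lacks prefix {src_prefix!r}")
        addr = ip_address(server.ip)
        if addr not in src_net:
            raise ValueError(f"{server.ip} is outside {src_subnet}")
        # Preserve the host offset when moving to the peer subnet.
        offset = int(addr) - int(src_net.network_address)
        new_ip = str(dst_net.network_address + offset)
        new_name = dst_prefix + server.name[len(src_prefix):]
        cloned.append(replace(server, name=new_name, ip=new_ip))
    return cloned


if __name__ == "__main__":
    site_one = [
        ServerConfig("dc1-web-01", "10.1.0.10", "web"),
        ServerConfig("dc1-db-01", "10.1.0.20", "db"),
    ]
    for server in clone_for_peer(site_one, "dc1-", "dc2-",
                                 "10.1.0.0/24", "10.2.0.0/24"):
        print(server)
```

A real tool would also have to translate firewall rules, DNS mappings, storage and load balancer settings, which is where the permutations become hard to manage.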

So, centralisation of infrastructure configuration metadata is critical. Unless the parameters are centralised and versioned, the deployed application and its supporting infrastructure will drift in small ways over time. Small configuration changes can cause problems in both the primary and secondary load-balanced data centres. If the configuration data is not versioned, it may be very difficult to return the data centre to a stable state when a change leads to immediate production errors.
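The idea of versioning and drift detection can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a recommendation of any product: the `ConfigStore` class and `drift` function are hypothetical, the store is held in memory, and a configuration is modelled as a flat dictionary of parameters.

```python
import copy


class ConfigStore:
    """In-memory, append-only history of configuration snapshots."""

    def __init__(self):
        self._versions = []

    def commit(self, config):
        """Store a snapshot and return its 1-based version number."""
        self._versions.append(copy.deepcopy(config))
        return len(self._versions)

    def get(self, version):
        """Return a copy of the snapshot stored under `version`."""
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"no such version: {version}")
        return copy.deepcopy(self._versions[version - 1])

    def latest(self):
        """Return a copy of the most recent snapshot."""
        return self.get(len(self._versions))

    def rollback(self, version):
        """Re-commit an earlier snapshot so that it becomes the newest one."""
        return self.commit(self.get(version))


def drift(recorded, deployed):
    """Return {key: (recorded_value, deployed_value)} for every mismatch."""
    keys = sorted(recorded.keys() | deployed.keys())
    return {
        key: (recorded.get(key), deployed.get(key))
        for key in keys
        if recorded.get(key) != deployed.get(key)
    }


if __name__ == "__main__":
    store = ConfigStore()
    stable = store.commit({"lb_timeout": 30, "db_nodes": 3})
    store.commit({"lb_timeout": 5, "db_nodes": 3})  # change that causes errors
    store.rollback(stable)                          # history is kept, not erased
    print(store.latest())
    print(drift(store.latest(), {"lb_timeout": 30, "db_nodes": 2}))
```

Rolling back by committing a new version, rather than deleting the faulty one, keeps a record of what went wrong and when.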

Configuration drift also points to the need to certify critical elements of the architecture. Companies should have a policy stating that only tested product configurations, such as specific versions of virtual machines running on particular kernels or operating systems, can be deployed within the data centre. Likewise, only specific versions of firewall hardware should be deployed in the various data centres. Another danger is a lack of options, such as single-sourced software or hardware, for various infrastructure components. A common flaw in hardware or a bug in software could then lead to a dramatic failure in multiple data centres.
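Such a policy could be enforced with a simple allowlist check before deployment. The following Python sketch is purely illustrative: the `CERTIFIED` table, the vendor and product names, the version numbers and both function names are invented for the example.

```python
# Hypothetical allowlist: component type -> certified (product, version) pairs.
CERTIFIED = {
    "firewall": {("vendor-a-fw", "9.1"), ("vendor-b-fw", "4.2")},
    "hypervisor": {("vendor-c-hv", "7.0")},
}


def uncertified(deployment, certified=CERTIFIED):
    """Return the deployed components that are not on the allowlist."""
    return [
        component
        for component in deployment
        if (component["product"], component["version"])
        not in certified.get(component["type"], set())
    ]


def single_sourced(certified=CERTIFIED):
    """Return component types for which only one product is certified."""
    return sorted(
        kind
        for kind, pairs in certified.items()
        if len({product for product, _version in pairs}) <= 1
    )


if __name__ == "__main__":
    planned = [
        {"type": "firewall", "product": "vendor-a-fw", "version": "9.1"},
        {"type": "firewall", "product": "vendor-a-fw", "version": "9.3"},
    ]
    print(uncertified(planned))  # the untested 9.3 firewall is flagged
    print(single_sourced())      # component types that rely on one product
```

The second function addresses the single-sourcing risk: a component type that appears in its result depends on one product, so a flaw in that product would affect every data centre that uses it.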

In conclusion, corporations are addressing disaster recovery concerns by deploying applications in load-balanced architectures. But this doesn't protect against human error, particularly configuration errors. Corporations may turn to certified components like specific virtual machines or load balancers to avoid some of the disasters due to an untested configuration or a lack of versioning of configuration metadata. Configuration metadata needs to be stored in a centralised manner and versioned so that the application can fall back to a trusted configuration if errors occur.