In the light of the recent events at a BT network centre in Paddington (London, UK), where a series of compound failures caused a massive outage with huge knock-on effects, I’m sure many businesses are taking another look at their own (and their suppliers’) availability with a view to beefing up business continuity.
In the spirit of continuous improvement, this should be taken as an opportunity to improve the overall ‘system’ rather than to point fingers.
What is business continuity?
Quite simply, business continuity is how you stay in business (and meet your customers’ demands) in the wake of a disaster, whether a localised incident such as flooding or a further-reaching issue like a terrorist attack. Your plan will typically cover all the business-critical functions, systems and data.
So what is high availability?
High availability (or HA) is the way that system designers ensure ‘operational continuity’ of a system. This will typically involve ensuring that the system has no single point of failure (and for really fault tolerant systems there should also be no single point of recovery).
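To put rough numbers on the ‘no single point of failure’ idea, here is a minimal sketch of the usual availability arithmetic. It assumes components fail independently, which is optimistic in practice; the function names and the 99% figure are purely illustrative.

```python
from math import prod


def series_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def redundant_availability(availability, copies):
    """Availability of `copies` interchangeable components where one suffices.

    Assumes failures are independent of each other.
    """
    return 1 - (1 - availability) ** copies


if __name__ == "__main__":
    single = 0.99  # illustrative availability of one server
    print(f"one server:     {redundant_availability(single, 1):.4%}")
    print(f"two servers:    {redundant_availability(single, 2):.4%}")
    print(f"two-step chain: {series_availability([single, single]):.4%}")
```

Under these assumptions a second, redundant server lifts availability, whereas adding another component that everything depends on lowers it.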
A common mistake is to confuse availability with scalability. Scaling is usually described as either vertical (bigger boxes) or horizontal (lots of boxes). Horizontally scaled systems often have a degree of high availability built in, whereas vertical scaling is potentially a bigger risk, as you lose more capability when you lose a bigger box.
How do redundancy and diversity fit into the picture?
The phrase ‘belt & braces’ describes this nicely - two different ways of achieving the same goal.
One example might be having multiple (diverse) suppliers provide network connections to a building; in data centres it is common to have a (redundant) backup generator - though this is usually coupled with a battery-based UPS (uninterruptible power supply) to provide seamless failover.
But we’ve got this newfangled Cloud thing
Let’s attempt to clarify what ‘cloud computing’ means - there are a number of different types of on-demand services that run ‘in the Cloud’:
- IaaS - Infrastructure as a Service
- PaaS - Platform as a Service
- SaaS - Software as a Service
IaaS - Infrastructure as a Service
This category splits into:
- Virtual Private Server (VPS) providers such as Slicehost allow businesses to rent a fixed-size virtual server on demand for a set monthly fee.
- Elastic computing providers such as Amazon EC2 allow businesses to have a virtual machine image that can grow to meet requested demand (vertical scaling).
PaaS - Platform as a Service
Platform as a Service providers offer application-hosting platforms (again, these can be fixed-size or scalable) so that application developers can focus on adding business value rather than worrying about the underlying infrastructure. Google App Engine is one example of this approach for Java & Python applications.
SaaS - Software-as-a-Service
Software-as-a-Service provides access to business software over the Internet for a set monthly fee. This can either be multi-tenanted, where multiple companies share a system, or use a separate installation per customer. One of the major SaaS success stories is Salesforce.com, which started with an on-demand CRM offering.
Public vs. private vs. hybrid clouds
We’ve already discussed public cloud offerings above; a private cloud is typically an on-premises or a dedicated, outsourced, managed cloud platform (e.g. using the open source Eucalyptus or VMware vCloud). A hybrid cloud is where a private cloud is used in conjunction with one or more external cloud providers.
There is also the notion of community clouds with varying definitions:
- whereby similar organisations pool resources into a shared multi-tenant cloud (though I prefer to describe this as a shared private cloud or a restricted cloud e.g. Google’s ‘GovCloud’)
- a decentralised peer-to-peer cloud utilising spare computing power (and bandwidth) of internet-connected computers.
Assuming that your organisation can operate its business-critical systems and store business-critical data in the Cloud, there are a number of possible deployment models to consider, chiefly:
- Cloud as a backup
- Failover to the cloud
Cloud as a backup
In this model, data plus the necessary software packages and configuration data (e.g. CMDB configuration data) for critical systems are backed up to one (or ideally more) hosts in the cloud. In the event of a disaster at the primary operating location, new virtual servers are commissioned and a configuration management tool (e.g. Puppet) is used to provision the servers with the appropriate software.
Recovery time is dependent upon the time taken to commission/provision the new virtual servers and perform any data transfer (or decryption). Database/file replication techniques can help to reduce the time to recover.
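As a back-of-the-envelope illustration of that dependency, the sketch below adds up the steps. It assumes the steps run one after another and that transfer speed is constant; the function name, parameters and example figures are all hypothetical.

```python
def estimate_recovery_hours(provision_hours, data_gb, transfer_gb_per_hour,
                            decrypt_hours=0.0, already_replicated_gb=0.0):
    """Rough recovery time: provision servers, then restore any data
    that replication has not already copied, then decrypt."""
    if transfer_gb_per_hour <= 0:
        raise ValueError("transfer_gb_per_hour must be positive")
    remaining_gb = max(data_gb - already_replicated_gb, 0.0)
    return provision_hours + remaining_gb / transfer_gb_per_hour + decrypt_hours


if __name__ == "__main__":
    # Illustrative figures only: 2 hours to provision, 500 GB of data,
    # 100 GB/hour transfer rate, 1 hour to decrypt.
    print(estimate_recovery_hours(2, 500, 100, decrypt_hours=1))
    # Same scenario with 450 GB already replicated to the cloud.
    print(estimate_recovery_hours(2, 500, 100, decrypt_hours=1,
                                  already_replicated_gb=450))
```

Comparing the two calls shows why replication matters: the less data left to move after the disaster, the shorter the recovery.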
Failover to the cloud
For this model, pre-configured server instances in the cloud run the business-critical systems, kept current by data replication. In the event of a disaster at the primary operating location, the systems are failed over to the cloud instance (this can be manual or use an automated ‘global load balancer’).
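The automated variant boils down to ‘send traffic to the first site that passes a health check’. Here is a minimal sketch of that decision; it is not how any particular global load balancer works, and the hostnames and the stubbed health check are hypothetical.

```python
from typing import Callable, Sequence


def choose_endpoint(endpoints: Sequence[str],
                    is_healthy: Callable[[str], bool]) -> str:
    """Return the first healthy endpoint, checking in priority order."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    raise RuntimeError("no healthy endpoint available")


if __name__ == "__main__":
    # Hypothetical hostnames: primary site first, cloud standby second.
    sites = ["primary.example.com", "cloud-standby.example.com"]
    # Stubbed health check that pretends the primary site is down.
    down = {"primary.example.com"}
    print(choose_endpoint(sites, lambda host: host not in down))
```

Passing the health check in as a function keeps the sketch independent of how health is actually measured (a ping, an HTTP probe, a manual switch).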
This form of hybrid cloud is fundamentally the same as the failover option; however, there are more complexities involved in data synchronisation, session failover and so on.
So how can the Cloud help with planning Business Continuity activities?
1 & 2: People & premises
For knowledge-worker businesses, the Internet and the widespread availability of broadband have increased the prevalence of distributed home workers. With a distributed workforce and cloud-based systems, these two items are of less concern: in the event of a disaster, it remains practical for staff to work from home (or indeed from a temporary serviced office) and access systems running in the cloud.
For a true ‘belt and braces’ approach, 3G/HSDPA mobile broadband dongles can be used to provide a secondary Internet connection for home workers should their main Internet connection be unavailable.
Infrastructure-as-a-Service is a compelling basis for a data centre Business Continuity strategy using the deployment models outlined above. Furthermore, VoIP services can provide sufficient telephony cover for small and medium-sized businesses.
Cloud-based services (whether computing based or dedicated storage solutions e.g. Amazon S3) can aid your business in having current data stored confidentially and readily available in the event of a disaster. Some regulated organisations may have to consider whether the service provider can store data within the appropriate territory/jurisdiction. Data integrity is a further consideration for more complex system environments (particularly with the backup approach) - this needs to be taken into account for solution design / recovery procedures.
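One simple way to build an integrity check into recovery procedures is to compare checksums of the original and restored data. The sketch below uses SHA-256 from Python’s standard library; the function names are hypothetical, and in-memory streams stand in for real backup files.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1024 * 1024):
    """Hash a binary stream in chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original_stream, restored_stream):
    """True if both streams contain byte-for-byte identical data."""
    return sha256_of_stream(original_stream) == sha256_of_stream(restored_stream)


if __name__ == "__main__":
    # In-memory stand-ins for an original file and its restored copy.
    original = io.BytesIO(b"customer records")
    restored = io.BytesIO(b"customer records")
    print(backup_matches(original, restored))
```

A matching checksum only shows that the copy is faithful; it says nothing about whether the data was consistent across systems when the backup was taken.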
So we’ve established that cloud computing can take some of the business continuity execution effort off your business. However, you still need to plan properly (what happens if key personnel are unavailable? how will equipment & supplies be sourced?) and perform due diligence on service providers to ensure that their SLAs and DR plans align with your needs.