For many cloud-enabled applications, multi-tenancy is a key design decision. There is a spectrum of choices from which to pick. Gartner includes seven different models ranging from “Isolated Tenancy” (aka sharing nothing) to “Shared Everything.” And really, it’s not one-size-fits-all. Selecting a multi-tenancy model mostly revolves around balancing two opposing aspects: the degree of sharing versus the level of security and customisation needed.
From the perspective of data architecture, these two aspects boil down to two basic questions: (1) Should each datastore contain information from more than one tenant? and (2) Should all tenants use a shared data schema? The needs of isolation, security and maintenance drive the first decision, while the needs of customisability and flexibility drive the second. The cost implications of both decisions have been extensively discussed.
Below, we show two examples of how isolation requirements and the need for flexibility mandate different data multi-tenancy architectures. We focus on data architecture for brevity but recognise that there are a host of other factors, ranging from virtual machine collocation to application customisation, which one must consider when selecting a multi-tenancy model.
Example 1 - Isolation requirements: Let’s start with applications that deal with private and sensitive data such as electronic medical/health records. In many countries, government regulations require that the patient’s private information is rigidly protected and accounted for, yet the records should be available whenever the patient (or an authorised third party) wants to access them. In some cases, the regulations go so far as to require that the records be stored or made available in a standard schema (e.g., HL7). So, with regard to data, this type of application demands a high level of security and isolation with little change in schema. Hence, these applications lean toward models with one tenant per database instance and a common schema across all tenants.
This does not mean that hosting multiple tenants per database on a shared schema cannot be used for applications with sensitive data, but doing so requires additional security measures such as tenant-level view filtering and a higher degree of data encryption. Each of these measures introduces a degree of complexity in the application layer.
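To make tenant-level view filtering concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `patient_record`, the `tenant_id` column and the helper names are assumptions made for illustration only; they are not taken from any particular product. The idea is simply that every read against the shared table goes through a function that always applies the tenant filter.

```python
import sqlite3


def make_tenant_store(rows):
    """Build an in-memory shared table holding rows for several tenants.

    The schema is hypothetical: one shared table with a tenant_id column.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patient_record ("
        "tenant_id TEXT NOT NULL, patient TEXT, note TEXT)"
    )
    conn.executemany("INSERT INTO patient_record VALUES (?, ?, ?)", rows)
    return conn


def records_for(conn, tenant_id):
    """Return only the rows owned by tenant_id.

    The tenant filter is bound as a parameter on every read, so the
    caller never builds the WHERE clause by hand.
    """
    cur = conn.execute(
        "SELECT patient, note FROM patient_record "
        "WHERE tenant_id = ? ORDER BY patient",
        (tenant_id,),
    )
    return cur.fetchall()


conn = make_tenant_store([
    ("clinic_a", "alice", "checkup"),
    ("clinic_b", "bob", "x-ray"),
    ("clinic_a", "carol", "follow-up"),
])
print(records_for(conn, "clinic_a"))
```

The extra complexity mentioned above shows up here in miniature: every query path has to pass through a helper like `records_for`, and a single unfiltered query would expose another tenant's rows.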
Example 2 - Flexibility requirements: Consider enterprise applications like Customer Relationship Management, Sales Force Automation or Human Resource Management systems. These applications typically have update- and query-heavy operations. But, most notably, the complexity arises from the high level of customisability and flexibility needed to accommodate third-party data (e.g., from suppliers).
This customisability factor tends to drive the decision toward a separate schema per tenant, which allows maximum flexibility. This model is the standard for traditional enterprise applications like SAP and Siebel, where a set of base tables is customised through a set of extended tables. Each row in a base table may refer to multiple extended-table records, which, in turn, can store heavily customised data per tenant. However, the cost and complexity of developing, maintaining, and upgrading applications with this structure are fairly high. As such, many SaaS providers have largely adopted a shared schema across all tenants.
However, to allow some degree of customisation, the shared structure includes flexible tables in which each record can be tenant-specific and carry its own custom attributes. Rapid lookups and queries on these structures rely on separate index tables (called Pivot tables). Tenant-specific metadata allows for further customisation, policy settings, and performance tuning. When loading tenant-specific application modules, the run-time engine (called the kernel) uses this metadata to perform the necessary customisation.
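The following is a rough sketch of how such a structure might look, again using Python's sqlite3 module. All table and function names (`tenant_field`, `flex_data`, `pivot_index`, `define_field`, `insert_record`, `find`) are hypothetical, and real platforms are considerably more elaborate. It only illustrates the three pieces described above: tenant metadata that maps custom field names to generic slots, a flexible data table shared by all tenants, and a pivot table that indexes selected fields for fast lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Tenant metadata: which custom field lives in which generic slot.
CREATE TABLE tenant_field (tenant_id TEXT, field_name TEXT,
                           slot INTEGER, indexed INTEGER);
-- Flexible table shared by all tenants; val0/val1 are generic slots.
CREATE TABLE flex_data (row_id INTEGER PRIMARY KEY, tenant_id TEXT,
                        val0 TEXT, val1 TEXT);
-- Pivot table: a copy of indexed values, used only for lookups.
CREATE TABLE pivot_index (tenant_id TEXT, slot INTEGER,
                          value TEXT, row_id INTEGER);
CREATE INDEX pivot_lookup ON pivot_index (tenant_id, slot, value);
""")


def define_field(tenant_id, field_name, slot, indexed=False):
    """Register a tenant-specific custom field in the metadata table."""
    conn.execute("INSERT INTO tenant_field VALUES (?, ?, ?, ?)",
                 (tenant_id, field_name, slot, int(indexed)))


def field_map(tenant_id):
    """Return {field_name: (slot, indexed)} for one tenant."""
    cur = conn.execute(
        "SELECT field_name, slot, indexed FROM tenant_field "
        "WHERE tenant_id = ?", (tenant_id,))
    return {name: (slot, bool(ix)) for name, slot, ix in cur}


def insert_record(tenant_id, record):
    """Store a record in the generic slots and index the flagged fields."""
    fields = field_map(tenant_id)
    slots = [None, None]
    for name, value in record.items():
        slots[fields[name][0]] = value
    cur = conn.execute(
        "INSERT INTO flex_data (tenant_id, val0, val1) VALUES (?, ?, ?)",
        (tenant_id, *slots))
    for name, value in record.items():
        slot, indexed = fields[name]
        if indexed:
            conn.execute("INSERT INTO pivot_index VALUES (?, ?, ?, ?)",
                         (tenant_id, slot, value, cur.lastrowid))
    return cur.lastrowid


def find(tenant_id, field_name, value):
    """Look up records through the pivot table, then rename the slots."""
    fields = field_map(tenant_id)
    slot, indexed = fields[field_name]
    if not indexed:
        raise ValueError(f"{field_name} is not indexed for {tenant_id}")
    cur = conn.execute(
        "SELECT d.val0, d.val1 FROM pivot_index p "
        "JOIN flex_data d ON d.row_id = p.row_id "
        "WHERE p.tenant_id = ? AND p.slot = ? AND p.value = ? "
        "ORDER BY d.row_id", (tenant_id, slot, value))
    return [{name: row[s] for name, (s, _) in fields.items()} for row in cur]


# Two tenants use slot 0 for entirely different custom attributes.
define_field("acme", "supplier", 0, indexed=True)
define_field("acme", "region", 1)
define_field("globex", "badge_no", 0, indexed=True)
insert_record("acme", {"supplier": "Initech", "region": "EMEA"})
insert_record("globex", {"badge_no": "Initech"})
print(find("acme", "supplier", "Initech"))
```

In this sketch, the metadata lookup in `field_map` plays the role that the text assigns to the run-time engine: the same generic columns are interpreted differently depending on which tenant is asking.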
Therefore, we find different application requirements drive different data multi-tenancy models. Indeed, horses for courses.
A parting thought: In the previous discussion, you will have noticed the degree of complexity the application must endure to access and manipulate multi-tenant data. The primary reason for this complexity is that today’s data management technologies lack a “native” tenant concept. Consider: if the tenancy concept were built into the database layer, the database engine could bind a tenant’s request to the correct storage area and security privileges, and common business data could be separated from tenant-specific information. The applications could run blissfully ignorant of the fact that they are running on top of a multi-tenant data model. This would significantly simplify the development and testing of multi-tenant applications. Any takers?