Understanding database multitenancy
Filipe Ximenes • 5 July 2016
Let's imagine we are going to build a system with the following constraints:
- Corporate clients will be using it;
- The data that will be stored in it is very important and confidential. It cannot leak from one customer to another;
- It should be able to scale to hundreds of clients.
This is the type of system where we want to employ a multitenant architecture. Here is the definition in the words of Wikipedia:
The term "software multitenancy" refers to a software architecture in which a single instance of software runs on a server and serves multiple tenants. A tenant is a group of users who share a common access with specific privileges to the software instance.
Cool, now that we understand the concept, let's explore a little more how a multitenant system works so we can make good decisions when architecting our next project.
The first thing to notice is the mention of a "single instance of software" in the Wikipedia definition. This matters because it makes clear that deploying multiple servers, each with its own copy of the codebase and its own database, does not count as a multitenant system. In any case, that architecture is not an option for us because it is incompatible with the third constraint we defined.
Here are some of the terms we will be using across this blog post for quick reference:
- Tenant - In our context this can be directly translated to "Customer".
- Schema - A 'blueprint' of how database tables relate to each other.
That said, there are 3 traditional ways to architect a multitenant system:
1 - Multiple databases
Although this is considered a valid multitenant approach, deploying multiple databases carries almost all of the scalability and maintenance problems of the "one server per tenant" approach. The upside is that you maintain a single codebase and route each client to its own private, isolated database.
Benefits of this approach include keeping each customer's data 'physically' (not literally) isolated from every other customer's, and therefore more secure. The downside: you will need to provision a new database for every new client and make sure you connect to the correct address before making queries.
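To make the routing step concrete, here is a minimal sketch in Python. It assumes SQLite in-memory databases as stand-ins for separately provisioned database servers; the tenant names, the `TENANT_DATABASES` mapping, the `invoices` table and the helper functions are all hypothetical and only illustrate the idea.

```python
import sqlite3

# Hypothetical tenant -> database address mapping. In a real deployment these
# would be connection strings for separately provisioned databases; here each
# tenant gets its own private in-memory SQLite database.
TENANT_DATABASES = {
    "acme": ":memory:",
    "globex": ":memory:",
}

_connections = {}


def provision(tenant):
    """Create the tenant's private database and its tables."""
    conn = sqlite3.connect(TENANT_DATABASES[tenant])
    conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL)")
    _connections[tenant] = conn
    return conn


def connection_for(tenant):
    """Route a request to the tenant's own database; unknown tenants fail loudly."""
    if tenant not in _connections:
        raise KeyError(f"no database provisioned for tenant {tenant!r}")
    return _connections[tenant]


# Every new client needs its own provisioning step.
for name in TENANT_DATABASES:
    provision(name)

connection_for("acme").execute("INSERT INTO invoices (amount) VALUES (100.0)")
connection_for("globex").execute("INSERT INTO invoices (amount) VALUES (250.0)")

# The same query text returns different data depending on the connection used.
acme_rows = connection_for("acme").execute("SELECT amount FROM invoices").fetchall()
globex_rows = connection_for("globex").execute("SELECT amount FROM invoices").fetchall()
```

Note that the queries themselves carry no tenant information at all. Isolation depends entirely on picking the right connection, which is why connecting to the wrong address is the main risk here.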
2 - Single shared database schema
In this approach we will have only one deployed database. In the database there will be a tenant table and all tables will be directly or indirectly linked to this table.
Having all your data in a single schema might be a good idea. It is easier to aggregate data and run system-wide queries. The downside is that you will need to be very careful when querying data to display to each customer. Every query needs an additional filter to guarantee it only returns results belonging to the desired tenant, otherwise data will leak and we will be in trouble for exposing private information.
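The trade-off can be sketched in a few lines of Python with SQLite. The `tenant` and `invoice` tables, their columns and the helper functions below are assumptions made for illustration, not a prescribed layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tenant (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Every tenant-owned table links back to the tenant table.
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,
        tenant_id INTEGER NOT NULL REFERENCES tenant(id),
        amount REAL NOT NULL
    );
    INSERT INTO tenant (id, name) VALUES (1, 'acme'), (2, 'globex');
    INSERT INTO invoice (tenant_id, amount) VALUES (1, 100.0), (1, 40.0), (2, 250.0);
""")


def invoices_for(tenant_id):
    """Tenant-facing query: the tenant_id filter is what prevents a leak."""
    rows = conn.execute(
        "SELECT amount FROM invoice WHERE tenant_id = ? ORDER BY id",
        (tenant_id,),
    )
    return [amount for (amount,) in rows]


def total_by_tenant():
    """System-wide aggregation: easy because all rows live in one schema."""
    rows = conn.execute("""
        SELECT t.name, SUM(i.amount)
        FROM tenant AS t JOIN invoice AS i ON i.tenant_id = t.id
        GROUP BY t.name
        ORDER BY t.name
    """)
    return dict(rows)
```

Dropping the `WHERE tenant_id = ?` clause from `invoices_for` would still be a valid query, and it would return every customer's invoices. That is exactly the kind of mistake this approach makes possible.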
3 - Single database, multiple schemas
Most of the popular relational databases support the creation of multiple schemas inside a single database. This is almost like having multiple sub-databases. Each schema is isolated from the others, and since they are not tied to each other they can even evolve differently. To model our initial system, we would place the tenant table in the database's shared schema (let's call it the public schema). For each entry in the tenant table we create a new schema and store its name. With the schema name we can make queries scoped to each tenant's schema.
This approach has the benefits of a single shared database (which makes it easier to maintain and scale) while at the same time keeping data from each client isolated. Development also becomes a lot easier since you will not need additional scope filters in every query.
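Here is a rough sketch of the schema-per-tenant idea in Python. SQLite has no PostgreSQL-style `CREATE SCHEMA`, so this example uses `ATTACH DATABASE`, which gives each attached database its own schema name on a single connection, as a stand-in. With PostgreSQL you would create a real schema per tenant and use schema-qualified table names (or the `search_path` setting) instead. The table names and helper functions are hypothetical.

```python
import re
import sqlite3

# Autocommit mode: SQLite refuses to ATTACH while a transaction is open.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Shared ("public") tenant table that stores each tenant's schema name.
conn.execute(
    "CREATE TABLE tenant (name TEXT PRIMARY KEY, schema_name TEXT NOT NULL UNIQUE)"
)


def create_tenant(name):
    """Create a schema for the tenant and record its name in the shared table."""
    # Schema names cannot be bound as query parameters, so validate the name
    # strictly before interpolating it into SQL.
    if not re.fullmatch(r"[a-z][a-z0-9_]*", name):
        raise ValueError(f"invalid tenant name: {name!r}")
    schema = f"tenant_{name}"
    conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
    conn.execute(
        f"CREATE TABLE {schema}.invoice (id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
    )
    conn.execute(
        "INSERT INTO tenant (name, schema_name) VALUES (?, ?)", (name, schema)
    )
    return schema


def schema_for(name):
    """Look up the stored schema name for a tenant."""
    row = conn.execute(
        "SELECT schema_name FROM tenant WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown tenant {name!r}")
    return row[0]


def add_invoice(name, amount):
    conn.execute(
        f"INSERT INTO {schema_for(name)}.invoice (amount) VALUES (?)", (amount,)
    )


def invoices_for(name):
    """Scoped query: no tenant filter, the schema name does the scoping."""
    rows = conn.execute(f"SELECT amount FROM {schema_for(name)}.invoice ORDER BY id")
    return [amount for (amount,) in rows]


create_tenant("acme")
create_tenant("globex")
add_invoice("acme", 100.0)
add_invoice("globex", 250.0)
add_invoice("globex", 75.0)
```

The queries contain no `WHERE tenant_id = ...` clause. Because the schema name comes from the shared tenant table rather than from user input, the only place scoping can go wrong is the lookup in `schema_for`.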
This was an introductory walkthrough of the subject. I hope you now have a clear understanding of what a multitenant architecture is and the approaches you can take to achieve it.