With per-tenant keyspaces, multi-tenant isolation in Vitess is less about building isolation from scratch and more about enforcing boundaries Vitess already provides: each keyspace is a logical database with its own shards and tablets.

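Let’s watch a tenant get provisioned and start serving traffic. Imagine a Vitess cluster already running and administered through vtctld, with a vtgate gateway and several vttablet processes. (Older releases also ran vtworker for resharding; VReplication has since taken over that job, and recent releases no longer ship vtworker.)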

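First, we need to tell Vitess about our new tenant, "acme". This is done by applying a VSchema to the tenant’s keyspace with the ApplyVSchema command, assuming the acme keyspace has already been created (for example with CreateKeyspace). VReplication, the primary mechanism for moving data between shards and keyspaces, comes into play afterwards. We’ll execute a command like this: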

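vtctlclient --server=vtctld-0.vtctld.prod:15999 ApplyVSchema -- --vschema='{
    "sharded": true,
    "tables": {
        "users": {
            "column_vindexes": [
                {
                    "column": "user_id",
                    "name": "numeric_vindex"
                }
            ],
            "auto_increment": {
                "column": "user_id",
                "sequence": "users_seq"
            }
        }
    },
    "vindexes": {
        "numeric_vindex": {
            "type": "numeric"
        }
    }
}' acme

The keyspace name is passed as the final positional argument. users_seq is a placeholder name for a Vitess sequence; a sequence is backed by a table in an unsharded keyspace and has to be set up separately.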

This command defines the VSchema for the acme keyspace. sharded: true means data within this keyspace is split across multiple shards. The vindexes section declares a vindex named numeric_vindex of type numeric, which uses a numeric column value directly as the keyspace ID. The tables section describes our users table, and column_vindexes makes numeric_vindex the primary vindex on user_id. This tells Vitess that a query filtering on user_id can be routed by computing which shard holds the matching row. The auto_increment clause tells Vitess to generate user_id values for inserts that do not supply one, so IDs stay unique across all of the tenant’s shards.
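A VSchema is plain JSON, so it can be generated and sanity-checked before it is applied. The sketch below is only an illustration: it assumes the VSchema layout in which column_vindexes is a list of column/name pairs and auto_increment names a sequence, and the helper names (build_tenant_vschema, undefined_vindexes) and the sequence name users_seq are hypothetical, not part of Vitess.

```python
import json


def build_tenant_vschema(table: str, column: str, sequence: str) -> dict:
    """Return a sharded VSchema with one table keyed by a numeric vindex.

    `sequence` is a placeholder for a Vitess sequence defined elsewhere.
    """
    return {
        "sharded": True,
        "vindexes": {"numeric_vindex": {"type": "numeric"}},
        "tables": {
            table: {
                "column_vindexes": [
                    {"column": column, "name": "numeric_vindex"}
                ],
                "auto_increment": {"column": column, "sequence": sequence},
            }
        },
    }


def undefined_vindexes(vschema: dict) -> list:
    """List 'table.column -> vindex' entries whose vindex is not declared."""
    declared = set(vschema.get("vindexes", {}))
    problems = []
    for table, spec in vschema.get("tables", {}).items():
        for entry in spec.get("column_vindexes", []):
            if entry["name"] not in declared:
                problems.append(f"{table}.{entry['column']} -> {entry['name']}")
    return problems


vschema = build_tenant_vschema("users", "user_id", "users_seq")
print(json.dumps(vschema, indent=4))
```

Running the check before calling ApplyVSchema catches the most common mistake, a table that references a vindex name missing from the vindexes section.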

Now Vitess needs shards to hold this keyspace and data to put in them. Applying a VSchema does not create shards. A shard exists once it has been created in the topology (for example with CreateShard, or typically when vttablet processes are started for it) and has a primary tablet. Each shard is named after the key range it owns; for four equal shards those are -40, 40-80, 80-c0 and c0-. Once the shards are up, a VReplication workflow can copy a "source" table (which might be a placeholder or a template) into the acme keyspace. With a source keyspace called external_templates, a MoveTables invocation would look roughly like this (flag spelling varies between Vitess versions, so check the reference for yours):

vtctlclient --server=vtctld-0.vtctld.prod:15999 MoveTables -- --source external_templates --tables 'users' --cells zone1 Create acme.users_shard_by_user_id

This is where the magic happens. Vitess, via VReplication running on the primary vttablet of each target shard, will:

  1. Create Tables: It will create the users table on each acme shard from the source table’s schema if the table is not already there.
  2. Start Streams: On each target shard it creates a VReplication stream, which connects to a VStreamer on a tablet in the external_templates keyspace (or wherever the source data is).
  3. Copy Data: Each stream copies the existing rows, keeping only those whose user_id maps through numeric_vindex into that shard’s key range.
  4. Replicate Changes: After the copy, the streams keep applying changes from the source until traffic is switched to acme (SwitchTraffic) and the workflow is completed (Complete).
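Vitess names each shard after the hexadecimal key range it owns, so a four-shard keyspace has shards -40, 40-80, 80-c0 and c0-. As a rough illustration, the following sketch derives those names for an equal split. The function name is hypothetical and the one-byte boundaries are a simplification; it is not how Vitess itself creates shards.

```python
def equal_shard_names(count: int) -> list:
    """Return shard names for `count` equal key ranges.

    Simplification: only powers of two up to 256 are supported, so every
    boundary fits in a single hex byte.
    """
    if count < 1 or count > 256 or count & (count - 1):
        raise ValueError("count must be a power of two between 1 and 256")
    if count == 1:
        return ["-"]  # a single shard covering the whole key range
    step = 256 // count
    bounds = [format(i * step, "02x") for i in range(1, count)]
    names = ["-" + bounds[0]]                                  # open start
    names += [f"{lo}-{hi}" for lo, hi in zip(bounds, bounds[1:])]
    names.append(bounds[-1] + "-")                             # open end
    return names


print(equal_shard_names(4))
```

Each name could then be passed to whatever brings the shard up, for instance as the shard a set of vttablet processes is started for.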

The result is that the acme keyspace is sharded and all data for tenant acme lives only in its own shards. When a query for acme reaches vtgate with WHERE user_id = 12345, vtgate consults the acme VSchema, uses numeric_vindex to turn 12345 into a keyspace ID, finds the shard whose key range contains it (with four equal shards, that is -40), and routes the query directly to a vttablet serving that shard. This avoids scattering the query across every shard, and because each tenant has its own keyspace with its own tablets, one tenant’s data is kept apart from another’s.
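The routing decision can be reproduced by hand. The numeric vindex uses the value itself, encoded as 8 big-endian bytes, as the keyspace ID, and a shard owns every keyspace ID from its start bound up to but not including its end bound. The sketch below assumes four equal shards; the function names are illustrative and do not mirror vtgate’s internals.

```python
import struct

# Assumed layout: four equal key ranges.
SHARDS = ["-40", "40-80", "80-c0", "c0-"]


def numeric_keyspace_id(value: int) -> bytes:
    """Numeric vindex: the keyspace ID is the value as 8 big-endian bytes."""
    return struct.pack(">Q", value)


def shard_for(keyspace_id: bytes, shards=SHARDS) -> str:
    """Return the shard whose [start, end) key range contains keyspace_id."""
    for name in shards:
        start_hex, end_hex = name.split("-")
        start = bytes.fromhex(start_hex)   # empty means "from the beginning"
        end = bytes.fromhex(end_hex)       # empty means "to the end"
        if keyspace_id >= start and (not end or keyspace_id < end):
            return name
    raise LookupError("no shard covers this keyspace ID")


print(shard_for(numeric_keyspace_id(12345)))
```

Because small values all begin with zero bytes, a numeric vindex sends low sequential IDs to the first shard. Where even distribution matters, a hashing vindex such as hash is the more usual choice.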

The most surprising thing is that VReplication is not just a migration tool: the same workflows that move and reshard data can also seed a new tenant keyspace. Routing still comes from the VSchema and isolation from the keyspace boundary, but the data movement is all VReplication.

Now, imagine you need to add another tenant, "beta". You’d repeat the same provisioning steps with beta as the keyspace: apply a VSchema, bring up its shards, and run the workflow. The result is a completely independent set of shards and tables for the new tenant.
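Since every tenant goes through the same steps, the commands can be generated per tenant. The sketch below only builds and prints the ApplyVSchema command line without running anything. It assumes the vtctlclient form used on recent releases, in which subcommand flags follow a "--" separator and the keyspace is the last positional argument; the helper names are hypothetical.

```python
import json
import shlex

VTCTLD = "vtctld-0.vtctld.prod:15999"  # vtctld address used in this article


def tenant_vschema() -> dict:
    """Minimal sharded VSchema shared by every tenant in this sketch."""
    return {
        "sharded": True,
        "vindexes": {"numeric_vindex": {"type": "numeric"}},
        "tables": {
            "users": {
                "column_vindexes": [
                    {"column": "user_id", "name": "numeric_vindex"}
                ]
            }
        },
    }


def apply_vschema_argv(tenant: str, vschema: dict) -> list:
    """Build the argv for applying `vschema` to the tenant's keyspace."""
    return [
        "vtctlclient",
        f"--server={VTCTLD}",
        "ApplyVSchema",
        "--",
        f"--vschema={json.dumps(vschema)}",
        tenant,
    ]


for tenant in ["acme", "beta"]:
    # shlex.join quotes the JSON so the line can be pasted into a shell.
    print(shlex.join(apply_vschema_argv(tenant, tenant_vschema())))
```

Keeping the argv as a list rather than a formatted string avoids quoting mistakes if the command is later handed to subprocess.run.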

The next logical step is understanding how to manage these tenant keyspaces over their lifetime, especially scaling them and handling data retention.
