Vitess Lookup Vindexes let you build secondary indexes on sharded tables, and the most surprising thing about them is that adding one doesn't require re-sharding the cluster.

Imagine you have a users table that’s sharded by user_id.

CREATE TABLE users (
    user_id BIGINT,
    user_name VARCHAR(100),
    email VARCHAR(100),
    PRIMARY KEY (user_id)
) ENGINE=InnoDB;

You want to look up users by their email address, but email isn't your sharding key. A regular MySQL index on email only helps within each shard. Vitess can't tell which shard holds a given email, so a query by email has to be scattered to every shard.
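Here is a toy in-memory model in Python that makes the cost concrete. It assumes four shards and a hypothetical shard_for() helper that picks a shard from an md5 hash. Vitess's real hash vindex computes keyspace IDs differently, but the routing consequence is the same: a lookup by user_id touches one shard, while a lookup by email has to ask all of them.

```python
import hashlib

NUM_SHARDS = 4

def shard_for(value) -> int:
    """Hypothetical stand-in for a vindex: map a value to a shard number."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# users is sharded by user_id: each shard holds user_id -> row.
shards = [{} for _ in range(NUM_SHARDS)]

def insert_user(user_id, user_name, email):
    shards[shard_for(user_id)][user_id] = {"user_name": user_name, "email": email}

def find_by_user_id(user_id):
    """Targeted: the sharding key names exactly one shard to ask."""
    return shards[shard_for(user_id)].get(user_id), 1  # (row, shards queried)

def find_by_email(email):
    """Scatter: with no lookup vindex, every shard must be asked."""
    matches = [row for shard in shards for row in shard.values() if row["email"] == email]
    return matches, len(shards)

insert_user(100, "Alice", "alice@example.com")
insert_user(101, "Bob", "bob@example.com")

row, queried = find_by_user_id(100)
print(row["user_name"], queried)      # Alice 1
rows, queried = find_by_email("bob@example.com")
print(rows[0]["user_name"], queried)  # Bob 4
```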

This is where Lookup Vindexes shine. A lookup vindex is backed by a separate, smaller table (the "lookup table"). That table maps the secondary value (e.g., email) to the primary key of the data table (e.g., user_id). The lookup table can itself be sharded. It is sharded by the indexed value (email), independently of how users is sharded.

Here’s how you’d set it up in Vitess. First, define a "lookup" table that stores the mapping:

CREATE TABLE user_email_idx (
    -- The indexed value; as the primary key, each email maps to one user.
    email VARCHAR(100) PRIMARY KEY,
    -- The sharding key of the users row that holds this email.
    user_id BIGINT
) ENGINE=InnoDB;

Next, define the vindexes in the keyspace's VSchema. This tells Vitess how users is sharded and how to use the user_email_idx table.

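{
  "sharded": true,
  "vindexes": {
    "hash": {
      "type": "hash"
    },
    "unicode_loose_md5": {
      "type": "unicode_loose_md5"
    },
    "email_lookup": {
      "type": "lookup_hash_unique",
      "params": {
        "table": "user_email_idx",
        "from": "email",
        "to": "user_id"
      },
      "owner": "users"
    }
  },
  "tables": {
    "users": {
      "column_vindexes": [
        { "column": "user_id", "name": "hash" },
        { "column": "email", "name": "email_lookup" }
      ]
    },
    "user_email_idx": {
      "column_vindexes": [
        { "column": "email", "name": "unicode_loose_md5" }
      ]
    }
  }
}

In this JSON:

  • "type": "lookup_hash_unique" declares a unique lookup vindex. Its lookup table stores a user_id, which Vitess then hashes to find the shard.
  • "params" names the lookup table ("table") and its columns. "from" is the indexed column (email), and "to" is the column holding the user_id.
  • "owner": "users" (a top-level field of the vindex, not a param) makes users the owner table. Inserts, updates, and deletes on users then keep user_email_idx in sync.
  • "column_vindexes" on users lists the primary vindex (hash on user_id) first, because it decides where each row lives. The email_lookup secondary vindex follows it.
  • The user_email_idx entry gives the lookup table its own primary vindex on email. This is what lets it be sharded independently of users.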

Because users owns the vindex, Vitess populates user_email_idx automatically whenever you insert into users.

-- Example insert into the main 'users' table
INSERT INTO users (user_id, user_name, email) VALUES (100, 'Alice', 'alice@example.com');

Vitess will then ensure that user_email_idx contains an entry like:

+-------------------+---------+
| email             | user_id |
+-------------------+---------+
| alice@example.com |     100 |
+-------------------+---------+
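Here is a sketch of what owning the vindex means. This toy model assumes four shards and a hypothetical md5-based shard_for() helper in place of Vitess's real vindexes, and it ignores transactions entirely. It only shows that every write to users also maintains the matching user_email_idx row, including when the email changes or the user is deleted.

```python
import hashlib

NUM_SHARDS = 4

def shard_for(value) -> int:
    # Hypothetical md5-based placement; not Vitess's real hashing.
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

users = [{} for _ in range(NUM_SHARDS)]           # user_id -> row, sharded by user_id
user_email_idx = [{} for _ in range(NUM_SHARDS)]  # email -> user_id, sharded by email

def insert_user(user_id, user_name, email):
    # A write to the owner table also writes the lookup row.
    user_email_idx[shard_for(email)][email] = user_id
    users[shard_for(user_id)][user_id] = {"user_name": user_name, "email": email}

def update_email(user_id, new_email):
    row = users[shard_for(user_id)][user_id]
    old_email = row["email"]
    # Changing the indexed column replaces the old mapping with a new one.
    del user_email_idx[shard_for(old_email)][old_email]
    user_email_idx[shard_for(new_email)][new_email] = user_id
    row["email"] = new_email

def delete_user(user_id):
    row = users[shard_for(user_id)].pop(user_id)
    del user_email_idx[shard_for(row["email"])][row["email"]]

def lookup(email):
    return user_email_idx[shard_for(email)].get(email)

insert_user(100, "Alice", "alice@example.com")
print(lookup("alice@example.com"))                               # 100
update_email(100, "alice@example.org")
print(lookup("alice@example.com"), lookup("alice@example.org"))  # None 100
delete_user(100)
print(lookup("alice@example.org"))                               # None
```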

Now, when you query by email:

SELECT user_name FROM users WHERE email = 'alice@example.com';

Vitess routes this query through the email_lookup vindex. It first queries user_email_idx for the user_id associated with 'alice@example.com'. Because user_email_idx is itself sharded by email, that lookup goes to a single shard. Vitess then hashes the returned user_id to find the one users shard that holds the row, and it sends the query only there instead of scattering it to every shard.
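The same kind of toy model can sketch the two-step routing. The names (shard_for, seed, select_user_name_by_email, the queries log) are hypothetical, and the md5-based placement only stands in for Vitess's vindexes. The sketch records which (table, shard) pairs each step touches. The result is two single-shard queries instead of one query per shard.

```python
import hashlib

NUM_SHARDS = 4

def shard_for(value) -> int:
    # Hypothetical md5-based placement; not Vitess's real hashing.
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

users = [{} for _ in range(NUM_SHARDS)]           # sharded by user_id
user_email_idx = [{} for _ in range(NUM_SHARDS)]  # sharded by email
queries = []  # log of (table, shard) pairs touched

def seed(user_id, user_name, email):
    users[shard_for(user_id)][user_id] = {"user_name": user_name, "email": email}
    user_email_idx[shard_for(email)][email] = user_id

def select_user_name_by_email(email):
    # Step 1: one shard of the lookup table, chosen by hashing email.
    idx_shard = shard_for(email)
    queries.append(("user_email_idx", idx_shard))
    user_id = user_email_idx[idx_shard].get(email)
    if user_id is None:
        return None
    # Step 2: hash user_id to pick the single users shard to query.
    user_shard = shard_for(user_id)
    queries.append(("users", user_shard))
    row = users[user_shard].get(user_id)
    # The original WHERE email = ... is still applied on that shard.
    return row["user_name"] if row and row["email"] == email else None

seed(100, "Alice", "alice@example.com")
seed(101, "Bob", "bob@example.com")
print(select_user_name_by_email("alice@example.com"))  # Alice
print(len(queries))                                    # 2
```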

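Keeping the two tables consistent is the delicate part, because a users row and its user_email_idx row usually live on different shards. An owned lookup vindex such as email_lookup writes both rows in the same Vitess transaction. Without two-phase commit, a failure partway through that multi-shard commit can leave the rows out of step. Vitess also provides consistent_lookup and consistent_lookup_unique, which avoid 2PC by ordering the work instead: lookup rows are written before the main row commits and deleted only after it. A failure therefore leaves, at worst, an orphaned lookup row. An orphan is harmless because the routed query still applies WHERE email = 'alice@example.com' on the target shard and simply finds nothing. These consistent variants store the row's keyspace ID rather than user_id.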
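The commit ordering can be sketched in a toy model that assumes four shards and a hypothetical md5-based shard_for() helper. For readability it keeps an email to user_id mapping, even though the real consistent variants store keyspace IDs. The hypothetical fail_main_commit flag simulates the main commit failing after the lookup row has already landed. Because the routed query re-checks email on the target shard, the orphaned lookup row is ignored.

```python
import hashlib

NUM_SHARDS = 4

def shard_for(value) -> int:
    # Hypothetical md5-based placement; not Vitess's real hashing.
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

users = [{} for _ in range(NUM_SHARDS)]      # user_id -> row
email_idx = [{} for _ in range(NUM_SHARDS)]  # email -> user_id

def insert_user(user_id, email, fail_main_commit=False):
    # Lookup row is committed first, on its own shard.
    email_idx[shard_for(email)][email] = user_id
    if fail_main_commit:
        return False  # main row never lands: the lookup row is now an orphan
    users[shard_for(user_id)][user_id] = {"email": email}
    return True

def find_user_id(email):
    user_id = email_idx[shard_for(email)].get(email)
    if user_id is None:
        return None
    row = users[shard_for(user_id)].get(user_id)
    # Re-applying WHERE email = ? on the target shard filters out orphans.
    return user_id if row is not None and row["email"] == email else None

insert_user(100, "alice@example.com", fail_main_commit=True)
print(find_user_id("alice@example.com"))  # None (orphan ignored)
insert_user(100, "alice@example.com")     # this toy simply overwrites the stale mapping
print(find_user_id("alice@example.com"))  # 100
```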

A common misconception is that a lookup vindex is just a join. It is more than that. The lookup table is sharded independently, and because users owns the vindex, Vitess maintains the mapping on every insert, update, and delete without any application code. That is what lets you add a secondary index to a sharded table without re-sharding your dataset. For a table that already holds data, the existing rows still have to be backfilled into the lookup table. Vitess's CreateLookupVindex workflow does this online using VReplication.

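The lookup table is itself an ordinary sharded table with its own primary vindex (for example, a hash of email), so lookups scale out just like the data. The mapping also doesn't have to be unique. To look users up by user_name, where several users can share a name, you would define a second lookup vindex with a non-unique type such as lookup_hash or consistent_lookup. A query by user_name then goes to every users shard that holds a matching row, which is usually still far fewer than all of them.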
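A non-unique lookup can be sketched the same way. This toy model assumes four shards and a hypothetical md5-based shard_for() that stands in for Vitess's vindexes. The lookup table maps each user_name to a set of user_ids, and a query visits only the users shards those ids hash to.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4

def shard_for(value) -> int:
    # Hypothetical md5-based placement; not Vitess's real hashing.
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Non-unique lookup table, sharded by user_name: user_name -> {user_id, ...}
name_idx = [defaultdict(set) for _ in range(NUM_SHARDS)]

def add_mapping(user_name, user_id):
    name_idx[shard_for(user_name)][user_name].add(user_id)

def shards_for_name(user_name):
    """Return the set of users shards a query by user_name must visit."""
    user_ids = name_idx[shard_for(user_name)].get(user_name, set())
    return {shard_for(user_id) for user_id in user_ids}

for user_id, name in [(100, "Alice"), (101, "Bob"), (102, "Alice")]:
    add_mapping(name, user_id)

print(sorted(shards_for_name("Alice")))  # the shards holding user_id 100 and 102
print(shards_for_name("Carol"))          # set()
```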
