Vitess uses a "cell" topology to manage its distributed database clusters across different geographical regions or data centers.
Let’s see Vitess in action with a multi-region setup. Imagine we have two regions, us-east-1 and eu-west-1, and we want our Vitess cluster to span both.
# Example vttablet flags to run a tablet in us-east-1.
# The cell is the prefix of the tablet alias passed to --tablet-path.
vttablet \
--topo_implementation etcd2 \
--topo_global_server_address 10.0.0.1:2379,10.0.0.2:2379 \
--topo_global_root /vitess/global \
--tablet-path us-east-1-100 \
--tablet_hostname my-us-east-1-tablet-01 \
--init_keyspace app \
--init_shard 0 \
--init_tablet_type replica \
--port 15001 \
--grpc_port 15999 \
--service_map grpc-queryservice,grpc-tabletmanager,grpc-updatestream \
--db_host localhost \
--db_port 3306 \
--init_db_name_override vt_app_db
# Example vttablet flags to run a tablet in eu-west-1.
# Note: the global topology address is the same in every cell.
vttablet \
--topo_implementation etcd2 \
--topo_global_server_address 10.0.0.1:2379,10.0.0.2:2379 \
--topo_global_root /vitess/global \
--tablet-path eu-west-1-200 \
--tablet_hostname my-eu-west-1-tablet-01 \
--init_keyspace app \
--init_shard 0 \
--init_tablet_type replica \
--port 15001 \
--grpc_port 15999 \
--service_map grpc-queryservice,grpc-tabletmanager,grpc-updatestream \
--db_host localhost \
--db_port 3306 \
--init_db_name_override vt_app_db
In this example, each vttablet's cell is the prefix of its tablet alias (the --tablet-path value): us-east-1 and eu-west-1. Both instances point at the same global topology service; the addresses of each cell's local topology service are stored in the global topology as CellInfo records. Vitess uses these cell definitions to understand where its components are located.
The core problem Vitess's cell topology solves is failure isolation and locality-aware query routing across geographically dispersed data centers or cloud regions. Without it, a Vitess cluster wouldn't know how to keep traffic local, or how to contain the blast radius of a regional outage, when components are physically separated. Cells let Vitess treat a collection of MySQL instances, each discovered through its own cell-local topology service, as a single logical database.
Internally, Vitess stores a CellInfo record in the global topology service for each cell. This record contains the server addresses and root path of that cell's local topology service. When a component (like a vtgate or vtctld) needs to discover tablets in another cell, it first reads that cell's CellInfo from the global topology, then connects to the remote cell's local topology service to find the tablet records. This two-level lookup — global, then cell-local — is key.
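Cells are registered with vtctldclient's AddCellInfo command, which writes the CellInfo record into the global topology. A minimal sketch — the vtctld endpoint (localhost:15999) and the etcd addresses are placeholders for this example; substitute your own:

```shell
# Register each cell's local topology service in the global topology.
vtctldclient --server localhost:15999 AddCellInfo \
  --root /vitess/us-east-1 \
  --server-address "10.0.0.1:2379,10.0.0.2:2379" \
  us-east-1

vtctldclient --server localhost:15999 AddCellInfo \
  --root /vitess/eu-west-1 \
  --server-address "10.0.1.1:2379,10.0.1.2:2379" \
  eu-west-1
```

Run this once per cell before starting any tablets in it; vttablet will fail to register if its cell's CellInfo does not exist.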
For multi-region setups, you'll typically run a dedicated topology service (like etcd or ZooKeeper) within each region. The vtctld instances that manage the cluster are often deployed in a primary region but need to be aware of other cells. vtgate instances, which are the query routing layer, are usually deployed in every region where you want to serve traffic, and each one is told explicitly which cells to route to via its --cells_to_watch flag.
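You can confirm which cells the global topology knows about, and what each CellInfo contains, with vtctldclient. The vtctld endpoint here is a placeholder:

```shell
# List all registered cells, then inspect one cell's record.
vtctldclient --server localhost:15999 GetCellInfoNames
vtctldclient --server localhost:15999 GetCellInfo us-east-1
```

This is a quick sanity check before deploying vtgate: any cell missing from the list cannot be watched or routed to.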
The cell portion of the --tablet-path alias is what associates a tablet with its logical cell. If it is wrong, the tablet registers under the wrong cell's topology, so vtgate instances watching the intended cell never discover it, leading to routing issues. (Note that in Vitess a "cell alias" is a separate concept: a named group of cells, created with AddCellsAlias, that routing can treat as one region.)
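If you want replica reads in one region to be servable by the other region's replicas as well, you can group the cells under a cells alias. A sketch, with a hypothetical alias name and a placeholder vtctld endpoint:

```shell
# Group both cells under one alias so replica traffic can span them.
# "global_region" is an illustrative name, not a Vitess builtin.
vtctldclient --server localhost:15999 AddCellsAlias \
  --cells us-east-1,eu-west-1 \
  global_region
```

Without an alias, vtgate prefers tablets in its own cell for replica reads; with one, all cells in the alias are treated as equally local.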
When you have a multi-region setup, the vtctld instances in your primary cell will be aware of all the other cells defined in the global topology. They can then manage operations like resharding or schema changes across these cells. However, vtgate instances are the ones that actually perform the cross-cell routing. A vtgate in us-east-1 that receives a query for a keyspace with shards served from both us-east-1 and eu-west-1 picks a target from the pool of healthy tablets it is already watching in both cells; it does not look up the remote topology on a per-query basis.
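You can see exactly which tablets a vtgate has discovered — including their cells — through its MySQL listener. The hostname and port below are placeholders for wherever your vtgate's --mysql_server_port is exposed:

```shell
# Ask vtgate which tablets it is watching, and in which cells.
mysql -h vtgate.example.internal -P 15306 -e "SHOW vitess_tablets"
```

If a cell's tablets are missing from this output, vtgate is either not watching that cell or cannot reach its local topology service.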
The mechanism for cross-cell communication relies on vtgate being able to resolve the local topology service of every cell it watches. This is why each cell needs its own operational topology service. The --topo_global_server_address flag tells vtgate and vtctld where the global topology lives. At startup, a vtgate in us-east-1 that watches eu-west-1 reads eu-west-1's CellInfo from the global topology, connects to the topology service addresses in that record, and subscribes to that cell's tablet records. Queries are then sent directly to the chosen vttablet over gRPC.
A common pitfall is setting --cells_to_watch to only the local cell. For vtgate to be truly multi-region aware, it must list every cell it should route to, and it must have network reachability both to each of those cells' local topology services and to the vttablets themselves. Note that a comma-separated list of endpoints in --topo_global_server_address only covers the global topology ensemble; the per-cell topology addresses come from the CellInfo records, which is the usual arrangement in true multi-region setups where each region runs its own distinct topology ensemble.
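Putting this together, a vtgate serving us-east-1 but routing to both cells might be started as follows. Ports and addresses are placeholders for this sketch:

```shell
# Example vtgate flags for a gateway in us-east-1 that routes to both cells.
vtgate \
  --topo_implementation etcd2 \
  --topo_global_server_address 10.0.0.1:2379,10.0.0.2:2379 \
  --topo_global_root /vitess/global \
  --cell us-east-1 \
  --cells_to_watch us-east-1,eu-west-1 \
  --tablet_types_to_wait PRIMARY,REPLICA \
  --service_map grpc-vtgateservice \
  --port 15001 \
  --grpc_port 15991 \
  --mysql_server_port 15306
```

The --cell flag names the vtgate's own cell (used for locality preferences), while --cells_to_watch controls which cells' tablets it discovers and routes to.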
The next step in managing a multi-region Vitess deployment is understanding how vtgate handles query routing and latency optimization across these cells.