Vitess uses etcd as its topology store, which is crucial for managing its distributed nature.
Let’s see etcd in action with Vitess. Imagine a basic Vitess cluster with a single vtctld and two keyspaces: commerce and customer.
# Start a single etcd instance (for demo purposes)
etcd --data-dir /tmp/etcd-data \
  --listen-client-urls http://localhost:2379 \
  --advertise-client-urls http://localhost:2379 &
# In a real setup, you'd run a multi-node etcd cluster for HA.
# Vitess configuration pointing to etcd
# Passed as flags to vtctld, vtgate, and vttablet:
# --topo_implementation=etcd2
# --topo_global_server_address=localhost:2379
# --topo_global_root=/vitess/global
# Example of what you might see in etcd (using etcdctl)
ETCDCTL_API=3 etcdctl --endpoints=localhost:2379 get /vitess/ --prefix
This etcdctl get command would reveal a hierarchical structure. At the root, you’d see a /vitess/ directory. Inside, you’d find entries like:
- /vitess/global/vtctld/localhost:15999: Information about the vtctld server.
- /vitess/cell1/vtgate/vtgate-0000000001.localhost:15001: Details for a vtgate instance in cell1.
- /vitess/cell1/tablet/zone1-0000000001: Data for a tablet in cell1.
- /vitess/cell1/tablet/zone1-0000000002: Another tablet.
- /vitess/keyspaces/commerce/shards: Information about shards for the commerce keyspace.
- /vitess/keyspaces/customer/shards: Information about shards for the customer keyspace.
Vitess uses etcd to store the metadata that defines the state of your entire Vitess cluster. This includes the locations of all vtctld, vtgate, and tablet servers, the topology of your keyspaces and shards, and various other operational parameters. When a vtgate needs to route a query, it consults etcd to find the correct tablet for that shard. When a tablet needs to perform an operation, it registers itself with etcd.
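To make the routing step concrete, here is a minimal conceptual sketch of a vtgate-style lookup. This is not Vitess code: the topology store is modeled as a flat key/value map, the way keys appear in etcd, and the key names and value formats are illustrative only.

```python
# Flat key/value map standing in for etcd's contents (illustrative keys).
topo = {
    "/vitess/keyspaces/commerce/shards/-80": "tablet:zone1-0000000001",
    "/vitess/keyspaces/commerce/shards/80-": "tablet:zone1-0000000002",
    "/vitess/cell1/tablet/zone1-0000000001": "host=10.0.0.1:15100",
    "/vitess/cell1/tablet/zone1-0000000002": "host=10.0.0.2:15100",
}

def find_tablet(keyspace: str, shard: str) -> str:
    """Resolve keyspace/shard to a tablet address via two topo lookups:
    first shard -> tablet alias, then tablet alias -> address."""
    alias = topo[f"/vitess/keyspaces/{keyspace}/shards/{shard}"]
    tablet_id = alias.split(":", 1)[1]
    return topo[f"/vitess/cell1/tablet/{tablet_id}"]

print(find_tablet("commerce", "-80"))  # host=10.0.0.1:15100
```

The real vtgate caches this information and watches etcd for changes rather than reading on every query, but the two-step resolution (shard record, then tablet record) is the same idea.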
The core problem Vitess solves is managing sharded, distributed relational databases at scale. etcd is the linchpin that allows Vitess to know where everything is and how it’s configured. Without a reliable topology store like etcd, Vitess wouldn’t be able to coordinate its distributed components.
The primary levers you control are the etcd endpoints Vitess connects to and the structure of your Vitess deployment (cells, keyspaces, shards). Vitess itself populates the etcd store as you add and configure your components.
The topology configuration is not just a list of endpoints; it also defines a namespace (a root path) within etcd that Vitess will use. For example, --topo_implementation=etcd2 --topo_global_server_address=etcd1:2379,etcd2:2379 --topo_global_root=/vitess/global tells Vitess to use the etcd2 topology plugin (which, despite its name, speaks the etcd v3 API) and store its keys under /vitess/global on the etcd1 and etcd2 hosts. Changing the root path is a common way to isolate multiple Vitess clusters within the same etcd cluster, but it requires reconfiguring all Vitess components to point to the new root.
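The isolation property is easy to see if you sketch how a root path prefixes every key. The helper and key layout below are illustrative, not Vitess's actual path-building code:

```python
def topo_key(root: str, *parts: str) -> str:
    """Join a cluster's root path with key components (illustrative)."""
    return "/".join([root.rstrip("/")] + list(parts))

# Two Vitess clusters sharing one etcd, separated only by root path.
prod_root = "/vitess-prod/global"
staging_root = "/vitess-staging/global"

k_prod = topo_key(prod_root, "keyspaces", "commerce", "Keyspace")
k_staging = topo_key(staging_root, "keyspaces", "commerce", "Keyspace")
print(k_prod)     # /vitess-prod/global/keyspaces/commerce/Keyspace
print(k_staging)  # /vitess-staging/global/keyspaces/commerce/Keyspace
# Same logical keyspace name, but disjoint key spaces in etcd, so a
# prefix scan for one cluster never sees the other cluster's data.
```

Because every read and watch Vitess issues is scoped under its configured root, a prefix range query for one cluster can never return the other cluster's keys.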
Understanding how Vitess serializes its topology data into etcd keys and values is essential for advanced debugging and custom tooling. For instance, each shard's topology record is stored as a serialized topodata.Shard protobuf message, which can be retrieved from etcd, deserialized, and inspected to see the exact shard key range and primary tablet assignment at a given moment.
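The retrieve-then-deserialize pattern looks roughly like the sketch below. Real Vitess stores a binary topodata.Shard protobuf, which you would decode with the generated protobuf classes; JSON stands in here purely so the example is self-contained, and the key path and field names (key_range, primary_alias) are illustrative approximations of topodata.proto.

```python
import json

# Pretend this byte string was fetched with something like:
#   etcdctl get /vitess/global/keyspaces/commerce/shards/-80/Shard
# (in reality the value is a binary protobuf, not JSON).
raw_value = json.dumps({
    "key_range": {"start": "", "end": "80"},
    "primary_alias": {"cell": "zone1", "uid": 1},
}).encode()

# Deserialize and inspect the shard record.
shard = json.loads(raw_value)
print(shard["key_range"]["end"])       # 80
print(shard["primary_alias"]["cell"])  # zone1
```

With the real protobufs, the only change is swapping json.loads for topodata_pb2.Shard.FromString; the inspect step is the same.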
The next concept to explore is how Vitess handles schema changes across a sharded environment, leveraging its topology information.