Vitess rolling upgrades let you update your database cluster without taking your applications offline.

Let’s see Vitess in action. Imagine you have a vtgate process running, serving queries to your application. A simplified view of the cluster might look like this:

```json
{
  "addrs": {
    "vtgate-00000001": "localhost:15991"
  },
  "cells": {
    "zone1": {
      "vtgate": [
        "localhost:15991"
      ],
      "vttablets": [
        "localhost:15101",
        "localhost:15201"
      ]
    }
  },
  "server": "localhost:15991"
}
```

Now, you need to upgrade Vitess. You deploy a new version of vtgate alongside the old one.

```json
{
  "addrs": {
    "vtgate-00000001": "localhost:15991",
    "vtgate-00000002": "localhost:15992"
  },
  "cells": {
    "zone1": {
      "vtgate": [
        "localhost:15991",
        "localhost:15992"
      ],
      "vttablets": [
        "localhost:15101",
        "localhost:15201"
      ]
    }
  },
  "server": "localhost:15991"
}
```

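The key is how traffic reaches vtgate. There is no master vtgate: each instance is an independent, stateless proxy, and clients usually reach the fleet through a load balancer or a service-discovery list. Once vtgate-00000002 is added to that list, new connections can land on either instance, while connections already open on vtgate-00000001 stay where they are.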
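To make the routing idea concrete, here is a minimal sketch of a pool that hands out new connections round-robin over whatever vtgate addresses it currently knows. The `VtgatePool` class and its methods are hypothetical, not part of any Vitess client library; in a real deployment this job is usually done by a load balancer.

```python
class VtgatePool:
    """Hypothetical pool that assigns new client connections to vtgates.

    New connections are spread round-robin over the current address list;
    a connection that is already open keeps the vtgate it was assigned.
    """

    def __init__(self, addrs):
        self.addrs = list(addrs)
        self._next = 0  # index of the vtgate that gets the next connection

    def set_addrs(self, addrs):
        # Only connections opened after this call see the new list.
        self.addrs = list(addrs)
        self._next = 0

    def connect(self):
        if not self.addrs:
            raise RuntimeError("no vtgate available")
        addr = self.addrs[self._next % len(self.addrs)]
        self._next += 1
        return {"vtgate": addr}  # stand-in for a real connection object


if __name__ == "__main__":
    pool = VtgatePool(["localhost:15991"])
    old_conn = pool.connect()
    pool.set_addrs(["localhost:15991", "localhost:15992"])
    print(old_conn["vtgate"])
    print([pool.connect()["vtgate"] for _ in range(4)])
```

Calling `set_addrs` with both addresses leaves the earlier connection on localhost:15991 and spreads later connections across localhost:15991 and localhost:15992.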

You can then start draining traffic from the old vtgate-00000001 by removing its address from the list of vtgates your clients are routed to, for example a key kept in etcd or ZooKeeper and read by your load balancer or service discovery.

```json
{
  "addrs": {
    "vtgate-00000002": "localhost:15992"
  },
  "cells": {
    "zone1": {
      "vtgate": [
        "localhost:15992"
      ],
      "vttablets": [
        "localhost:15101",
        "localhost:15201"
      ]
    }
  },
  "server": "localhost:15992"
}
```
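The edit above can be expressed as a small transformation on the illustrative config. This sketch assumes exactly the JSON shape used in this walkthrough (`addrs`, per-cell `vtgate` lists, and an entry-point key named `server`); `drain_vtgate` is a hypothetical helper, not a Vitess tool, and in practice you would write the result back to wherever your load balancer reads it.

```python
import copy
import json


def drain_vtgate(config, name):
    """Return a copy of the cluster config with vtgate `name` removed.

    Assumes the illustrative shape used in this walkthrough: `addrs` maps
    vtgate names to addresses, each cell lists its vtgate addresses, and
    `server` names the default entry point.
    """
    new = copy.deepcopy(config)          # never mutate the caller's config
    addr = new["addrs"].pop(name)        # KeyError if `name` is unknown
    remaining = list(new["addrs"].values())
    if not remaining:
        raise ValueError("refusing to remove the last vtgate")
    for cell in new["cells"].values():
        cell["vtgate"] = [a for a in cell["vtgate"] if a != addr]
    if new["server"] == addr:
        new["server"] = remaining[0]     # point the entry point at a survivor
    return new


if __name__ == "__main__":
    before = {
        "addrs": {"vtgate-00000001": "localhost:15991",
                  "vtgate-00000002": "localhost:15992"},
        "cells": {"zone1": {"vtgate": ["localhost:15991", "localhost:15992"],
                            "vttablets": ["localhost:15101", "localhost:15201"]}},
        "server": "localhost:15991",
    }
    print(json.dumps(drain_vtgate(before, "vtgate-00000001"), indent=2))
```

Refusing to remove the last address is a deliberate guard: draining every vtgate at once would be an outage, not a rolling upgrade.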

With its address removed, the old vtgate stops receiving new connections, while the queries and transactions already in flight on it are allowed to finish. All new traffic goes to the remaining healthy vtgate instances, in this case vtgate-00000002.
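A minimal model of the drain itself, assuming one hypothetical `DrainingBackend` object per vtgate that counts open connections: once `drain()` is called, `open()` refuses new connections, and `drain()` waits until the count reaches zero or a timeout expires. This illustrates the idea only; it is not how vtgate implements shutdown.

```python
import threading


class DrainingBackend:
    """Hypothetical model of one vtgate being drained.

    Tracks open client connections. After drain() starts, open() refuses new
    connections, and drain() waits for the open ones to close.
    """

    def __init__(self, addr):
        self.addr = addr
        self.draining = False
        self.active = 0
        self._cond = threading.Condition()

    def open(self):
        """Try to accept a new connection; False means 'pick another vtgate'."""
        with self._cond:
            if self.draining:
                return False
            self.active += 1
            return True

    def close(self):
        """A client finished its work and closed its connection."""
        with self._cond:
            self.active -= 1
            self._cond.notify_all()

    def drain(self, timeout=None):
        """Stop accepting connections and wait up to `timeout` seconds for the
        open ones to close. Returns True if none are left open."""
        with self._cond:
            self.draining = True
            return self._cond.wait_for(lambda: self.active == 0, timeout=timeout)


if __name__ == "__main__":
    gate = DrainingBackend("localhost:15991")
    gate.open()                                # one in-flight client
    threading.Timer(0.1, gate.close).start()   # it finishes shortly
    print(gate.drain(timeout=5))               # waits for that client
    print(gate.open())                         # new connections are refused
```

A `False` from `drain()` means connections were still open when the timeout expired, which is the point where an operator decides whether to wait longer or stop the process anyway.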

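The power here is that vtgate instances don’t keep any persistent state; they are essentially stateless routers. The state they do hold is in memory: each client connection’s session, including any open transaction, lives on the vtgate that accepted it. That is why a drain lets those connections finish rather than handing them to another vtgate; when a client reconnects, it simply lands on one of the remaining instances. The underlying vttablets are unaffected, because they, not vtgate, manage the MySQL instances that hold the actual data.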

This process can be repeated for all vtgate instances, allowing you to upgrade your entire vtgate fleet without any application downtime. The same principle applies to upgrading vttablet processes, though the process involves more careful coordination due to the stateful nature of tablets.
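Repeating add-then-drain for every instance can be simulated to check the no-downtime claim. The sketch below models the fleet as a plain dict of hypothetical vtgate names to version labels; each step starts one replacement before removing one old instance, so the number of serving vtgates never falls below the starting count. The naming scheme and version labels are assumptions for illustration.

```python
def next_name(fleet):
    """Hypothetical naming scheme: vtgate-NNNNNNNN with the next free number."""
    highest = max(int(name.rsplit("-", 1)[1]) for name in fleet)
    return f"vtgate-{highest + 1:08d}"


def rolling_upgrade(fleet, new_version):
    """Simulate a vtgate rolling upgrade and return a snapshot after each change.

    `fleet` maps vtgate name -> version label. For every instance not yet on
    `new_version`, start a replacement first, then drain and remove the old one.
    """
    fleet = dict(fleet)  # work on a copy
    history = []
    outdated = [name for name, version in fleet.items() if version != new_version]
    for name in outdated:
        fleet[next_name(fleet)] = new_version  # 1. add the upgraded vtgate
        history.append(dict(fleet))
        del fleet[name]                        # 2. drain and remove the old one
        history.append(dict(fleet))
    return history


if __name__ == "__main__":
    start = {"vtgate-00000001": "old", "vtgate-00000002": "old"}
    for snapshot in rolling_upgrade(start, "new"):
        print(snapshot)
```

Adding before removing costs one extra instance of temporary capacity per step, in exchange for never running below the original fleet size.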

The trick to a successful rolling upgrade lies in vtgate’s ability to act as a dynamic, stateless proxy. Whatever distributes client connections, whether a load balancer or a service-discovery client, keeps refreshing its list of healthy vtgate addresses; when an address disappears from that list, it stops sending new traffic there. vtgate applies the same idea one layer down: it watches the topology service to discover the vttablets it routes queries to.
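The polling side can be sketched as a loop that diffs successive address lists. `fetch` is an assumed callable standing in for whatever source holds the list (for example a key in etcd or ZooKeeper), and `on_change` receives the added and removed addresses; neither is a Vitess API.

```python
import time


def watch_vtgates(fetch, on_change, interval=1.0, rounds=None):
    """Poll fetch() for the current vtgate addresses and report changes.

    fetch: hypothetical callable returning an iterable of addresses.
    on_change: called as on_change(added, removed) with sorted lists.
    rounds: number of polls before returning; None polls forever.
    """
    known = set()
    polls = 0
    while rounds is None or polls < rounds:
        current = set(fetch())
        added, removed = current - known, known - current
        if added or removed:
            on_change(sorted(added), sorted(removed))
            known = current
        polls += 1
        if rounds is None or polls < rounds:
            time.sleep(interval)  # skip the sleep after the final poll


if __name__ == "__main__":
    lists = iter([["localhost:15991"],
                  ["localhost:15991", "localhost:15992"],
                  ["localhost:15992"]])
    watch_vtgates(lambda: next(lists),
                  lambda a, r: print("added", a, "removed", r),
                  interval=0, rounds=3)
```

A load balancer would react to `removed` by sending no further new connections to those addresses while leaving existing ones alone.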

The most surprising aspect of Vitess rolling upgrades is how smoothly active connections are handled. Existing connections aren’t severed abruptly; they are allowed to complete, while no new connections are sent to the departing instance. It is the load balancer, not a master vtgate, that makes this happen, by routing each new connection to one of the remaining instances.

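The real magic happens when you consider vttablets. Upgrading them calls for a similar drain-and-replace strategy, but it’s more nuanced because each tablet manages a MySQL instance. You’d typically upgrade the replicas first, then run a planned reparent (PlannedReparentShard) to promote an upgraded replica to primary, and finally upgrade the old primary, which now serves as a replica. The planned reparent also repoints the remaining replicas at the new primary, which keeps data consistent and the shard available throughout the upgrade.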
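One common ordering, which upgrades the replicas before the reparent so that the promoted tablet already runs the new version, can be written down as a plan. The function and the tablet aliases are illustrative only and nothing here calls Vitess; in Vitess, the reparent step corresponds to a planned reparent (PlannedReparentShard), which also repoints the remaining replicas at the new primary.

```python
def tablet_upgrade_plan(primary, replicas):
    """Order of operations for upgrading one shard's tablets (a sketch).

    Upgrades replicas first, promotes an upgraded replica with a planned
    reparent, then upgrades the old primary, which is now a replica.
    Tablet aliases are illustrative; nothing here calls Vitess.
    """
    if not replicas:
        raise ValueError("need at least one replica to reparent onto")
    steps = [f"upgrade replica {r}" for r in replicas]
    new_primary = replicas[0]  # any upgraded, healthy replica would do
    steps.append(f"planned reparent: promote {new_primary}, demote {primary}")
    steps.append(f"upgrade old primary {primary} (now a replica)")
    return steps


if __name__ == "__main__":
    for step in tablet_upgrade_plan("zone1-0000000100",
                                    ["zone1-0000000101", "zone1-0000000102"]):
        print(step)
```

Upgrading the old primary last means the shard only ever takes one planned primary change, and that change happens onto a tablet already running the new version.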

You’ll next want to explore how to perform rolling upgrades on your vttablet instances.
