Vitess has a secret: it’s not a magic wand for all distributed queries.
Imagine you have a sharded database, say split by user_id. You’ve got tables like users (sharded by user_id), orders (sharded by user_id), and products. You want to see all orders for a specific user, which is easy. You want to see all products, also easy. But what about seeing all orders across all users, or all products and all orders? That’s where Vitess hits some limitations, and understanding them is key to avoiding performance headaches.
Let’s say you have a users table, sharded by user_id, and an orders table, also sharded by user_id.
-- This is fine, Vitess knows how to route this to the correct shard
SELECT * FROM users WHERE user_id = 12345;
-- This is also fine, Vitess knows how to route this to the correct shard
SELECT * FROM orders WHERE user_id = 12345;
Now, consider a query that cannot be satisfied by a single shard, like fetching all orders without a WHERE clause on user_id.
-- This is a problem!
SELECT * FROM orders;
Vitess can broadcast SELECT * FROM orders to every shard and merge the results (a scatter query), but it’s generally a bad idea. Why? Because Vitess’s primary strength is routing queries to specific shards based on the shard key (in this case, user_id). When you omit the shard key from the WHERE clause on a sharded table, Vitess doesn’t know which shard holds the rows you want, so every shard has to do the work.
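To make the routing decision concrete, here is a minimal Python sketch. It is a toy model, not Vitess’s implementation: the shard count, the MD5-modulo hash, and the names shard_for and target_shards are all assumptions made for illustration (Vitess itself maps the sharding column to a keyspace ID through a vindex and assigns key ranges to shards).

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen for illustration


def shard_for(user_id: int) -> int:
    """Map a shard-key value to one shard (toy hash-modulo scheme)."""
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def target_shards(user_id=None):
    """Return the shards a query must visit.

    With a shard key the query goes to exactly one shard.
    Without one, nothing narrows the search, so every shard is a target.
    """
    if user_id is None:
        return list(range(NUM_SHARDS))
    return [shard_for(user_id)]


print(target_shards(12345))  # a single shard
print(target_shards())       # all shards: a scatter query
```

The second call is the expensive case: its cost grows with the number of shards, not with the amount of data you actually need.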
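The most common and problematic scenario is a query that requires scanning every shard of a sharded table. By default vtgate executes such scatter queries, but it can be configured to reject them (for example with vtgate’s no_scatter option) to prevent accidental, massive performance degradation.
Here’s what generally won’t work, or won’t work well: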
- Full Table Scans on Sharded Tables: If a table is sharded by user_id, a query like SELECT * FROM orders; or SELECT COUNT(*) FROM orders; gives Vitess no shard key to route on, so it has to touch every shard. Depending on how vtgate is configured, it either runs as a scatter query or is rejected.
  - Diagnosis: When scatter queries are disallowed, you may see an error along the lines of vttablet: could not find shard for table orders; the exact wording depends on your Vitess version and configuration. When they are allowed, the symptom is latency and load that grow with the number of shards.
  - Fix: Add a WHERE clause that includes the shard key (user_id). If you truly need to scan the whole table, you’ll need to rethink your data model or use a different tool for analytical workloads.
  - Why it works: Vitess uses the shard key to determine which shard(s) hold the relevant data. Without it, it can’t make that routing decision.
- Queries Involving Joins Across Sharded Tables Without a Common Shard Key: If you join orders (sharded by user_id) with products (which might be unsharded or sharded differently), and the join condition isn’t on user_id, Vitess can’t resolve the join inside a single shard. Take a query like SELECT o.*, p.name FROM orders o JOIN products p ON o.product_id = p.id; where orders is sharded by user_id and products is unsharded or sharded by product_id.
  - Diagnosis: Vitess tries to push the join down to the shards, but the matching products rows are not guaranteed to live on the same shard as the orders rows. Depending on the query and your Vitess version, you get either a slow join coordinated across shards or an error indicating that the JOIN is not supported across shards.
  - Fix: Ensure your JOIN conditions align with shard keys where possible. If products were also sharded by user_id, this join would be simpler. Alternatively, denormalize data or use a separate analytical system.
  - Why it works: Vitess prefers to execute joins within a shard. When the rows to be joined live on different shards (and aren’t correlated by the shard key), the join needs significant cross-shard communication, which Vitess tries to avoid.
- GROUP BY on Non-Shard-Key Columns Without a WHERE Clause: Similar to full table scans, if you GROUP BY a column that isn’t the shard key on a sharded table, Vitess has to involve every shard and may reject the query. For example, SELECT COUNT(*) FROM orders GROUP BY order_date; where orders is sharded by user_id.
  - Diagnosis: The same symptoms as a full table scan: either an error saying the query cannot be routed, or an aggregation that runs on every shard.
  - Fix: Either add a WHERE user_id = ... clause to limit the scope to a single shard, or consider whether this aggregation is truly necessary in a transactional system. For analytics, use a different tool.
  - Why it works: Aggregations like GROUP BY require collecting rows. Without a shard key, the rows for any one group can sit on any shard, so Vitess can’t confine the GROUP BY to a single shard and has to combine partial results from all of them.
- ORDER BY on Non-Shard-Key Columns Without a WHERE Clause: Similar to GROUP BY, an ORDER BY on a non-shard-key column without the shard key in the WHERE clause can be problematic. For example, SELECT * FROM orders ORDER BY order_date DESC;
  - Diagnosis: Vitess might refuse to execute this to prevent a full table scan and subsequent sort. If it does run, every shard sorts its own rows and the results still have to be merged.
  - Fix: Add a WHERE user_id = ... clause. If you need global ordering over the whole table, Vitess is rarely the right tool for that specific query.
  - Why it works: Sorting requires all the data to be brought together. If Vitess doesn’t know which shard has the data, it can’t perform the sort efficiently.
- Using Unsharded Tables with Sharded Tables in Complex Joins: If you have a sharded table and an unsharded table, Vitess can often handle joins where the unsharded table is on one side. However, if the query becomes too complex or Vitess can’t optimize it down to a single-shard operation, it might fail.
  - Diagnosis: Errors saying that a JOIN cannot be resolved or that it requires cross-shard operations that are disallowed.
  - Fix: Keep your JOIN conditions simple and selective. Sometimes, denormalizing data from the unsharded table into the sharded table (if feasible) can improve performance and allow Vitess to process the query within a single shard.
  - Why it works: Vitess aims to push operations down to individual shards. It can join a sharded table with an unsharded one (for instance by fetching rows from one side and looking up the matching rows on the other), but complex conditions can exceed what it is able to plan.
- Queries Requiring UNION ALL Across Shards: If you need to combine results from multiple shards of a sharded table, a direct UNION ALL such as SELECT * FROM orders WHERE user_id < 100 UNION ALL SELECT * FROM orders WHERE user_id >= 100; is a poor fit when user_id is the shard key and the ranges don’t align with shard boundaries.
  - Diagnosis: Vitess may reject this, stating that it cannot route the query, or it may fall back to scanning every shard for each branch.
  - Fix: Issue separate queries through vtgate, each scoped so that it can be routed to specific shards, and combine the results in your application, or use a different tool for this type of aggregation.
  - Why it works: Vitess’s routing mechanism is designed for single-shard operations or operations that can be mapped to specific shards. UNION ALL across arbitrary ranges of a sharded table forces Vitess to act like a traditional single-node database, which it’s not optimized for in this distributed context.
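The GROUP BY and ORDER BY cases above both come down to scatter-gather: each shard computes a partial result and a coordinator combines them. The Python sketch below illustrates that pattern under stated assumptions. The SHARDS data and the functions count_by_date and newest_first are hypothetical, and this is a simplified model of the idea, not Vitess code.

```python
import heapq
from collections import Counter
from itertools import islice

# Hypothetical per-shard rows of (user_id, order_date); rows are placed by
# user_id, so any order_date can appear on any shard.
SHARDS = [
    [(1, "2024-01-02"), (5, "2024-01-01")],
    [(2, "2024-01-02"), (6, "2024-01-03")],
    [(3, "2024-01-01")],
]


def count_by_date(shards):
    """Model of SELECT COUNT(*) ... GROUP BY order_date across shards."""
    total = Counter()
    for rows in shards:
        # Each shard counts its own rows per date (a partial aggregate).
        partial = Counter(order_date for _, order_date in rows)
        # The coordinator adds the partial counts together.
        total.update(partial)
    return dict(total)


def newest_first(shards, limit):
    """Model of SELECT * ... ORDER BY order_date DESC LIMIT n across shards."""
    # Each shard sorts its own rows, newest first.
    streams = [sorted(rows, key=lambda r: r[1], reverse=True) for rows in shards]
    # The coordinator merges the already-sorted streams lazily.
    merged = heapq.merge(*streams, key=lambda r: r[1], reverse=True)
    return list(islice(merged, limit))


print(count_by_date(SHARDS))
print(newest_first(SHARDS, 3))
```

Even in this small model every shard does work for every query, which is why adding the shard key to the WHERE clause is the first fix to try.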
The underlying principle is that Vitess excels at routing queries based on a shard key. When your query doesn’t provide enough information to determine which shard(s) to query, or when it implies a full scan or a complex cross-shard aggregation, Vitess will either fan the query out to every shard or, if it is configured to disallow that or cannot plan the query, return an error to prevent unintended performance issues.
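When a join cannot be aligned with the shard key, one option mentioned above is to do the combining in the application. The following Python sketch shows the general shape of that approach. The in-memory ORDER_SHARDS and PRODUCTS structures and the function orders_for_user_with_names are hypothetical stand-ins for real database calls, so treat this as an outline under those assumptions and not as a recommended client API.

```python
# Hypothetical data: orders live on shards keyed by user_id,
# products live in a single unsharded table keyed by product id.
ORDER_SHARDS = {
    0: [{"user_id": 2, "order_id": 100, "product_id": 10}],
    1: [
        {"user_id": 1, "order_id": 101, "product_id": 11},
        {"user_id": 1, "order_id": 102, "product_id": 10},
    ],
}
PRODUCTS = {10: "keyboard", 11: "mouse"}


def orders_for_user_with_names(user_id, shard_id):
    """Application-side join for one user's orders.

    shard_id stands in for the routing step; a real client would let the
    database proxy pick the shard from user_id.
    """
    # Step 1: single-shard query, equivalent to WHERE user_id = ...
    orders = [o for o in ORDER_SHARDS[shard_id] if o["user_id"] == user_id]
    # Step 2: one batched lookup against the unsharded products table,
    # equivalent to WHERE id IN (...).
    wanted = {o["product_id"] for o in orders}
    names = {pid: PRODUCTS[pid] for pid in wanted if pid in PRODUCTS}
    # Step 3: stitch the two result sets together in the application.
    return [{**o, "product_name": names.get(o["product_id"])} for o in orders]


for row in orders_for_user_with_names(user_id=1, shard_id=1):
    print(row["order_id"], row["product_name"])
```

The trade-off is that the application now owns join logic the database would otherwise handle, so this is worth doing only for queries that cannot be made shard-local.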
The next hurdle you’ll likely encounter is understanding how to effectively use Vitess for analytical queries, which often involves offloading data to specialized systems.