Common Table Expressions (CTEs) and recursive queries are the standard SQL way to traverse and manipulate hierarchical data, but their power lies in understanding how the recursion actually unfolds, not just in their syntax.
Let’s see this in action. Imagine a simple organizational hierarchy:
WITH RECURSIVE EmployeeHierarchy AS (
-- Anchor member: Start with the top-level employee (CEO)
SELECT
employee_id,
employee_name,
manager_id,
0 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive member: Join employees to their managers
SELECT
e.employee_id,
e.employee_name,
e.manager_id,
eh.level + 1
FROM employees e
JOIN EmployeeHierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM EmployeeHierarchy ORDER BY level, employee_id;
Here’s a sample employees table:
| employee_id | employee_name | manager_id |
|---|---|---|
| 1 | Alice (CEO) | NULL |
| 2 | Bob | 1 |
| 3 | Charlie | 1 |
| 4 | David | 2 |
| 5 | Eve | 2 |
| 6 | Frank | 3 |
Running the query above would produce this result:
| employee_id | employee_name | manager_id | level |
|---|---|---|---|
| 1 | Alice (CEO) | NULL | 0 |
| 2 | Bob | 1 | 1 |
| 3 | Charlie | 1 | 1 |
| 4 | David | 2 | 2 |
| 5 | Eve | 2 | 2 |
| 6 | Frank | 3 | 2 |
This query effectively walks down the management chain. The WITH RECURSIVE EmployeeHierarchy AS (...) defines a temporary result set that can refer to itself. The EmployeeHierarchy CTE is defined by two parts: an anchor member (the SELECT before UNION ALL) and a recursive member (the SELECT after UNION ALL).
The anchor member provides the starting point for the recursion. In this case, it selects the employee(s) with no manager, typically the top of the hierarchy (the CEO). It also assigns a level of 0 to this starting point.
The recursive member is what drives the iteration. It joins the employees table (e) with the current state of the EmployeeHierarchy CTE (eh). The join condition e.manager_id = eh.employee_id means "find employees whose manager is one of the employees we’ve already found in the previous step." For each employee found this way, it increments the level by 1, signifying one step further down the hierarchy. This process repeats until the recursive member returns no new rows, meaning there are no more employees to find at a deeper level.
The UNION ALL operator combines the results of the anchor member with the results of each iteration of the recursive member. It’s crucial that UNION ALL is used and not UNION, as UNION would deduplicate rows, which can lead to incorrect results or infinite loops if not handled carefully in more complex scenarios.
The final SELECT * FROM EmployeeHierarchy then retrieves all the rows accumulated by the CTE across all recursive steps. Ordering by level and employee_id helps visualize the hierarchical structure.
The real magic is how SQLite (and other SQL databases) manage the state of the CTE. It doesn’t just run the recursive part once. Think of it as a loop:
- Iteration 0 (Anchor): The anchor query runs, producing an initial set of rows (the CEO). This set becomes the "current" result of
EmployeeHierarchy. - Iteration 1 (Recursion): The recursive query runs, using the "current" result from Iteration 0. It finds Bob and Charlie. These new rows are added to the overall result. The set of Bob and Charlie becomes the "current" result for the next iteration.
- Iteration 2 (Recursion): The recursive query runs again, this time using Bob and Charlie as the "current" result. It finds David and Eve (managed by Bob) and Frank (managed by Charlie). These new rows are added. David, Eve, and Frank become the "current" result.
- Iteration 3 (Recursion): The recursive query runs with David, Eve, and Frank. It finds no employees whose
manager_idmatches any of these IDs. The recursive member returns zero rows. - Termination: Since the recursive member returned no new rows, the recursion stops. The complete result set is the union of all rows produced in Iterations 0, 1, and 2.
The level column isn’t strictly necessary for the recursion itself but is invaluable for understanding and processing the hierarchy. You can use it to limit the depth of recursion (e.g., WHERE eh.level < 5) or to perform operations specific to a certain depth. You could also use it to find all direct reports of a specific manager by filtering WHERE manager_id = <specific_manager_id> in the anchor or recursive part, or to find all subordinates by querying the final result.
A common pattern is to use this to flatten relationships or to perform aggregations up or down a tree. For instance, to find all employees reporting (directly or indirectly) to Alice, you’d simply select employee_id and employee_name from the EmployeeHierarchy CTE where level > 0.
The most surprising thing about recursive CTEs is how they handle cycles. If your data has a circular reference (e.g., employee A manages B, and B manages A), the query will by default enter an infinite loop. Most SQL implementations, including SQLite, have a mechanism to prevent this, often by limiting the maximum recursion depth or by detecting cycles. In SQLite, you can explicitly set a MAX_RECURSION limit in your query to prevent runaway processes.
The next challenge is often performing aggregations across these hierarchical structures, such as calculating the total salary cost for each department or finding the longest reporting chain.