The SQLite R-Tree extension can index spatial data, but it doesn’t actually understand "locations" like latitude/longitude or UTM coordinates out of the box; it only understands bounding boxes.
Let’s say you have a table of businesses, and you want to find all businesses within a certain radius of a given point.
CREATE TABLE businesses (
id INTEGER PRIMARY KEY,
name TEXT,
latitude REAL,
longitude REAL
);
INSERT INTO businesses (name, latitude, longitude) VALUES
('Cafe Aroma', 34.0522, -118.2437),
('Book Nook', 34.0530, -118.2440),
('Art Gallery', 34.0510, -118.2450),
('Park Entrance', 34.0550, -118.2420);
To use R-Tree for this, you first need to create a virtual table that mirrors your businesses table but includes R-Tree columns. The R-Tree module expects integer or floating-point coordinates for its bounding boxes. For a 2D spatial index, you’ll define two dimensions.
CREATE VIRTUAL TABLE businesses_rtree USING rtree(
id, -- Corresponds to the businesses.id
min_lon, -- Minimum longitude of the bounding box
min_lat, -- Minimum latitude of the bounding box
max_lon, -- Maximum longitude of the bounding box
max_lat -- Maximum latitude of the bounding box
);
Now, you need to populate businesses_rtree. Since we’re indexing points (businesses), the minimum and maximum coordinates for each dimension will be the same.
INSERT INTO businesses_rtree (id, min_lon, min_lat, max_lon, max_lat)
SELECT id, longitude, latitude, longitude, latitude FROM businesses;
To find businesses within a radius, say 0.01 degrees (which is roughly 1 km at the equator), you’d query the R-Tree. The R-Tree query (MATCH) finds all entries whose bounding boxes overlap with the search bounding box. For a point query, your search bounding box is a tiny box centered on your query point.
Let’s find businesses within 0.01 degrees of (34.0525, -118.2435):
SELECT b.name
FROM businesses b
JOIN businesses_rtree r ON b.id = r.id
WHERE r.min_lon BETWEEN -118.2435 - 0.01 AND -118.2435 + 0.01
AND r.min_lat BETWEEN 34.0525 - 0.01 AND 34.0525 + 0.01
AND r.max_lon BETWEEN -118.2435 - 0.01 AND -118.2435 + 0.01
AND r.max_lat BETWEEN 34.0525 - 0.01 AND 34.0525 + 0.01;
The MATCH operator is the R-Tree’s optimized search. It uses the R-Tree’s hierarchical bounding boxes to quickly prune the search space. The BETWEEN clauses define the search bounding box. The R-Tree will return all entries whose bounding boxes intersect with this search box.
The key insight is that R-Tree operates on axis-aligned bounding boxes, not on arbitrary shapes or even just points directly. You must translate your spatial concept (a point, a circle, a polygon) into one or more bounding boxes that cover it. For a point, the bounding box has zero width and height. For a circular search, you’d approximate it with a square bounding box.
The R-Tree module itself doesn’t perform the distance calculation; it only efficiently finds candidate rows whose bounding boxes overlap your search bounding box. You still need to perform a secondary, precise calculation (like Haversine for lat/lon or Euclidean for Cartesian) on the results to filter out false positives that the bounding box overlap might have included.
The R-Tree’s performance advantage comes from its ability to rapidly discard large portions of the index that cannot possibly contain results, based on the bounding box comparisons. This is significantly faster than a full table scan, especially for large datasets.
The R-Tree is particularly effective when your queries involve bounding box intersections or when you can accurately approximate your spatial queries with bounding boxes. For instance, finding all businesses within a rectangular region is a natural fit.
When you perform a query like the one above, the R-Tree first navigates its internal tree structure to find nodes whose bounding boxes intersect with your query bounding box. It then recursively descends into those nodes. For point data, where min_coord == max_coord, the R-Tree can be very efficient because each leaf node directly represents a point.
The MATCH operator in R-Tree is actually a shorthand for a series of comparisons on the min_lon, min_lat, max_lon, max_lat columns. The R-Tree internal logic uses these values to efficiently prune the search space.
If you update a business’s location, you must update both the businesses table and the businesses_rtree table. A common mistake is forgetting to update the R-Tree, leading to stale search results.
The next challenge you’ll encounter is performing radius searches more accurately, which involves implementing a distance calculation on the R-Tree’s results.