Continuous aggregates in TimescaleDB pre-compute query results for specific time intervals, making queries on large datasets much faster.
Let’s see this in action. Imagine you have a massive table of sensor readings, sensor_readings, with columns like time (a timestamp), device_id (an integer), and temperature (a float). You frequently need to find the average temperature per device per hour.
-- Original table structure
CREATE TABLE sensor_readings (
time TIMESTAMPTZ NOT NULL,
device_id INT NOT NULL,
temperature FLOAT
);
-- Convert the table into a hypertable partitioned by time
SELECT create_hypertable('sensor_readings', 'time');
-- Insert some sample data (imagine millions of rows here)
INSERT INTO sensor_readings (time, device_id, temperature) VALUES
('2023-10-27 10:00:00+00', 1, 22.5),
('2023-10-27 10:01:00+00', 1, 22.6),
('2023-10-27 10:00:00+00', 2, 25.1),
('2023-10-27 10:02:00+00', 1, 22.7);
-- A typical query without continuous aggregates
SELECT
time_bucket('1 hour', time) AS hour,
device_id,
AVG(temperature) AS avg_temperature
FROM sensor_readings
WHERE time >= '2023-10-27 00:00:00+00' AND time < '2023-10-28 00:00:00+00'
GROUP BY hour, device_id
ORDER BY hour, device_id;
Running this query repeatedly on a large dataset can be slow because the database has to scan and aggregate the raw data every single time. This is where continuous aggregates come in. They are essentially materialized views that automatically update.
Here’s how you’d create a continuous aggregate for that hourly average:
-- Create a continuous aggregate
CREATE MATERIALIZED VIEW sensor_readings_hourly_avg
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS hour,
device_id,
AVG(temperature) AS avg_temperature
FROM sensor_readings
GROUP BY hour, device_id;
-- Schedule automatic refreshes for the continuous aggregate
SELECT add_continuous_aggregate_policy('sensor_readings_hourly_avg',
  start_offset => INTERVAL '3 hours',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');
Now, when you query sensor_readings_hourly_avg, you’re querying pre-computed results.
SELECT * FROM sensor_readings_hourly_avg
WHERE hour >= '2023-10-27 00:00:00+00' AND hour < '2023-10-28 00:00:00+00'
ORDER BY hour, device_id;
This query will be orders of magnitude faster because it doesn’t need to re-process the raw sensor_readings data. TimescaleDB runs a background job that periodically executes the aggregation and updates the sensor_readings_hourly_avg materialized view. A refresh policy, added with add_continuous_aggregate_policy, configures that job: schedule_interval controls how often it runs, while start_offset and end_offset bound the window of data (relative to the current time) that each refresh re-materializes.
The key benefit is that time_bucket and the aggregation are executed periodically by TimescaleDB’s background process, not on every user query. The materialized view sensor_readings_hourly_avg stores the results of those periodic computations, so querying it reads pre-computed rows instead of scanning and aggregating the raw sensor_readings table. The WITH (timescaledb.continuous) clause is what marks the view as a continuous aggregate and enables the automatic background maintenance.
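One detail worth knowing: in recent TimescaleDB versions, continuous aggregates are "real-time" by default, meaning a query against the view combines the materialized results with any raw data newer than the last refresh. If you want queries to read only the materialized rows, you can turn that off:

-- Serve only materialized data; skip combining with not-yet-refreshed raw rows
ALTER MATERIALIZED VIEW sensor_readings_hourly_avg
  SET (timescaledb.materialized_only = true);

With materialized_only set, the view is cheaper to query but may lag the raw table by up to one refresh interval.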
The bucket size given to time_bucket (e.g. '1 hour') determines the granularity of the pre-computation: TimescaleDB computes and stores one aggregate row per device per hour present in your data. The refresh policy then keeps those buckets up to date within its configured window; buckets older than the policy’s start_offset are left untouched unless you refresh them explicitly.
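You don’t have to wait for the background job, either. A specific time range can be materialized on demand with refresh_continuous_aggregate, which is a procedure and is therefore invoked with CALL:

-- Manually materialize one day's worth of hourly buckets
CALL refresh_continuous_aggregate('sensor_readings_hourly_avg',
  '2023-10-27 00:00:00+00', '2023-10-28 00:00:00+00');

This is handy after a bulk backfill, when you want the aggregate up to date immediately rather than at the next scheduled refresh.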
The most surprising thing about continuous aggregates is how they handle out-of-order data. If new data arrives in a time bucket that the background job has already materialized, TimescaleDB doesn’t throw the whole view away and rebuild it. Each continuous aggregate tracks a watermark recording how far materialization has progressed; any insert, update, or delete that touches data below that watermark is recorded in an invalidation log. On the next refresh, TimescaleDB consults that log and re-computes only the buckets covering the invalidated regions, leaving the rest of the materialized data untouched.
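To see this in action, insert a late-arriving row into a bucket that has already been materialized, then refresh that window; only the affected hourly bucket is recomputed:

-- A reading that arrives late, inside an already-materialized bucket
INSERT INTO sensor_readings (time, device_id, temperature) VALUES
('2023-10-27 10:03:00+00', 1, 22.8);
-- The invalidation log marks the 10:00 bucket stale; refreshing the window
-- recomputes just that bucket
CALL refresh_continuous_aggregate('sensor_readings_hourly_avg',
  '2023-10-27 10:00:00+00', '2023-10-27 11:00:00+00');

Querying sensor_readings_hourly_avg afterward shows the updated average for device 1 in the 10:00 bucket, without any other bucket having been touched.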
The next step is to explore how to combine multiple continuous aggregates or to query across different time granularities using them.