TimescaleDB continuous aggregates are effectively materialized views that are automatically managed for you, but their refresh policies dictate when and how they catch up with the underlying hypertable.
Let’s see this in action. Imagine you have a conditions hypertable storing sensor readings:
CREATE TABLE conditions (
    time   TIMESTAMPTZ NOT NULL,
    device INT NOT NULL,
    temp   NUMERIC
);

SELECT create_hypertable('conditions', 'time');
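To have something to aggregate, you might seed the hypertable with a few readings (the device IDs and temperatures here are made up for illustration):

```sql
-- A handful of illustrative sensor readings; values are arbitrary.
INSERT INTO conditions (time, device, temp) VALUES
    (NOW() - INTERVAL '90 minutes', 1, 21.5),
    (NOW() - INTERVAL '45 minutes', 1, 22.0),
    (NOW() - INTERVAL '30 minutes', 2, 19.8);
```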
And you want a continuous aggregate conditions_summary to store the average temperature per device per hour:
CREATE MATERIALIZED VIEW conditions_summary
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 hour', time) AS hour,
    device,
    avg(temp) AS avg_temp
FROM conditions
GROUP BY hour, device;

Note the WITH (timescaledb.continuous) clause: that is what makes this a continuous aggregate rather than an ordinary materialized view.
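Once created, the aggregate can be queried like any view; for example, to inspect the hourly averages:

```sql
SELECT hour, device, avg_temp
FROM conditions_summary
ORDER BY hour, device;
```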
Now, the crucial part: how does conditions_summary get updated? That’s where refresh policies come in. By default, a continuous aggregate has no refresh policy, meaning it never updates automatically. You’d have to manually trigger a refresh.
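A manual refresh uses the refresh_continuous_aggregate procedure, which takes the aggregate name and an explicit refresh window. For example, to (re)materialize the last day of data:

```sql
-- refresh_continuous_aggregate is a procedure, so it is invoked
-- with CALL rather than SELECT.
CALL refresh_continuous_aggregate('conditions_summary',
    NOW() - INTERVAL '1 day', NOW());
```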
To make it automatic, we attach a refresh policy. The add_continuous_aggregate_policy function is your tool here. It takes several arguments:

continuous_aggregate: The name of your continuous aggregate (e.g., 'conditions_summary').
start_offset: An INTERVAL marking the start of the refresh window, measured backwards from the time the policy runs.
end_offset: An INTERVAL marking the end of the refresh window, also measured backwards from the time the policy runs.
schedule_interval: How often the policy job runs. This dictates the frequency of the refresh check, not necessarily how much data is refreshed.
if_not_exists: A boolean to prevent errors if the policy already exists.
Let’s set up a policy that runs every 10 minutes and refreshes data between 3 hours and 1 hour old. (The refresh window must span at least two bucket widths, so with 1-hour buckets the gap between start_offset and end_offset needs to be at least 2 hours.)

SELECT add_continuous_aggregate_policy('conditions_summary',
    start_offset => INTERVAL '3 hours',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '10 minutes',
    if_not_exists => true);
With this policy, TimescaleDB will periodically refresh the conditions_summary materialized view. Each time the job runs, it computes the refresh window from the offsets: from NOW() - start_offset up to NOW() - end_offset. If there is new or changed data in conditions within that window that hasn’t yet been incorporated into conditions_summary, the policy refreshes the affected buckets.
The schedule_interval is key. If schedule_interval is '10 minutes', TimescaleDB will wake up every 10 minutes to check if a refresh is needed. It won’t necessarily refresh 10 minutes of data; it will refresh whatever is missing within the window defined by start_offset and end_offset. If data arrives in bursts, or if a refresh takes a long time, the aggregate might fall behind.
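If you suspect the aggregate is falling behind, the policy job’s scheduling history can be inspected through TimescaleDB’s informational views (view and column names here are as of TimescaleDB 2.x and may differ in older versions):

```sql
-- Show scheduling health of continuous aggregate refresh jobs.
SELECT j.job_id, js.last_run_started_at, js.last_successful_finish,
       js.next_start, js.total_failures
FROM timescaledb_information.jobs j
JOIN timescaledb_information.job_stats js USING (job_id)
WHERE j.proc_name = 'policy_refresh_continuous_aggregate';
```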
You can also define policies that are "catch-up" oriented. For example, to ensure that data older than 1 hour is always refreshed daily:
SELECT add_continuous_aggregate_policy('conditions_summary',
    start_offset => INTERVAL '24 hours',
    end_offset => INTERVAL '1 hour', -- Only refresh data that is at least 1 hour old
    schedule_interval => INTERVAL '1 day',
    if_not_exists => true);
Here, end_offset is crucial. It tells the policy to focus on older data by excluding the most recent hour. The schedule_interval of INTERVAL '1 day' means the check happens once a day, but it will then process all missing data within the window defined by start_offset and end_offset.
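Note that a continuous aggregate can have only one refresh policy at a time; attempting to add a second one fails (or is silently skipped with if_not_exists => true). To switch policies, remove the existing one first:

```sql
-- Drop the current refresh policy, then re-add one
-- with the new offsets and schedule.
SELECT remove_continuous_aggregate_policy('conditions_summary');
```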
The start_offset and end_offset of the policy are relative, not absolute. They are evaluated each time the policy runs, so the policy always refreshes a sliding window relative to the current time.
The most counterintuitive aspect of continuous aggregate policies is that schedule_interval is about how often the scheduler checks, not how much data is refreshed. A policy with schedule_interval => '1 minute' might run every minute, but if no new data has arrived in the underlying hypertable, it will do nothing. Conversely, a policy with schedule_interval => '1 day' could still process many hours of missing data if it finally gets around to running and finds a large backlog. The start_offset and end_offset define the scope of the data that could be refreshed, and the scheduler’s frequency determines how promptly that scope is processed.
Understanding these policies is vital for managing the trade-off between data freshness in your aggregates and the computational cost of refreshing them.
After setting up and verifying your continuous aggregate refresh policies, the next logical step is to explore how to optimize the underlying continuous aggregate itself, perhaps through compression or partitioning strategies.
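As a preview, recent TimescaleDB versions (2.6 and later) support compressing the materialized data of a continuous aggregate. A minimal sketch, assuming you rarely modify data older than a week (the 7-day threshold is an arbitrary example):

```sql
-- Enable compression on the continuous aggregate, then schedule it.
-- compress_after should be older than the refresh policy's start_offset,
-- since compressed regions should no longer be refreshed.
ALTER MATERIALIZED VIEW conditions_summary SET (timescaledb.compress = true);
SELECT add_compression_policy('conditions_summary',
    compress_after => INTERVAL '7 days');
```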