TimescaleDB can downsample historical data by aggregating it into coarser time granularities, a process called "rollup" or "downsampling." The feature that automates this is the continuous aggregate.

Let’s see it in action. Imagine you have a metrics hypertable storing second-by-second sensor readings:

CREATE TABLE metrics (
    time TIMESTAMPTZ NOT NULL,
    device_id INT NOT NULL,
    temperature DOUBLE PRECISION
);

SELECT create_hypertable('metrics', 'time');

After months of collection, querying raw second-level data over long periods becomes slow. You might want daily averages for recent data, but only weekly averages for older data.

TimescaleDB’s continuous aggregates are the tool for this. They are essentially materialized views that automatically update as new data arrives in the source hypertable.

Here’s how you’d create a continuous aggregate to store daily averages:

CREATE MATERIALIZED VIEW metrics_daily
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 day', time) AS time,
    device_id,
    avg(temperature) AS avg_temperature
FROM metrics
GROUP BY time_bucket('1 day', time), device_id;

The WITH (timescaledb.continuous) clause is key: it tells TimescaleDB to manage this view. Note that the view is not recalculated on every insert into metrics. Instead, TimescaleDB materializes the daily averages on a schedule you define with a refresh policy, and with real-time aggregation enabled, queries against the view also include recent raw data that has not yet been materialized.
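By default the view is populated with existing data when it is created (unless you add WITH NO DATA). You can also materialize a time range on demand with refresh_continuous_aggregate; passing NULL for both window bounds refreshes the entire range:

CALL refresh_continuous_aggregate('metrics_daily', NULL, NULL);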

To query the daily averages, you query the metrics_daily view:

SELECT * FROM metrics_daily WHERE device_id = 1 ORDER BY time DESC LIMIT 10;

This is much faster than querying the raw metrics table for the same period.
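For comparison, the equivalent query against the raw table has to scan and aggregate every second-level row in the range on each execution:

-- Same result shape, but computed from raw rows every time
SELECT time_bucket('1 day', time) AS day, avg(temperature)
FROM metrics
WHERE device_id = 1
GROUP BY day
ORDER BY day DESC
LIMIT 10;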

Now, for older data, say data older than 30 days, you might only need weekly averages. You can create another continuous aggregate at a coarser granularity:

CREATE MATERIALIZED VIEW metrics_weekly
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('7 days', time) AS time,
    device_id,
    avg(temperature) AS avg_temperature
FROM metrics
GROUP BY time_bucket('7 days', time), device_id;

Note that there is no WHERE clause filtering by age: a continuous aggregate definition cannot use non-immutable functions such as NOW(). Instead, you decide which age range each aggregate serves at query time (the daily view for recent data, the weekly view for older data) and through each aggregate's refresh policy. The time_bucket function is crucial here; it groups your timestamps into fixed intervals.
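For example, a dashboard query might take daily points for the last 30 days and weekly points for everything older (the 30-day cutoff here is an illustrative choice, not anything built into the views):

SELECT time, avg_temperature FROM metrics_daily
WHERE device_id = 1 AND time >= NOW() - INTERVAL '30 days'
UNION ALL
SELECT time, avg_temperature FROM metrics_weekly
WHERE device_id = 1 AND time < NOW() - INTERVAL '30 days'
ORDER BY time;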

To manage how often a continuous aggregate refreshes, and over what time window, you attach a refresh policy with add_continuous_aggregate_policy(). (set_chunk_time_interval() is a different tool: it controls chunk sizing, not refresh behavior.) For metrics_daily, you might keep the last 7 days up to date, refreshing once an hour:

SELECT add_continuous_aggregate_policy('metrics_daily',
    start_offset => INTERVAL '7 days',
    end_offset => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour');

This tells TimescaleDB to re-materialize the window from 7 days ago up to 1 hour ago, every hour. Data older than the start offset still exists in the view, but it won't be automatically updated unless you refresh it manually.

The real power comes when you combine this with TimescaleDB’s data retention policies and compression. You can set up a policy to drop raw data older than a certain point, knowing that its aggregated form is safely stored in your continuous aggregates. You can also compress the raw data and even the continuous aggregates themselves for further storage savings.
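As a sketch, the retention and compression policies might look like this (the intervals are illustrative choices):

-- Drop raw chunks once the data is older than 90 days;
-- the daily and weekly aggregates retain their own copies.
SELECT add_retention_policy('metrics', INTERVAL '90 days');

-- Enable compression on the raw hypertable and compress
-- chunks once they are older than 7 days.
ALTER TABLE metrics SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
);
SELECT add_compression_policy('metrics', INTERVAL '7 days');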

The time_bucket function can take an offset argument, which is often misunderstood. The offset shifts the bucket boundaries; it does not relabel them. With time_bucket('1 hour', time), a reading at 2023-10-27 10:00:00 falls in the bucket labeled 2023-10-27 10:00:00, and an offset equal to the bucket width (e.g. '1 hour') leaves the boundaries unchanged, since buckets repeat with that period. An offset of '30 minutes', by contrast, makes hourly buckets run from half past to half past. If you want each bucket labeled by its end (so 2023-10-27 11:00:00 represents the hour from 10:00 to 11:00), add the bucket width to the result instead: time_bucket('1 hour', time) + INTERVAL '1 hour'. Offsets are particularly useful for aligning buckets with specific calendar boundaries.

The next step in optimizing historical data is often to investigate data tiering, moving older, less frequently accessed data to cheaper storage.

Want structured learning?

Take the full TimescaleDB course →