TimescaleDB’s automatic data retention does not delete individual rows as they age out. It drops entire chunks, and only once a chunk’s whole time range has passed out of the retention window.

Let’s see it in action. Imagine you have a table measurements storing sensor readings, and you want to keep only the last 7 days of data.

-- Create a sample table
CREATE TABLE measurements (
    time TIMESTAMPTZ NOT NULL,
    device INT NOT NULL,
    temp FLOAT
);

-- Make it a hypertable
SELECT create_hypertable('measurements', 'time');

-- Add a retention policy
SELECT add_retention_policy('measurements', INTERVAL '7 days');

Now, measurements is a hypertable, and that add_retention_policy call tells TimescaleDB to start cleaning up. But what does "cleaning up" mean here? It does not mean row-level DELETE statements running in the background. TimescaleDB removes old data by dropping entire chunks.
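To have something for the policy to act on, it helps to load data that spans more than the retention window. A minimal sketch, assuming the measurements table above and the default chunk interval of 7 days; the device IDs and temperature values are made up for illustration:

```sql
-- Insert one reading per hour over the last 30 days for three hypothetical devices.
INSERT INTO measurements (time, device, temp)
SELECT ts, d, 20 + 5 * random()
FROM generate_series(now() - INTERVAL '30 days', now(), INTERVAL '1 hour') AS ts,
     generate_series(1, 3) AS d;

-- List the chunks that now back the hypertable.
SELECT show_chunks('measurements');
```

With a 7-day chunk interval, 30 days of data should land in several chunks, some of which are entirely older than the retention window.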

Here’s the mental model:

  1. Hypertables: TimescaleDB breaks down large time-series tables into smaller chunks (actual PostgreSQL tables) based on time intervals. This is fundamental. The create_hypertable command sets this up.
  2. Retention Policy: The add_retention_policy command doesn’t directly touch the chunks. Instead, it registers a background job for the hypertable and stores the retention interval in that job’s configuration (as drop_after).
  3. Background Worker: TimescaleDB’s job scheduler runs the retention job periodically in a background worker (once a day by default). Each run reads the hypertable’s chunk metadata rather than the rows themselves.
  4. Identifying Old Chunks: The job checks each chunk’s time range against a cutoff, which is the current time minus the retention interval. If a chunk’s entire time range falls before the cutoff, that chunk is a candidate for removal.
  5. Dropping Chunks: When a chunk is identified as completely outside the retention window, the background worker issues a DROP TABLE command for that chunk. This is why it’s "auto-drop" and not "auto-delete." It drops the entire underlying PostgreSQL table.
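The steps above can be observed through TimescaleDB’s informational views. A sketch, assuming TimescaleDB 2.x and the measurements hypertable from earlier; the fully_expired column is only an approximation of the job’s own check, added here for illustration:

```sql
-- The retention policy is stored as a background job; its config holds the interval.
SELECT job_id, schedule_interval, config, next_start
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention'
  AND hypertable_name = 'measurements';

-- Each chunk's time range, flagged when it lies entirely before the 7-day cutoff.
SELECT chunk_name,
       range_start,
       range_end,
       range_end <= now() - INTERVAL '7 days' AS fully_expired
FROM timescaledb_information.chunks
WHERE hypertable_name = 'measurements'
ORDER BY range_start;
```

Chunks flagged as fully_expired are the ones the next run of the retention job would be expected to drop.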

The key is that it drops whole chunks. Suppose a chunk spans 2022-12-25 00:00:00 to 2023-01-01 00:00:00, your retention is 7 days, and today is 2023-01-10. The cutoff is 2023-01-10 - 7 days = 2023-01-03, and the chunk ends before that, so the entire chunk is older than 7 days and the worker drops it.

However, if the chunk spans 2023-01-01 00:00:00 to 2023-01-08 00:00:00, and today is still 2023-01-10, the worker will not drop it, even though its rows from before 2023-01-03 are older than 7 days. The entire chunk’s time span must be older than the retention period.
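The whole-chunk rule can be checked with plain date arithmetic. A sketch using two hypothetical chunks, labeled A and B, evaluated as of 2023-01-10 with 7-day retention (cutoff 2023-01-03); it runs on plain PostgreSQL and only mimics the comparison, it is not TimescaleDB’s internal code:

```sql
WITH params AS (
    -- Cutoff = "today" minus the retention interval = 2023-01-03.
    SELECT TIMESTAMPTZ '2023-01-10 00:00:00+00' - INTERVAL '7 days' AS cutoff
),
chunks (chunk, range_start, range_end) AS (
    VALUES
        ('A', TIMESTAMPTZ '2022-12-25 00:00:00+00', TIMESTAMPTZ '2023-01-01 00:00:00+00'),
        ('B', TIMESTAMPTZ '2023-01-01 00:00:00+00', TIMESTAMPTZ '2023-01-08 00:00:00+00')
)
SELECT chunk,
       range_start < cutoff AS has_expired_rows,  -- chunk can hold rows older than the cutoff
       range_end <= cutoff  AS droppable          -- chunk's whole range is older than the cutoff
FROM chunks, params
ORDER BY chunk;
-- A: has_expired_rows = true, droppable = true
-- B: has_expired_rows = true, droppable = false
```

Chunk B can contain rows older than the cutoff, yet it stays because its range extends past 2023-01-03.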

The retention job does the same work as the drop_chunks function. You can also call that function manually, which is useful for testing or forcing a cleanup: SELECT drop_chunks('measurements', older_than => NOW() - INTERVAL '7 days');
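Before forcing a cleanup, it can be worth previewing which chunks would be affected. A sketch, assuming TimescaleDB 2.x, where show_chunks accepts the same older_than argument as drop_chunks:

```sql
-- Preview: list the chunks that lie entirely before the cutoff, without dropping anything.
SELECT show_chunks('measurements', older_than => INTERVAL '7 days');

-- Drop them; the function returns the names of the chunks it removed.
SELECT drop_chunks('measurements', older_than => INTERVAL '7 days');
```

Passing an interval is shorthand for a cutoff relative to the current time, so it selects the same chunks as the NOW() - INTERVAL '7 days' form.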

Because expiry is a DROP TABLE on the chunk rather than a row-by-row DELETE, it is fast, requires no scan of the expired rows, and leaves no dead tuples behind for VACUUM to clean up. The trade-off is granularity: a row can outlive the retention period by up to one chunk interval, because it is only removed when its whole chunk has expired.

This automatic dropping is crucial for managing disk space and query performance in long-running time-series databases. Without it, your database would grow indefinitely.

The next logical step is to understand how to manage the frequency at which TimescaleDB checks for and drops these old chunks, as the default might not be optimal for all workloads.
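As a starting point, the schedule can be changed with alter_job. A sketch, assuming TimescaleDB 2.x and a single retention policy on measurements; the 1-hour interval is an arbitrary value chosen for illustration, not a recommendation:

```sql
-- Look up the retention job for the hypertable and change how often it runs.
SELECT alter_job(job_id, schedule_interval => INTERVAL '1 hour')
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_retention'
  AND hypertable_name = 'measurements';
```

A shorter interval makes expired chunks disappear sooner after they cross the cutoff, at the cost of more frequent job runs.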
