TimescaleDB chunks are only excluded from queries when the query planner knows it doesn’t need them, which is almost never the case for an exclusion filter.

Let’s see this in action. Imagine we have a measurements table, partitioned by time into chunks.

CREATE TABLE measurements (
    time TIMESTAMPTZ NOT NULL,
    host TEXT NOT NULL,
    cpu DOUBLE PRECISION NULL
);

SELECT create_hypertable('measurements', 'time');

-- Insert some data
INSERT INTO measurements VALUES
(NOW() - interval '2 days', 'server1', 0.5),
(NOW() - interval '1 day', 'server2', 0.6),
(NOW(), 'server3', 0.7);

Now, let’s say we want to exclude data from server1 for some reason. A naive approach might be:

SELECT * FROM measurements WHERE host != 'server1';

If you run EXPLAIN ANALYZE on this, you’ll see something like this (simplified):

                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..10000.00 rows=1000 width=32)
   Workers Planned: 2
   ->  Parallel Seq Scan on measurements  (cost=0.00..9000.00 rows=500 width=32)
         Filter: (host <> 'server1'::text)

Notice Parallel Seq Scan on measurements. TimescaleDB, by default, doesn’t automatically know that the host != 'server1' filter should let it skip chunks. It has to scan all the chunks and apply the filter at the row level. This is inefficient if server1 data is a large portion of your dataset and you frequently query with this exclusion.

The solution is to use TimescaleDB’s chunk_exclude_predicate feature. This tells TimescaleDB exactly which chunks can be skipped based on a condition.

Let’s add a chunk_exclude_predicate to our measurements table. We’ll create a new table for clarity, but you could alter an existing one.

-- Create a new table with the exclusion predicate
CREATE TABLE measurements_excluded (
    time TIMESTAMPTZ NOT NULL,
    host TEXT NOT NULL,
    cpu DOUBLE PRECISION NULL
);

SELECT create_hypertable('measurements_excluded', 'time');

-- Add the exclusion predicate. This means if a chunk's data *only* contains 'server1',
-- it won't even be considered for queries that *don't* filter on host='server1'.
-- For this to be effective, we need to ensure that our data is organized such that
-- certain hosts tend to be in certain chunks, or that we can define a predicate that
-- applies to whole chunks.
-- A more practical example for exclusion is when a chunk's time range is no longer relevant,
-- or if a specific dimension's data is known to be entirely within a chunk.
-- Let's simulate a scenario where we want to exclude 'server1' data *if* it's the *only* host in a chunk.
-- This requires a bit more setup to demonstrate effectively, but the core idea is to
-- define a predicate that identifies entire chunks to be ignored.

-- For a more direct exclusion, let's consider a scenario where we know a chunk
-- contains *only* data from 'server1' and we want to exclude it from queries that
-- *don't* specifically ask for 'server1'. This requires us to be able to identify
-- such chunks. TimescaleDB's `chunk_exclude_predicate` is often used with
-- metadata or aggregated properties of a chunk.

-- A common and effective use case is excluding old data that is no longer relevant,
-- or data from a source that has been decommissioned.
-- Let's imagine we have a separate table to track hosts and their status.

CREATE TABLE hosts (
    hostname TEXT PRIMARY KEY,
    status TEXT
);

INSERT INTO hosts VALUES ('server1', 'decommissioned'), ('server2', 'active'), ('server3', 'active');

-- Now, we want to exclude data from 'decommissioned' hosts.
-- We need to associate this information with our chunks. TimescaleDB doesn't automatically
-- infer this. We need to add a column to our hypertable that can store this information
-- or use a mechanism that allows the predicate to be evaluated against chunk metadata.

-- The `chunk_exclude_predicate` is defined on the hypertable itself.
-- It's a boolean expression that, if true for a chunk, means that chunk can be excluded
-- from *any* query that does not explicitly override the exclusion.
-- To make this work, we need a way to evaluate the predicate against chunk properties.

-- Let's redefine our example to focus on a more direct exclusion strategy.
-- Suppose we have a column that indicates if the data in a chunk is "historical"
-- and should be excluded from typical operational queries.

CREATE TABLE measurements_with_tag (
    time TIMESTAMPTZ NOT NULL,
    host TEXT NOT NULL,
    cpu DOUBLE PRECISION NULL,
    is_historical BOOLEAN DEFAULT FALSE
);

SELECT create_hypertable('measurements_with_tag', 'time');

-- Insert some data
INSERT INTO measurements_with_tag (time, host, cpu, is_historical) VALUES
(NOW() - interval '10 days', 'server1', 0.5, TRUE),
(NOW() - interval '9 days', 'server1', 0.55, TRUE),
(NOW() - interval '2 days', 'server2', 0.6, FALSE),
(NOW() - interval '1 day', 'server3', 0.7, FALSE),
(NOW(), 'server4', 0.8, FALSE);

-- Now, let's add the exclusion predicate. This requires that the predicate can be
-- evaluated against the *chunk's metadata*. TimescaleDB doesn't automatically
-- scan all rows in a chunk to determine the value of `is_historical` for the entire chunk.
-- Instead, `chunk_exclude_predicate` is typically used when the predicate can be
-- evaluated based on the chunk's time range, or if you have a mechanism to set
-- chunk-level metadata.

-- A more direct and common use of `chunk_exclude_predicate` is for time-based
-- exclusion or when specific data dimensions are known to be segregated.
-- For instance, if you have a hypertable with a `device_id` column, and you know
-- that certain chunks contain data *only* from `device_id = 100`, you could
-- potentially exclude those chunks from queries that don't filter on `device_id = 100`.

-- However, the most straightforward and widely applicable use case for
-- `chunk_exclude_predicate` is to exclude chunks based on their *time range*.
-- Let's demonstrate that. Suppose we want to exclude data older than 5 days for
-- most queries.

-- First, we need to create a hypertable *without* the predicate initially,
-- insert data, and *then* add the predicate.
CREATE TABLE measurements_time_excluded (
    time TIMESTAMPTZ NOT NULL,
    host TEXT NOT NULL,
    cpu DOUBLE PRECISION NULL
);
SELECT create_hypertable('measurements_time_excluded', 'time');

-- Insert data spanning different time periods
INSERT INTO measurements_time_excluded VALUES
(NOW() - interval '15 days', 'serverA', 0.1),
(NOW() - interval '12 days', 'serverB', 0.2),
(NOW() - interval '8 days', 'serverC', 0.3),
(NOW() - interval '4 days', 'serverD', 0.4),
(NOW() - interval '2 days', 'serverE', 0.5),
(NOW() - interval '1 day', 'serverF', 0.6);

-- Now, let's add the exclusion predicate. This predicate will be evaluated
-- by TimescaleDB *before* it decides to scan a chunk.
-- We want to exclude chunks where the *maximum* time is older than 5 days.
-- TimescaleDB provides functions to access chunk metadata.
-- `chunk_max_time(chunk_id)` and `chunk_min_time(chunk_id)` are helpful.

-- First, let's find the chunks and their time ranges.
-- SELECT id, range FROM timescaledb_catalog.chunks WHERE table_name = 'measurements_time_excluded';
-- This will give us chunk IDs and their time ranges.

-- To apply the predicate, we need to associate it with the hypertable.
-- The predicate itself needs to be evaluable by the system.
-- The syntax for `chunk_exclude_predicate` is a SQL expression.
-- For example, to exclude chunks older than 5 days:

ALTER TABLE measurements_time_excluded SET (
    timescaledb.chunk_exclude_predicate = $$ time < NOW() - INTERVAL '5 days' $$
);
-- NOTE: The above syntax is a simplified representation. In reality, you'd need
-- to ensure the predicate is evaluated correctly against chunk properties or
-- use a mechanism that allows custom predicate evaluation.
-- The most common and robust way is to have a column in your hypertable that
-- reflects the property you want to filter on, and then ensure that column's
-- values are consistent within chunks.

-- Let's try a more concrete example using a tag that is consistent within a chunk.
-- Imagine we have a `region` column and we want to exclude data from the 'us-east-1' region.
-- To make `chunk_exclude_predicate` effective, the system needs to be able to
-- determine if a chunk *only* contains data matching the exclusion criteria.

-- A robust pattern:
-- 1. Add a column to your hypertable that represents the property you want to filter on (e.g., `region`, `is_archived`).
-- 2. Ensure that data within a chunk is *consistent* for this property, or that you can determine a representative value for the chunk.
-- 3. Create a view or a function that can evaluate the predicate against chunk metadata.

-- Simpler approach: Using a column that is guaranteed to be the same for all rows in a chunk.
-- Let's assume we are partitioning by `time` and `host`. If a chunk contains data
-- *only* for a specific host, and we want to exclude that host's data from general queries.

-- This is where `chunk_exclude_predicate` becomes powerful. It allows TimescaleDB
-- to *skip* entire chunks if the predicate evaluates to true.
-- When you run a query like `SELECT * FROM measurements_time_excluded WHERE time > NOW() - INTERVAL '3 days';`,
-- `EXPLAIN ANALYZE` will now show that certain chunks are being pruned.

-- Example of `EXPLAIN ANALYZE` output with an effective exclusion predicate:
/*
                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..10000.00 rows=500 width=32)
   Workers Planned: 2
   ->  Parallel Append  (cost=0.00..8500.00 rows=250 width=32)
         ->  Parallel Bitmap Heap Scan on measurements_time_excluded_1_1_1_chunk  (cost=0.00..0.00 rows=0 width=32)
               Recheck Cond: (time > '2023-10-27 10:00:00+00'::timestamp with time zone)
               Filter: (time > '2023-10-27 10:00:00+00'::timestamp with time zone)
         ->  Parallel Bitmap Heap Scan on measurements_time_excluded_1_1_2_chunk  (cost=0.00..0.00 rows=0 width=32)
               Recheck Cond: (time > '2023-10-27 10:00:00+00'::timestamp with time zone)
               Filter: (time > '2023-10-27 10:00:00+00'::timestamp with time zone)
         ->  Parallel Bitmap Heap Scan on measurements_time_excluded_1_1_3_chunk  (cost=0.00..0.00 rows=0 width=32)
               Recheck Cond: (time > '2023-10-27 10:00:00+00'::timestamp with time zone)
               Filter: (time > '2023-10-27 10:00:00+00'::timestamp with time zone)
         ->  Parallel Bitmap Heap Scan on measurements_time_excluded_1_1_4_chunk  (cost=0.00..0.00 rows=0 width=32)
               Recheck Cond: (time > '2023-10-27 10:00:00+00'::timestamp with time zone)
               Filter: (time > '2023-10-27 10:00:00+00'::timestamp with time zone)
         ->  Parallel Bitmap Heap Scan on measurements_time_excluded_1_1_5_chunk  (cost=0.00..0.00 rows=0 width=32)
               Recheck Cond: (time > '2023-10-27 10:00:00+00'::timestamp with time zone)
               Filter: (time > '2023-10-27 10:00:00+00'::timestamp with time zone)
         ->  Parallel Bitmap Heap Scan on measurements_time_excluded_1_1_6_chunk  (cost=0.00..0.00 rows=0 width=32)
               Recheck Cond: (time > '2023-10-27 10:00:00+00'::timestamp with time zone)
               Filter: (time > '2023-10-27 10:00:00+00'::timestamp with time zone)
*/
-- In this hypothetical `EXPLAIN ANALYZE`, you'd see specific chunk scans. If the predicate
-- was effective, some chunks would simply not appear in the plan at all, or would be
-- marked as pruned. The key is that the query planner can determine, *before* scanning
-- any rows, that a chunk does not contain relevant data.

-- The actual mechanism for `chunk_exclude_predicate` relies on TimescaleDB's ability
-- to evaluate the predicate against chunk metadata. This metadata might be derived
-- from the range of values in the chunk for certain columns, or from specific
-- attributes you've set. For time-based exclusions, it often uses the chunk's
-- `min_time` and `max_time`.

-- If you add a `chunk_exclude_predicate` to an existing hypertable, TimescaleDB
-- will re-evaluate the predicate for all existing chunks. If a chunk now matches
-- the predicate, it will be marked as excluded for future queries.

-- The most common reason `chunk_exclude_predicate` doesn't seem to work is that
-- the predicate is too complex to be evaluated efficiently by TimescaleDB's planner,
-- or it requires scanning chunk data to determine its truthiness, defeating the purpose.
-- The predicate should ideally be evaluable using only the chunk's time range or
-- pre-computed metadata.

The next thing you'll likely encounter is ensuring your `chunk_exclude_predicate` is correctly defined to leverage TimescaleDB's metadata efficiently.

Want structured learning?

Take the full Timescaledb course →