You can join a TimescaleDB hypertable with a regular PostgreSQL table, but it’s not as straightforward as you might think because TimescaleDB’s internal optimizations for hypertables don’t automatically extend to the joined regular table.

Let’s say you have a measurements hypertable and a devices regular table.

CREATE TABLE devices (
    device_id SERIAL PRIMARY KEY,
    device_name VARCHAR(100)
);

CREATE TABLE measurements (
    time TIMESTAMPTZ NOT NULL,
    device_id INT,
    temperature DOUBLE PRECISION
);

SELECT create_hypertable('measurements', 'time');

Now, you want to join them to get the temperature readings with their corresponding device names. A naive join might look like this:

SELECT
    m.time,
    d.device_name,
    m.temperature
FROM
    measurements m
JOIN
    devices d ON m.device_id = d.device_id
WHERE
    m.time >= '2023-10-26 00:00:00' AND m.time < '2023-10-27 00:00:00';

This query will work, but if your measurements hypertable is massive, and your query filters on time, PostgreSQL might still end up scanning a huge portion of the devices table or performing a less efficient join. The key is to ensure TimescaleDB can leverage its partitioning (chunking) to prune data from the measurements hypertable before it attempts to join with devices.

The most common performance bottleneck here is when the join condition doesn’t allow TimescaleDB to efficiently filter the hypertable first. If your WHERE clause is on columns that aren’t part of the hypertable’s time-based partitioning or a composite partition key, TimescaleDB can’t use its chunk pruning.

Here’s how to ensure efficient joins:

1. Index the Join Column on the Regular Table: This is fundamental PostgreSQL performance tuning. If you’re joining measurements.device_id to devices.device_id, ensure devices.device_id is indexed.

  • Diagnosis:

    EXPLAIN ANALYZE
    SELECT
        m.time,
        d.device_name,
        m.temperature
    FROM
        measurements m
    JOIN
        devices d ON m.device_id = d.device_id
    WHERE
        m.time >= '2023-10-26 00:00:00' AND m.time < '2023-10-27 00:00:00';
    

    Look for sequential scans on the devices table.

  • Fix:

    CREATE INDEX idx_devices_device_id ON devices (device_id);
    
  • Why it works: This allows PostgreSQL to quickly look up matching device_ids from the devices table for each device_id found in the (potentially already filtered) measurements hypertable.

2. Filter on the Hypertable’s Time Column First: Always put your time-based filters on the hypertable in the WHERE clause. This is crucial for chunk pruning.

  • Diagnosis: Verify that the WHERE clause on m.time appears early in the EXPLAIN ANALYZE output, and that the planner estimates a small number of chunks will be scanned.

  • Fix: This is about query writing. Ensure your conditions on the hypertable’s partitioning column (usually time) are present and not overly broad.

  • Why it works: TimescaleDB identifies which chunks (partitions) of the hypertable contain data relevant to the time range specified, dramatically reducing the number of rows it needs to process from the hypertable.

3. Consider a Composite Hypertable: If you frequently join on device_id and filter by time, make device_id part of your hypertable’s partitioning scheme.

  • Diagnosis: If your time-based filtering alone isn’t enough and you still see a lot of hypertable scanning, even with indexes.

  • Fix:

    -- Drop the old hypertable first if it exists
    -- SELECT drop_chunks('measurements', '2023-10-01'::timestamptz, older_than => INTERVAL '1 year');
    -- DROP TABLE measurements;
    
    CREATE TABLE measurements (
        time TIMESTAMPTZ NOT NULL,
        device_id INT NOT NULL, -- Ensure device_id is NOT NULL for partitioning
        temperature DOUBLE PRECISION
    );
    
    -- Create a composite hypertable partitioned by time AND device_id
    SELECT create_hypertable('measurements', 'time', 'device_id', 4); -- 4 is the number of sub-partitions for device_id
    

    You’ll also need an index on devices.device_id as in point 1.

  • Why it works: This creates a two-dimensional partitioning scheme. TimescaleDB can now prune chunks not only by time but also by device_id, allowing it to isolate the data for specific devices within specific time ranges very efficiently. The join condition m.device_id = d.device_id can then be used to further filter the already highly-pruned hypertable data.

4. Use JOIN LATERAL or LEFT JOIN LATERAL if the Regular Table is Small or Filters Dynamically: If the devices table is small, or if you’re doing complex filtering on devices that can’t be pre-filtered, a lateral join can sometimes be more efficient.

  • Diagnosis: When EXPLAIN ANALYZE shows a nested loop join where the inner side (the devices table) is being scanned repeatedly for each row from measurements.

  • Fix:

    SELECT
        m.time,
        d_selected.device_name,
        m.temperature
    FROM
        measurements m
    LEFT JOIN LATERAL (
        SELECT d.device_name
        FROM devices d
        WHERE d.device_id = m.device_id
        -- Add any specific filtering for devices here if needed
    ) d_selected ON true
    WHERE
        m.time >= '2023-10-26 00:00:00' AND m.time < '2023-10-27 00:00:00';
    
  • Why it works: The LATERAL subquery is executed once for each row from the measurements table (after it’s been pruned by time). This is efficient if the subquery is very fast (e.g., using an index on devices.device_id) or if the number of rows from measurements is small after time-based pruning.

5. Materialized Views (for complex, frequently run joins): If the join is computationally expensive and the underlying data doesn’t change too frequently, a materialized view can pre-compute the joined results.

  • Diagnosis: Persistent slow query times for a specific join pattern, even after applying other optimizations.

  • Fix:

    CREATE MATERIALIZED VIEW measurements_with_devices AS
    SELECT
        m.time,
        d.device_name,
        m.temperature,
        m.device_id -- Keep device_id for potential future joins or filtering
    FROM
        measurements m
    JOIN
        devices d ON m.device_id = d.device_id;
    
    -- To refresh the data:
    REFRESH MATERIALIZED VIEW measurements_with_devices;
    
    -- Query the materialized view:
    SELECT *
    FROM measurements_with_devices
    WHERE time >= '2023-10-26 00:00:00' AND time < '2023-10-27 00:00:00';
    
  • Why it works: The join is performed once when the materialized view is created or refreshed. Subsequent queries read from the pre-joined, denormalized data, which is much faster. The trade-off is data staleness and the cost of refreshing.

The most common pitfall is forgetting that TimescaleDB’s chunk pruning only applies to the hypertable itself. Any join or filter condition that doesn’t directly involve the hypertable’s partitioning columns will be handled by standard PostgreSQL query planning, which is why indexing and proper WHERE clause usage are paramount.

The next thing you’ll likely hit is realizing that if your devices table is also extremely large and you’re filtering on device_name and time, you might need to consider if devices should also be a hypertable or if the join logic needs further refinement.

Want structured learning?

Take the full Timescaledb course →