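TimescaleDB interpolation does conceptually what you would otherwise hand-roll with LAG() and LEAD(): it estimates a missing value from the known points on either side. The crucial twist for time-series is that a gap usually has no row at all, so the missing rows have to be generated before anything can be filled in.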
Let’s say you have temperature readings from a sensor every minute.
CREATE TABLE sensor_readings (
    ts TIMESTAMPTZ NOT NULL,
    sensor_id INT NOT NULL,
    temperature DECIMAL,
    -- A key on ts alone would forbid two sensors reporting at the same instant
    PRIMARY KEY (sensor_id, ts)
);
INSERT INTO sensor_readings (ts, sensor_id, temperature) VALUES
    ('2023-10-27 10:00:00+00', 1, 20.5),
    ('2023-10-27 10:02:00+00', 1, 21.0), -- Sensor missed 10:01
    ('2023-10-27 10:03:00+00', 1, 21.5),
    ('2023-10-27 10:05:00+00', 1, 22.0); -- Sensor missed 10:04
You want to fill that missing 10:01 reading. LAG() alone cannot do it: the table has no row for 10:01, so there is nothing for a window function to attach a value to. You first have to generate the missing row, then estimate its value.
TimescaleDB’s interpolate() function does both when used with time_bucket_gapfill() (not plain time_bucket()). time_bucket_gapfill() emits a row for every bucket in the queried time range, including empty ones, and interpolate() fills the empty buckets by drawing a straight line between the nearest buckets that have data.
Here’s how you’d get a linearly interpolated value for every minute:
SELECT
    time_bucket_gapfill('1 minute', ts) AS bucket,
    sensor_id,
    -- Cast so the aggregate is DOUBLE PRECISION, a type interpolate() accepts
    avg(temperature::double precision) AS temperature,
    interpolate(avg(temperature::double precision)) AS interpolated_temperature
FROM sensor_readings
WHERE sensor_id = 1
  -- Gapfill needs an explicit time range to know which buckets to emit
  AND ts >= '2023-10-27 10:00:00+00'
  AND ts <  '2023-10-27 10:06:00+00'
GROUP BY bucket, sensor_id
ORDER BY bucket;
With the sample data above, this query will output:
bucket                 | sensor_id | temperature | interpolated_temperature
--------------------------------------------------------------------------
2023-10-27 10:00:00+00 | 1         | 20.5        | 20.5
2023-10-27 10:01:00+00 | 1         |             | 20.75 -- Interpolated
2023-10-27 10:02:00+00 | 1         | 21.0        | 21.0
2023-10-27 10:03:00+00 | 1         | 21.5        | 21.5
2023-10-27 10:04:00+00 | 1         |             | 21.75 -- Interpolated
2023-10-27 10:05:00+00 | 1         | 22.0        | 22.0
The interpolate() function takes the bucket’s aggregated value (which is NULL for a gap-filled bucket) as its first argument. There is no method argument: interpolate() is always linear. It also accepts two optional arguments, prev and next, which are (timestamp, value) lookups used only when the query’s time range contains no earlier or later point to interpolate from. You do not pass LAG() or LEAD() results; the gapfill machinery finds the neighbouring buckets itself, separately for each sensor_id in the GROUP BY.
When interpolate() encounters the empty bucket at 10:01, it sees the previous point at 10:00 with 20.5 and the next point at 10:02 with 21.0. Because 10:01 lies halfway between them in time, the interpolated value is halfway between 20.5 and 21.0, which is 20.75.
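The halfway arithmetic can be checked by hand in plain PostgreSQL, without TimescaleDB. This is only a sketch of the linear formula v0 + (v1 - v0) * (t - t0) / (t1 - t0) applied to the 10:01 gap; it is not a description of how interpolate() is implemented internally.

```sql
-- Linear interpolation by hand for the 10:01 gap:
--   known point 0: (10:00, 20.5), known point 1: (10:02, 21.0)
SELECT
    20.5 + (21.0 - 20.5)
         * EXTRACT(EPOCH FROM (TIMESTAMPTZ '2023-10-27 10:01:00+00'
                             - TIMESTAMPTZ '2023-10-27 10:00:00+00'))  -- seconds elapsed since point 0
         / EXTRACT(EPOCH FROM (TIMESTAMPTZ '2023-10-27 10:02:00+00'
                             - TIMESTAMPTZ '2023-10-27 10:00:00+00'))  -- seconds between the two points
      AS value_at_10_01;
```

The elapsed fraction is 60 / 120 = 0.5, so the result is 20.5 + 0.5 * 0.5 = 20.75 (PostgreSQL may print it with trailing zeros).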
The same pattern scales to coarser buckets. Imagine you’re aggregating data into 5-minute buckets and want to ensure continuity.
SELECT
    time_bucket_gapfill('5 minutes', ts) AS bucket,
    sensor_id,
    -- interpolate() wraps the aggregate; empty buckets get a value on the
    -- straight line between the neighbouring buckets' averages
    interpolate(avg(temperature::double precision)) AS interpolated_avg_temperature
FROM sensor_readings
WHERE sensor_id = 1
  AND ts >= '2023-10-27 10:00:00+00'
  AND ts <  '2023-10-27 10:10:00+00'
GROUP BY bucket, sensor_id
ORDER BY bucket;
This fills gaps where a whole bucket has no readings. The interpolate() function here operates on the aggregated values (avg(temperature)) of each bucket, not on individual rows. With the four sample rows above, both the 10:00 and 10:05 buckets contain readings, so there is nothing to fill; the interpolation only becomes visible when an empty bucket sits between two buckets that have data.
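The key insight is that interpolate() isn’t a magical "fill the gap" button; it’s a mathematical function that needs context. It only works inside a time_bucket_gapfill() query, it needs an explicit time range so the empty buckets can be generated, and it can only draw a line between known points. Inside the queried range it finds those points itself. At the edges of the range, where a neighbour is missing, the result stays NULL unless you supply the surrounding point through the optional prev and next arguments.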
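When a gap sits at the edge of the queried range, interpolate() has no neighbour on one side. A sketch of supplying that neighbour through the optional prev argument follows, modelled on the lookup pattern in the TimescaleDB documentation. The table aliases s and s2 and the shifted time range are choices made for this illustration only.

```sql
-- The range starts at 10:01, so the 10:00 reading is outside it and the
-- 10:01 bucket has no previous point unless prev supplies one.
SELECT
    time_bucket_gapfill('1 minute', s.ts) AS bucket,
    s.sensor_id,
    interpolate(
        avg(s.temperature::double precision),
        -- prev: latest reading before the range, returned as (time, value)
        (SELECT (s2.ts, s2.temperature::double precision)
           FROM sensor_readings s2
          WHERE s2.sensor_id = s.sensor_id
            AND s2.ts < '2023-10-27 10:01:00+00'
          ORDER BY s2.ts DESC
          LIMIT 1)
    ) AS interpolated_temperature
FROM sensor_readings s
WHERE s.sensor_id = 1
  AND s.ts >= '2023-10-27 10:01:00+00'
  AND s.ts <  '2023-10-27 10:06:00+00'
GROUP BY bucket, s.sensor_id
ORDER BY bucket;
```

The lookup is only consulted when the range itself has no earlier point, so the 10:01 bucket can be interpolated from the 10:00 reading instead of being left NULL. A next lookup for the trailing edge follows the same shape with the comparison and sort order reversed.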
Stepwise filling (last observation carried forward) is often overlooked but incredibly useful. It is not a mode of interpolate(); TimescaleDB provides it as a separate gapfill function, locf(). Unlike interpolate(), which draws a straight line, locf() maintains the last known value until the next known value appears. This is often more physically realistic for systems where a state doesn’t change until an event triggers it.
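In TimescaleDB, this hold-the-last-value behaviour is available as the locf() gapfill function. A minimal sketch against the sample table, assuming the same one-minute buckets and time range used for the readings above:

```sql
-- locf() = last observation carried forward: an empty bucket repeats the
-- most recent known value instead of a point on a straight line.
SELECT
    time_bucket_gapfill('1 minute', ts) AS bucket,
    sensor_id,
    locf(avg(temperature::double precision)) AS stepwise_temperature
FROM sensor_readings
WHERE sensor_id = 1
  AND ts >= '2023-10-27 10:00:00+00'
  AND ts <  '2023-10-27 10:06:00+00'
GROUP BY bucket, sensor_id
ORDER BY bucket;
```

With the sample data, the 10:01 bucket repeats 20.5 and the 10:04 bucket repeats 21.5, where linear interpolation would give 20.75 and 21.75.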
What many users miss is that interpolate() works on one value at a time. If you were interpolating pressure and temperature, you’d write a separate interpolate() call for each column in the same time_bucket_gapfill() query, and each would be interpolated independently based on its own surrounding non-NULL values.
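A sketch of interpolating two measurements in one query follows. The table sensor_readings_with_pressure and its pressure column are hypothetical; the sample table above has no pressure data. Each column gets its own interpolate() call.

```sql
-- Hypothetical table: same shape as sensor_readings plus a pressure column.
SELECT
    time_bucket_gapfill('1 minute', ts) AS bucket,
    sensor_id,
    -- Each call is filled from its own column's neighbouring buckets
    interpolate(avg(temperature::double precision)) AS temperature,
    interpolate(avg(pressure::double precision))    AS pressure
FROM sensor_readings_with_pressure
WHERE sensor_id = 1
  AND ts >= '2023-10-27 10:00:00+00'
  AND ts <  '2023-10-27 10:06:00+00'
GROUP BY bucket, sensor_id
ORDER BY bucket;
```

Because the calls are independent, a bucket where only pressure is missing keeps its measured temperature and receives an interpolated pressure.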
The next step is often thinking about how to permanently store these interpolated values or how to use them in continuous aggregates.