first() and last() in TimescaleDB don’t just grab the first or last row in a query result; they are aggregate functions that return the value of one column taken from the row with the smallest or largest value of another column, typically the timestamp, within each group.

Let’s see this in action. Imagine a table sensor_readings with time (a timestamp), device_id (integer), and temperature (float). We want to know the earliest and latest temperature reading for each device.

CREATE TABLE sensor_readings (
    time TIMESTAMPTZ NOT NULL,
    device_id INT NOT NULL,
    temperature FLOAT
);

-- Create a TimescaleDB hypertable
SELECT create_hypertable('sensor_readings', 'time');

-- Index by device and time. PostgreSQL can scan a B-tree in either direction,
-- so the second index below is usually redundant.
CREATE INDEX ON sensor_readings (device_id, time DESC);
CREATE INDEX ON sensor_readings (device_id, time ASC);

-- Insert some sample data
INSERT INTO sensor_readings (time, device_id, temperature) VALUES
('2023-10-26 10:00:00 UTC', 1, 20.5),
('2023-10-26 10:05:00 UTC', 1, 21.0),
('2023-10-26 10:10:00 UTC', 1, 21.2),
('2023-10-26 10:00:00 UTC', 2, 18.0),
('2023-10-26 10:15:00 UTC', 2, 18.5),
('2023-10-26 10:20:00 UTC', 2, 18.3);

Now, to get the earliest and latest temperature for each device:

SELECT
    device_id,
    first(temperature, time) AS earliest_temp,
    last(temperature, time) AS latest_temp
FROM sensor_readings
GROUP BY device_id;

This query will return:

device_id | earliest_temp | latest_temp
----------|---------------|------------
        1 |          20.5 |        21.2
        2 |          18.0 |        18.3

The key is how first(value, time_column) and last(value, time_column) work. They are aggregates: for each group (each device_id), TimescaleDB walks the group’s rows once, remembers the value paired with the smallest (or largest) time seen so far, and returns that value at the end. No per-group sort is needed. They do not skip the scan, however: the TimescaleDB documentation notes that first() and last() do not use indexes and read every row in the group. If all you need is the single most recent row, ORDER BY time DESC LIMIT 1 is usually the better choice because it can be answered from an index; first() and last() are meant for ordered selection inside a GROUP BY aggregate.
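For comparison, the plain PostgreSQL forms mentioned above can be sketched as follows, assuming the sensor_readings table and indexes from earlier. DISTINCT ON is a PostgreSQL extension to standard SQL, not a TimescaleDB function. Prefixing either query with EXPLAIN ANALYZE is a reasonable way to check which plan is actually chosen on your own data.

```sql
-- Latest reading for a single device.
-- This form can be served by the (device_id, time DESC) index.
SELECT time, temperature
FROM sensor_readings
WHERE device_id = 1
ORDER BY time DESC
LIMIT 1;

-- Latest reading per device without first()/last(), using DISTINCT ON.
-- The ORDER BY must start with the DISTINCT ON column.
SELECT DISTINCT ON (device_id)
    device_id,
    time,
    temperature AS latest_temp
FROM sensor_readings
ORDER BY device_id, time DESC;
```

Unlike last(temperature, time), both forms return the whole row, so the timestamp comes back alongside the value.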

The core problem first() and last() solve is the need for boundary values in time-series data. Standard SQL’s MIN() and MAX() return the extreme of a column, but not the value of another column on that same row; getting that usually takes DISTINCT ON, a window function, or a self-join. For time-series data, where you often want the "state at the beginning of an interval" or "state at the end of an interval," that extra machinery is verbose and easy to get wrong. first() and last() express it directly: they find the rows with the minimum and maximum timestamps within each group and return the associated value. The (device_id, time DESC) and (device_id, time ASC) indexes we created still matter, mainly for narrowing the scan when you filter on device_id and time and for ORDER BY ... LIMIT queries; the aggregates themselves read every row that survives the filter.
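The "state at the beginning or end of an interval" idea can be sketched by grouping on TimescaleDB’s time_bucket() function. The 10-minute bucket width and the open_temp and close_temp aliases are illustrative choices, not part of the original example.

```sql
-- First and last temperature per device within each 10-minute bucket.
SELECT
    time_bucket('10 minutes', time) AS bucket,
    device_id,
    first(temperature, time) AS open_temp,   -- value at the earliest time in the bucket
    last(temperature, time)  AS close_temp   -- value at the latest time in the bucket
FROM sensor_readings
GROUP BY bucket, device_id
ORDER BY bucket, device_id;
```

With the sample rows, device 1’s bucket starting at 10:00 holds the 10:00 and 10:05 readings, so it reports 20.5 as the first value and 21.0 as the last.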

The time column in first(value, time) and last(value, time) acts as the ordering key. It tells the function which row is considered "first" and "last" within each group. This is crucial because you might have multiple columns and want to define "first" based on one specific timestamp column. The value part is what gets returned from that "first" or "last" row.

A common misconception is that first() and last() are simple aliases for MIN() and MAX() on the time column. This is incorrect. While MIN(time) and MAX(time) give you the earliest and latest timestamps, first(temperature, time) and last(temperature, time) give you the temperature from the row that has the earliest and latest time, respectively. If you simply used MIN(temperature) and MAX(temperature), you’d get the absolute minimum and maximum temperatures across all readings for that device, not necessarily the temperatures at the very first or very last recorded time.
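Putting the four aggregates side by side makes the difference visible. This is a small sketch against the sample data inserted earlier; the column aliases are illustrative.

```sql
-- Extremes of the temperature column vs. temperatures at the extremes of time.
SELECT
    device_id,
    min(temperature)         AS min_temp,
    max(temperature)         AS max_temp,
    first(temperature, time) AS earliest_temp,
    last(temperature, time)  AS latest_temp
FROM sensor_readings
GROUP BY device_id
ORDER BY device_id;
```

For device 2, max_temp is 18.5 (the 10:15 reading) while latest_temp is 18.3 (the 10:20 reading). For device 1 the two pairs happen to coincide because its temperature only rises over time.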

The most surprising thing is how first() and last() can reconstruct the state of values at specific points in time without storing redundant data or writing joins. For example, to find the temperature of device 1 at its very first reading, you could write SELECT first(temperature, time) FROM sensor_readings WHERE device_id = 1;. This is more compact than finding the minimum time first and then querying for the temperature at that exact timestamp. The two approaches differ when several readings share the minimum timestamp: the two-step query returns all of them, while first() returns a single value, and which of the tied rows it comes from is not guaranteed.
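The two-step alternative described above might look like the following sketch, shown next to the first() form for the same device. Both assume the sensor_readings table from earlier.

```sql
-- Two-step form: find the minimum time, then fetch the temperature at that time.
-- Returns one row per reading that shares the minimum timestamp.
SELECT temperature
FROM sensor_readings
WHERE device_id = 1
  AND time = (
      SELECT min(time)
      FROM sensor_readings
      WHERE device_id = 1
  );

-- Aggregate form: always returns exactly one row.
SELECT first(temperature, time) AS earliest_temp
FROM sensor_readings
WHERE device_id = 1;
```

On the sample data both return 20.5, since device 1 has a single reading at 10:00.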

The next step is often exploring how to use these functions with window functions for more complex time-based aggregations.
