TimescaleDB analytics are surprisingly easy to write because its SQL extensions are designed to make you think about data as a continuous stream, not discrete rows.
Here’s a simple example of how you might analyze temperature readings from sensors over time. Imagine you have a readings table with time (a timestamp), device_id (text), and temperature (double precision).
```sql
-- This query finds the hourly average temperature for each device over the last 24 hours
SELECT
    time_bucket('1 hour', time) AS hour_start,
    device_id,
    AVG(temperature) AS avg_temperature
FROM
    readings
WHERE
    time >= NOW() - INTERVAL '24 hours'
GROUP BY
    hour_start,
    device_id
ORDER BY
    hour_start DESC,
    device_id;
```
Let’s break down what’s happening here.
The time_bucket function is the star of the show. It maps each timestamp to the start of the fixed-width interval (the bucket) that contains it, and GROUP BY then collects the rows that share a bucket. In this case, we’re bucketing by '1 hour', so a reading taken at 10:37 is assigned to the 10:00 bucket along with every other reading from that hour. The first argument is the bucket width, and the second is the timestamp column. The return value is the start of the bucket.
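To make the bucketing rule concrete, here is a small Python sketch of what time_bucket computes. It is an emulation, not TimescaleDB code. It assumes buckets are aligned to an origin of 2000-01-03 00:00:00 UTC, which is TimescaleDB's documented default for fixed-width intervals. It handles only widths without a month component, and it ignores the optional time zone argument.

```python
from datetime import datetime, timedelta, timezone

# Assumed default origin (a Monday): buckets line up with 2000-01-03 00:00:00 UTC.
ORIGIN = datetime(2000, 1, 3, tzinfo=timezone.utc)

def time_bucket(width: timedelta, ts: datetime, origin: datetime = ORIGIN) -> datetime:
    """Return the start of the fixed-width bucket that contains ts."""
    # Whole buckets between origin and ts; floor division also rounds
    # timestamps before the origin down to the previous boundary.
    n = (ts - origin) // width
    return origin + n * width

ts = datetime(2024, 5, 1, 10, 37, 12, tzinfo=timezone.utc)
print(time_bucket(timedelta(hours=1), ts))     # 2024-05-01 10:00:00+00:00
print(time_bucket(timedelta(minutes=15), ts))  # 2024-05-01 10:30:00+00:00
print(time_bucket(timedelta(days=7), ts))      # 2024-04-29 00:00:00+00:00 (a Monday)
```

The 7-day case shows why the origin matters. Because the assumed origin is a Monday, weekly buckets start on Mondays, not on whatever weekday the data happens to begin.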
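The WHERE clause keeps only readings from the last 24 hours. NOW() returns the current timestamp, and subtracting INTERVAL '24 hours' from it gives the cutoff. The cutoff rarely falls on an hour boundary, so the oldest and newest buckets in the result usually cover only part of an hour.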
The GROUP BY clause is standard SQL, but here it’s working in tandem with time_bucket. We’re grouping by both the hour_start (the result of our time bucketing) and the device_id. This gives us the average temperature per device, per hour.
Finally, ORDER BY sorts the results with the most recent hours first and, within each hour, by device_id.
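The whole query can be mimicked in plain Python to show the order of operations: filter by time, assign each row to a bucket, group by (bucket, device), average, then sort. This sketch only illustrates the semantics on an in-memory list. The hourly_averages helper and the sample rows are hypothetical, and the bucket origin is the same assumed 2000-01-03 UTC default.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

ORIGIN = datetime(2000, 1, 3, tzinfo=timezone.utc)  # assumed time_bucket origin
HOUR = timedelta(hours=1)

def time_bucket(width, ts, origin=ORIGIN):
    """Start of the fixed-width bucket containing ts."""
    return origin + ((ts - origin) // width) * width

def hourly_averages(rows, now, lookback=timedelta(hours=24)):
    """rows: iterable of (time, device_id, temperature) tuples."""
    groups = defaultdict(list)
    for time, device_id, temperature in rows:
        if time >= now - lookback:                                          # WHERE
            groups[(time_bucket(HOUR, time), device_id)].append(temperature)  # GROUP BY
    # AVG(temperature) for each (hour_start, device_id) group
    result = [(hour, dev, sum(temps) / len(temps)) for (hour, dev), temps in groups.items()]
    # ORDER BY hour_start DESC, device_id
    result.sort(key=lambda r: (-r[0].timestamp(), r[1]))
    return result

now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    (datetime(2024, 5, 1, 10, 5, tzinfo=timezone.utc), "a", 20.0),
    (datetime(2024, 5, 1, 10, 50, tzinfo=timezone.utc), "a", 22.0),
    (datetime(2024, 5, 1, 11, 15, tzinfo=timezone.utc), "b", 18.0),
    (datetime(2024, 4, 29, 9, 0, tzinfo=timezone.utc), "a", 99.0),  # older than 24 h: filtered out
]
for hour_start, device_id, avg_temperature in hourly_averages(rows, now):
    print(hour_start.isoformat(), device_id, avg_temperature)
# 2024-05-01T11:00:00+00:00 b 18.0
# 2024-05-01T10:00:00+00:00 a 21.0
```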
This is just the tip of the iceberg. TimescaleDB provides a suite of powerful functions for time-series analysis.
Consider another common task: calculating how quickly a metric changes. This is crucial for understanding trends and detecting anomalies. You can use the standard SQL window function LAG for this. No TimescaleDB-specific function is needed.
Let’s say you want to see how quickly the temperature is changing per device.
```sql
-- This query calculates the difference in temperature between consecutive readings for each device
SELECT
    time,
    device_id,
    temperature,
    temperature - LAG(temperature, 1, temperature) OVER (PARTITION BY device_id ORDER BY time) AS temperature_change
FROM
    readings
WHERE
    time >= NOW() - INTERVAL '1 hour'
ORDER BY
    device_id, time;
```
Here, LAG(temperature, 1, temperature) returns the temperature from the previous row. The 1 is the offset (one row back). The final temperature is the default returned when there is no previous row. This happens for the first reading of each device within the one-hour window, so that reading gets a change of 0 rather than NULL. PARTITION BY device_id ensures that LAG only considers readings from the same device, and ORDER BY time makes "previous" mean chronologically previous. Subtracting the previous temperature from the current one gives the change between consecutive readings. To turn that change into a true rate, divide it by the time elapsed between the two readings.
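Here is a Python sketch of the same per-device LAG logic. It adds one assumption of ours: it also divides each change by the seconds elapsed since the previous reading, which turns the difference into a rate in degrees per second. The with_changes and at helpers and the tuple layout are hypothetical.

```python
from datetime import datetime, timezone
from itertools import groupby

def with_changes(rows):
    """rows: (time, device_id, temperature) tuples.

    Returns (time, device_id, temperature, change, rate_per_s) tuples ordered by
    device_id and time, mirroring PARTITION BY device_id ORDER BY time.
    """
    out = []
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))                # ORDER BY device_id, time
    for device_id, readings in groupby(ordered, key=lambda r: r[1]):  # PARTITION BY device_id
        prev = None
        for time, _, temperature in readings:
            if prev is None:
                # No preceding row: LAG returns its default (the current value), so change is 0.
                change, rate = 0.0, None
            else:
                prev_time, prev_temp = prev
                change = temperature - prev_temp
                elapsed = (time - prev_time).total_seconds()
                # Assumed definition: change per elapsed second; undefined for duplicate timestamps.
                rate = change / elapsed if elapsed else None
            out.append((time, device_id, temperature, change, rate))
            prev = (time, temperature)
    return out

def at(minute):
    return datetime(2024, 5, 1, 10, minute, tzinfo=timezone.utc)

rows = [(at(0), "a", 20.0), (at(2), "a", 21.5), (at(1), "b", 18.0), (at(5), "a", 21.0)]
for time, device_id, temperature, change, rate in with_changes(rows):
    print(device_id, time.strftime("%H:%M"), change, rate)
```

Reporting None for the rate of a device's first reading mirrors the NULL you would get in SQL if you divided by the elapsed time. The change itself stays 0 because of LAG's default argument.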
You can also perform complex aggregations across time windows. For example, calculating a rolling average is very common.
```sql
-- This query calculates a 4-hour trailing (rolling) average of temperature per device
SELECT
    time,
    device_id,
    temperature,
    AVG(temperature) OVER (
        PARTITION BY device_id
        ORDER BY time
        RANGE BETWEEN INTERVAL '4 hours' PRECEDING AND CURRENT ROW
    ) AS rolling_avg_temperature
FROM
    readings
WHERE
    time >= NOW() - INTERVAL '12 hours'
ORDER BY
    device_id, time;
```
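This uses a window function with a RANGE frame. PARTITION BY device_id and ORDER BY time are again critical for defining the window. RANGE BETWEEN INTERVAL '4 hours' PRECEDING AND CURRENT ROW defines the frame for the AVG function. For each row, the frame includes every row for the same device_id whose time lies between 4 hours before the current row’s time and the current row’s time, inclusive. Any rows sharing the current row’s exact time are included too. This is a 4-hour trailing window. If readings arrive exactly once per hour, it contains five readings: the current one plus the four preceding. Window functions are evaluated after the WHERE clause, so rows in the first 4 hours of the 12-hour range are averaged over fewer readings.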
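Frame boundaries are easy to get wrong, so here is a Python sketch of what this RANGE frame selects for each row. It takes every reading from the same device whose time lies in the closed interval [t - 4h, t]. The rolling_avg helper and the sample data are hypothetical. The sketch uses a simple quadratic scan for clarity and is not meant to model how PostgreSQL evaluates the frame.

```python
from datetime import datetime, timedelta, timezone

def rolling_avg(rows, width=timedelta(hours=4)):
    """rows: (time, device_id, temperature) tuples.

    For each row, average the temperatures of the same device whose time is in
    [row.time - width, row.time], like a RANGE frame ending at CURRENT ROW
    (rows with an identical time are included as well).
    """
    out = []
    for time, device_id, temperature in sorted(rows, key=lambda r: (r[1], r[0])):
        frame = [temp for t, d, temp in rows
                 if d == device_id and time - width <= t <= time]
        out.append((time, device_id, temperature, sum(frame) / len(frame)))
    return out

base = datetime(2024, 5, 1, tzinfo=timezone.utc)
# Hypothetical hourly readings for one device: 10.0 at 00:00, 11.0 at 01:00, ..., 15.0 at 05:00
rows = [(base + timedelta(hours=h), "a", 10.0 + h) for h in range(6)]
for time, device_id, temperature, avg in rolling_avg(rows):
    print(time.strftime("%H:%M"), temperature, avg)
# The 05:00 row averages 01:00 through 05:00 (five readings): 13.0
```

Both ends of the frame are inclusive. That is why a 4-hour RANGE over hourly data produces five-reading averages once enough history has accumulated.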
The most surprising aspect of TimescaleDB’s analytical functions is how much of the work is done by plain SQL. Window functions such as ROW_NUMBER, RANK, LEAD and LAG, and aggregates such as SUM, AVG and COUNT, are standard PostgreSQL features that run unchanged on hypertables. Meanwhile, TimescaleDB’s time-based chunking and indexes help keep these queries fast as datasets grow. This means you don’t need to learn a new, entirely separate query language for time-series analytics; you’re just extending your existing SQL knowledge.
One of the key benefits of using hypertables is that TimescaleDB automatically partitions the data into chunks by the time column and, by default, creates an index on that column. As a result, queries with time-range predicates such as WHERE time >= NOW() - INTERVAL '24 hours' can skip chunks whose time range lies entirely outside the predicate instead of scanning the whole table. TimescaleDB calls this optimization chunk exclusion (it is sometimes called chunk pruning). It is a fundamental reason why time-range queries on hypertables tend to be much faster than the same queries on a single large, unpartitioned table.
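As a rough mental model of chunk exclusion, picture each chunk as a half-open time range [start, end). For a predicate time >= cutoff, any chunk whose end is at or before the cutoff can be skipped. This is a simplification, not TimescaleDB's planner code. The sketch assumes 7-day chunks (TimescaleDB's default chunk_time_interval), and the chunk boundaries, make_chunks and chunks_to_scan are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

CHUNK = timedelta(days=7)  # assumed chunk_time_interval (TimescaleDB's default)

def make_chunks(first_start, count, width=CHUNK):
    """Hypothetical chunk layout: consecutive half-open ranges [start, end)."""
    return [(first_start + i * width, first_start + (i + 1) * width) for i in range(count)]

def chunks_to_scan(chunks, cutoff):
    """Keep only chunks that can contain rows with time >= cutoff."""
    # A chunk [start, end) holds only times < end, so it is excluded when end <= cutoff.
    return [(start, end) for start, end in chunks if end > cutoff]

chunks = make_chunks(datetime(2024, 4, 1, tzinfo=timezone.utc), 5)
now = datetime(2024, 5, 1, 12, tzinfo=timezone.utc)
for start, end in chunks_to_scan(chunks, now - timedelta(hours=24)):
    print(start.date(), "to", end.date())
# Only the last chunk (2024-04-29 to 2024-05-06) survives.
```

With five weekly chunks, a 24-hour lookback touches one chunk instead of five. The saving grows with the amount of history kept in the hypertable.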
The next step in advanced analytics often involves combining different time-series functions to detect patterns or anomalies. Examples include using time_bucket_gapfill to fill in missing intervals, or using first and last within time_bucket to find the start and end values of each interval.
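TimescaleDB's first(value, time) and last(value, time) aggregates return the value at the earliest and latest time in each group. The Python sketch below emulates pairing them with an hourly time_bucket to get each device's opening and closing temperature per hour. The open_close and at helpers and the sample data are hypothetical. The bucket origin is the assumed 2000-01-03 UTC default, and ties in time are not handled specially.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

ORIGIN = datetime(2000, 1, 3, tzinfo=timezone.utc)  # assumed time_bucket origin

def time_bucket(width, ts, origin=ORIGIN):
    """Start of the fixed-width bucket containing ts."""
    return origin + ((ts - origin) // width) * width

def open_close(rows, width=timedelta(hours=1)):
    """Emulate: SELECT time_bucket(width, time), device_id,
    first(temperature, time), last(temperature, time) ... GROUP BY 1, 2."""
    groups = defaultdict(list)
    for time, device_id, temperature in rows:
        groups[(time_bucket(width, time), device_id)].append((time, temperature))
    return {
        key: (min(vals, key=lambda v: v[0])[1],   # first: value at the earliest time
              max(vals, key=lambda v: v[0])[1])   # last: value at the latest time
        for key, vals in groups.items()
    }

def at(hour, minute):
    return datetime(2024, 5, 1, hour, minute, tzinfo=timezone.utc)

rows = [(at(10, 40), "a", 22.0), (at(10, 5), "a", 20.0), (at(10, 20), "a", 25.0), (at(11, 1), "a", 19.0)]
for (hour_start, device_id), (first, last) in sorted(open_close(rows).items()):
    print(hour_start.strftime("%H:%M"), device_id, first, last)
# 10:00 a 20.0 22.0
# 11:00 a 19.0 19.0
```

Unlike MIN and MAX on the temperature column, first and last pick values by their timestamps. In the 10:00 bucket, that yields 20.0 and 22.0 even though the highest reading was 25.0.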