TimescaleDB’s time_bucket function can group data not just by standard intervals like hours or days, but by any arbitrary duration you define.

Here’s time_bucket in action, grouping data by 3-hour intervals:

SELECT
    time_bucket('3 hours', ts) AS bucket,
    COUNT(*)
FROM
    metrics
WHERE
    ts >= '2023-10-26 00:00:00' AND ts < '2023-10-27 00:00:00'
GROUP BY
    bucket
ORDER BY
    bucket;

This query takes a table named metrics with a timestamp column ts and counts the number of records that fall into each 3-hour window within a specific day. The time_bucket function takes two arguments: the desired interval and the timestamp column. It returns the start of the interval that each timestamp falls into.

The core problem time_bucket solves is efficient time-series data aggregation. Without it, you’d be manually calculating interval boundaries with date_trunc and arithmetic, which is cumbersome and error-prone, especially with non-standard intervals. TimescaleDB’s implementation is designed to work with its chunked hypertable storage and indexes, keeping these aggregations fast even on large datasets.

The time_bucket function is flexible: you can use any fixed-width PostgreSQL interval literal. This means you’re not limited to simple durations; you can specify intervals like '1 day 2 hours 30 minutes' or '1 week'. The function handles the date and time arithmetic internally, so you don’t have to compute bucket boundaries yourself.
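For example, reusing the metrics table from above, a 90-minute bucket is as easy to write as an hourly one (the interval here is just illustrative):

SELECT
    time_bucket('1 hour 30 minutes', ts) AS bucket,
    COUNT(*)
FROM
    metrics
GROUP BY
    bucket
ORDER BY
    bucket;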

Internally, time_bucket effectively performs a floor operation on your timestamps, aligning them to the start of the specified interval. For example, time_bucket('3 hours', '2023-10-26 07:30:00') would return '2023-10-26 06:00:00'. This consistent alignment is what allows GROUP BY bucket to correctly group all timestamps falling within the same 3-hour window.
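You can verify this alignment with a simple scalar query, no table required:

SELECT time_bucket('3 hours', TIMESTAMP '2023-10-26 07:30:00');
-- 2023-10-26 06:00:00

Because 07:30 falls inside the 06:00–09:00 window, the function returns the window’s start.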

The time_bucket function is often used in conjunction with TimescaleDB’s first and last aggregates to get the first or last value within each bucket, or with standard aggregates like avg and sum. For example, to find the average CPU usage every 6 hours:

SELECT
    time_bucket('6 hours', time) AS bucket,
    AVG(cpu_usage) AS avg_cpu_usage
FROM
    cpu_metrics
WHERE
    time >= '2023-10-20 00:00:00' AND time < '2023-10-26 00:00:00'
GROUP BY
    bucket
ORDER BY
    bucket;
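The first and last aggregates mentioned above take a value column and a time column, returning the value at the earliest or latest timestamp in each group. A sketch against the same cpu_metrics table:

SELECT
    time_bucket('6 hours', time) AS bucket,
    first(cpu_usage, time) AS first_cpu,
    last(cpu_usage, time) AS last_cpu
FROM
    cpu_metrics
GROUP BY
    bucket
ORDER BY
    bucket;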

A common source of confusion is timezone handling. Unlike date_trunc, which follows your PostgreSQL session’s timezone setting, time_bucket aligns timestamptz values to UTC regardless of the session setting, so time_bucket('1 day', ...) buckets start at midnight UTC, not local midnight. If you need buckets aligned to a local day, you can either pass a timezone as a third argument (time_bucket('1 day', ts, 'America/New_York'), available in TimescaleDB 2.8+) or convert the column with AT TIME ZONE before bucketing.
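A sketch of the three-argument form (TimescaleDB 2.8+), assuming the time column in cpu_metrics is a timestamptz, to get buckets that start at midnight Eastern Time:

SELECT
    time_bucket('1 day', time, 'America/New_York') AS bucket,
    COUNT(*)
FROM
    cpu_metrics
GROUP BY
    bucket
ORDER BY
    bucket;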

The real power comes when you combine custom intervals with other time-series specific features, like compression or continuous aggregates, to build highly performant analytical systems.

Understanding how time_bucket handles interval boundaries is crucial for accurate analysis. Fixed-width buckets such as '1 day' are exactly 24 hours long and do not stretch or shrink across DST transitions unless you use the timezone-aware form. Leap seconds are not a concern: PostgreSQL timestamps cannot represent them, so they never affect bucketing.

The next logical step is to explore how to create continuous aggregates that automatically maintain these time-bucketed summaries.
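As a preview, a continuous aggregate wrapping the 6-hour CPU query above might look like this (a sketch; cpu_metrics must be a hypertable, and the view name is hypothetical):

CREATE MATERIALIZED VIEW cpu_usage_6h
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('6 hours', time) AS bucket,
    AVG(cpu_usage) AS avg_cpu_usage
FROM
    cpu_metrics
GROUP BY
    bucket;

TimescaleDB then keeps this summary up to date incrementally as new rows arrive, instead of recomputing the aggregation on every query.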
