A TimescaleDB hypertable is not just a table that automatically partitions data; it’s a smart, time-series-optimized data structure that leverages both time and space partitioning for incredible performance gains.
Let’s see this in action. Imagine we’re storing sensor readings.
CREATE TABLE sensor_readings (
time TIMESTAMPTZ NOT NULL,
device_id INT NOT NULL,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
);
SELECT create_hypertable('sensor_readings', 'time', chunk_time_interval => INTERVAL '1 day');
Here, time is our primary time dimension. chunk_time_interval => INTERVAL '1 day' means TimescaleDB will automatically create new "chunks" (partitions) for every 24 hours of data. This is time partitioning. (A word of caution: if you pass a plain integer for a TIMESTAMPTZ column, it is interpreted as microseconds, so 86400000 would be roughly 86 seconds, not a day. Using an INTERVAL avoids that trap.)
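You can watch the chunks appear as data arrives. TimescaleDB 2.x exposes them through the timescaledb_information.chunks view (view and column names per the 2.x information schema; a quick sketch, not an exhaustive query):

SELECT chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_readings'
ORDER BY range_start;

Each row is one chunk, along with the slice of the time dimension it covers.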
But what if we have millions of devices? Querying all temperature readings for a specific hour could still be slow if we have to scan data from every device. This is where space partitioning comes in. We can add a secondary partitioning dimension.
ALTER TABLE sensor_readings ADD PRIMARY KEY (time, device_id);
SELECT add_dimension('sensor_readings', 'device_id', number_partitions => 16);
Now, TimescaleDB doesn’t just partition by time; it also partitions by device_id, creating a grid of chunks. A chunk is now defined by both a time range and a hash bucket of device_id values: each device_id is hashed, and the hash space is divided into 16 partitions. For example, one chunk might contain all devices whose device_id hashes into the first bucket within a specific 24-hour period; another chunk holds the second bucket’s devices for that same period, and so on. (Because this is hash partitioning, a bucket is not a contiguous range of IDs like 0-15.)
This dual partitioning is the core of hypertable performance. When you query data, TimescaleDB uses "chunk exclusion" to only scan the relevant chunks. If you ask for temperature readings for device_id 10 between 9 AM and 10 AM yesterday, TimescaleDB knows precisely which chunk(s) contain that data and ignores all others. This dramatically reduces the amount of data scanned, leading to sub-second query times even on terabytes of data.
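You can verify chunk exclusion yourself with EXPLAIN; the plan lists only the chunks whose constraints overlap the WHERE clause (the exact plan shape, such as a ChunkAppend node, varies by TimescaleDB version):

EXPLAIN
SELECT temperature
FROM sensor_readings
WHERE device_id = 10
  AND time >= NOW() - INTERVAL '25 hours'
  AND time < NOW() - INTERVAL '24 hours';

Chunks outside that hour, or belonging to other device_id hash partitions, simply never appear in the plan.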
The number_partitions in add_dimension sets how many hash partitions the device_id space is split into; every device_id is consistently hashed into one of those partitions. A higher number means finer-grained chunks along the device dimension, at the cost of more chunks to manage overall.
The real magic is how TimescaleDB handles these chunks. They are regular PostgreSQL tables under the hood. This means you can VACUUM, ANALYZE, and REINDEX them individually. You can even drop old chunks directly, which is an incredibly fast DROP TABLE operation, unlike deleting rows from a massive single table.
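Dropping old chunks is exposed through the drop_chunks function (signature per the TimescaleDB 2.x API; the 90-day cutoff is illustrative):

SELECT drop_chunks('sensor_readings', older_than => INTERVAL '90 days');

Because each chunk is its own table, this is metadata-level DROP TABLE work rather than a row-by-row DELETE, so it completes almost instantly regardless of how much data is removed.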
Consider a query for recent data: SELECT * FROM sensor_readings WHERE time > NOW() - INTERVAL '1 hour' AND device_id = 5;. TimescaleDB will identify the time range for the last hour and then, using the device_id dimension, locate the specific chunk(s) whose hash partition contains device_id 5 within that time frame. It won’t even open chunks belonging to other hash partitions (though devices that hash to the same bucket do share a chunk).
When you create a hypertable, you designate the time column as its primary dimension. For high-cardinality columns like device_id or user_id, you can additionally declare a space dimension; this pays off most when chunks can be spread across multiple disks or data nodes for parallel I/O. Without one, you still benefit from time partitioning, but queries filtering on those columns must scan every device’s data within the matching time chunks.
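Since space dimensions are best declared before any data arrives, you can fold the space dimension into the original create_hypertable call instead of a separate add_dimension step (named arguments per the TimescaleDB 2.x API):

SELECT create_hypertable('sensor_readings', 'time',
    partitioning_column => 'device_id',
    number_partitions => 16,
    chunk_time_interval => INTERVAL '1 day');

This sets up the full time-and-space grid from the very first insert.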
One important caveat: add_dimension does not reshuffle existing data. TimescaleDB requires the hypertable to be empty when you add a dimension; there is no automatic rebalancing of rows into a new partitioning scheme, and there is no supported way to drop a dimension later.
It therefore pays to plan your partitioning scheme up front, when the hypertable is created. Even so, this is far less painful than retrofitting manual partitioning onto a massive conventional table, where you would have to build the new partition set and migrate every row yourself.
The next step is understanding how to optimize these chunks further using compression and data retention policies.
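As a quick preview, both are policy-driven and operate chunk by chunk (function and option names from the TimescaleDB 2.x API; the intervals here are illustrative):

ALTER TABLE sensor_readings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'device_id'
);
SELECT add_compression_policy('sensor_readings', compress_after => INTERVAL '7 days');
SELECT add_retention_policy('sensor_readings', drop_after => INTERVAL '90 days');

Background jobs then compress chunks once they age past seven days and drop them entirely after ninety.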