Up to a point, TimescaleDB actually gets more efficient the more chunks it can keep open during an insert.
Let’s see it in action. Imagine you have a hypertable measurements with a time column and a device_id column. We’re inserting data from many devices concurrently.
CREATE TABLE measurements (
    time      TIMESTAMPTZ NOT NULL,
    device_id INT NOT NULL,
    value     DOUBLE PRECISION
);
-- Create a hypertable
SELECT create_hypertable('measurements', 'time', chunk_time_interval => INTERVAL '1 day');
-- Add an index for faster lookups by device and time
CREATE INDEX ON measurements (device_id, time DESC);
Now, let’s simulate a heavy insert workload from many devices. (A DO block runs sequentially — it stands in for what would be concurrent connections in production — but the chunk-routing behavior is the same.)
-- Insert 30 days of data for each of 1000 devices.
-- With a 1-day chunk_time_interval, each INSERT spans ~30 chunks.
DO $$
DECLARE
    i INT;
BEGIN
    FOR i IN 1..1000 LOOP
        INSERT INTO measurements (time, device_id, value)
        SELECT ts, i, random() * 100
        FROM generate_series(
            NOW() - INTERVAL '30 days',
            NOW(),
            INTERVAL '1 hour'
        ) AS ts;
    END LOOP;
END $$;
When this insert runs, TimescaleDB must route each incoming row to the chunk it belongs to, based on the row’s time value and the partitioning scheme. If a statement spans a wide time range — or you partition on additional dimensions such as device_id — many "open" chunks become candidates for insertion within a single INSERT.
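You can inspect which chunks currently exist with show_chunks (part of TimescaleDB’s standard API; this assumes the measurements hypertable created above):

```sql
-- List all chunks for the hypertable
SELECT show_chunks('measurements');

-- Restrict to a time window to see which chunks an insert
-- over that range would land in
SELECT show_chunks('measurements', newer_than => NOW() - INTERVAL '7 days');
```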
The max_open_chunks_per_insert parameter controls how many chunks TimescaleDB will keep open and ready for insertion during a single INSERT statement. The default is 5.
Why the default is often too low for high-concurrency writes:
When many concurrent inserts arrive, each statement may need to write to several different chunks — for example, backfills that span a wide time range, or workloads with a space dimension on device_id. If the number of chunks a statement touches exceeds max_open_chunks_per_insert, TimescaleDB has to evict chunks from its per-insert cache, only to reopen them later in the same statement. This repeated opening and closing of chunks incurs overhead.
By increasing max_open_chunks_per_insert, you tell TimescaleDB to keep more chunks open simultaneously. This reduces the overhead of opening and closing chunks for each insert operation, leading to significantly better write performance, especially in scenarios with high cardinality on your partitioning column (like device_id) and concurrent inserts.
Diagnosing the need to tune:
If you’re experiencing write performance bottlenecks — particularly during backfills or other inserts that span wide time ranges — check how many chunks a typical statement touches. You can also monitor the total number of chunks your hypertable has.
SELECT count(*) FROM timescaledb_information.chunks WHERE hypertable_name = 'measurements';
If you have thousands of chunks and are inserting data from many sources concurrently, you’re likely hitting this.
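As a rough back-of-envelope (assuming only a time dimension, as in the hypertable above), the number of chunks a single statement touches is its time span divided by the chunk_time_interval. This arithmetic runs on plain PostgreSQL:

```sql
-- e.g., a backfill spanning 30 days with 1-day chunks touches ~30 chunks,
-- well above a small max_open_chunks_per_insert
SELECT ceil(EXTRACT(EPOCH FROM INTERVAL '30 days')
          / EXTRACT(EPOCH FROM INTERVAL '1 day')) AS chunks_touched;
```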
Tuning max_open_chunks_per_insert:
The optimal value depends heavily on your workload. A good starting point for high-concurrency writes is often between 50 and 200.
To change it, you can set it at the session level or globally. For testing, session level is good:
-- Set for the current session
SET timescaledb.max_open_chunks_per_insert = 100;
-- Now run your inserts
DO $$
BEGIN
    -- ... your insert logic ...
END $$;
To set it globally (requires superuser privileges and a restart of the PostgreSQL server or a pg_reload_conf() call):
Edit your postgresql.conf file:
timescaledb.max_open_chunks_per_insert = 100
Or, use ALTER SYSTEM (available since PostgreSQL 9.4):
ALTER SYSTEM SET timescaledb.max_open_chunks_per_insert = 100;
SELECT pg_reload_conf(); -- Or restart PostgreSQL
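Whichever method you use, confirm the value your session actually sees (standard PostgreSQL SHOW / current_setting, nothing TimescaleDB-specific):

```sql
SHOW timescaledb.max_open_chunks_per_insert;
-- or, usable inside queries:
SELECT current_setting('timescaledb.max_open_chunks_per_insert');
```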
Why this works mechanically:
TimescaleDB uses a cache to keep track of open chunks. When an insert needs to write to a chunk, it first checks this cache. If the chunk is there, it’s a fast hit. If not, it has to open the chunk (which involves looking up its OID, acquiring locks, etc.) and add it to the cache. By increasing max_open_chunks_per_insert, you increase the size of this cache, making it more likely that the chunks needed by your concurrent inserts are already open and readily available, reducing the latency associated with opening new chunks.
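A quick sanity check is to compare the setting against the number of chunks your recent writes span. This sketch uses the timescaledb_information.chunks view; the 30-day window is an assumption — adjust it to match your ingest pattern:

```sql
SHOW timescaledb.max_open_chunks_per_insert;

-- Chunks whose time range overlaps the last 30 days: if inserts
-- regularly touch more chunks than the setting allows, expect
-- eviction and reopen churn
SELECT count(*)
FROM timescaledb_information.chunks
WHERE hypertable_name = 'measurements'
  AND range_end > NOW() - INTERVAL '30 days';
```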
Important considerations:
- Memory: Increasing this value does consume more memory per connection, because each open chunk requires some state to be held in memory. Don’t set it excessively high without considering your server’s RAM.
- Partitioning Granularity: This setting is most impactful when your data is spread across many chunks due to high cardinality in your partitioning columns (e.g., device_id, user_id). If your data mostly lands in a few chunks, this setting won’t help much.
- Read vs. Write: This parameter is primarily for write performance. It has minimal impact on read performance.
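For reference, space partitioning on device_id (mentioned above) is created with add_dimension. A sketch — note that add_dimension must run while the hypertable is still empty, typically right after create_hypertable:

```sql
-- Hash-partition on device_id in addition to the time dimension
SELECT add_dimension('measurements', 'device_id', number_partitions => 4);
```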
The next thing you’ll likely encounter when tuning for high write throughput is timescaledb.max_cached_chunks_per_hypertable, which controls a related per-hypertable chunk-metadata cache.