SQLite’s batch insert performance is often a surprise because it’s not just about how many rows you throw at it, but how you throw them.
Let’s see it in action. Imagine you have a table users and you want to insert 100,000 new users. A naive approach might look like this:
import sqlite3
conn = sqlite3.connect('mydatabase.db')
cursor = conn.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
for i in range(100000):
cursor.execute("INSERT INTO users (name) VALUES (?)", (f'User {i}',))
conn.commit()
conn.close()
This works, but it’s painfully slow. Why? Because each INSERT statement is its own transaction. SQLite has to acquire a lock, write the data, and then release the lock for every single row. It’s like asking someone to sign a document for each individual word they write.
The core problem this solves is the overhead associated with individual write operations. SQLite, by default, aims for atomicity and durability for every statement. When you’re inserting many rows, this per-statement overhead becomes the bottleneck.
Here’s how we can dramatically improve this: wrap all inserts in a single transaction.
import sqlite3
conn = sqlite3.connect('mydatabase.db')
cursor = conn.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
# Start a transaction explicitly
cursor.execute("BEGIN TRANSACTION")
for i in range(100000):
cursor.execute("INSERT INTO users (name) VALUES (?)", (f'User {i}',))
# Commit the entire batch at once
conn.commit()
conn.close()
This is much faster. By wrapping the inserts in BEGIN TRANSACTION and COMMIT, SQLite only acquires and releases locks once for the entire batch. The system writes all the changes to a temporary journal and then atomically swaps it with the main database file upon commit. This reduces disk I/O and contention significantly.
But we can go even further. SQLite offers PRAGMA statements that can tune its behavior for specific operations. For batch inserts, PRAGMA synchronous and PRAGMA journal_mode are key.
The default synchronous setting is FULL. This means SQLite waits for data to be flushed to disk before returning from a commit. It ensures maximum durability but adds overhead. Setting synchronous to NORMAL (or even OFF if you can tolerate a small risk of data loss in case of a power failure) can speed things up considerably.
The journal_mode also plays a role. The default is DELETE, where the journal is deleted after a commit. WAL (Write-Ahead Logging) mode is often recommended for concurrent readers and writers, but for a single-threaded batch insert, it might not offer a significant advantage over TRUNCATE (which reuses the journal file) or even MEMORY (though MEMORY is less durable). For pure batch insert speed, TRUNCATE or MEMORY can sometimes outperform DELETE when combined with transactions.
Let’s try a more optimized version:
import sqlite3
conn = sqlite3.connect('mydatabase.db')
cursor = conn.cursor()
cursor.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
# Optimize for batch inserts
cursor.execute("PRAGMA synchronous = OFF") # Or NORMAL for slightly more safety
cursor.execute("PRAGMA journal_mode = MEMORY") # Or TRUNCATE
cursor.execute("BEGIN TRANSACTION")
for i in range(100000):
cursor.execute("INSERT INTO users (name) VALUES (?)", (f'User {i}',))
conn.commit()
# Revert PRAGMAs if necessary for subsequent operations, or close connection
# For demonstration, we'll just show the effect for the batch.
# In a real app, you'd manage this carefully.
conn.close()
When PRAGMA synchronous = OFF, SQLite doesn’t wait for the operating system to confirm that data has been written to physical disk. It trusts the OS to do its job and returns control to the application sooner. This is a significant speedup for bulk operations where the risk of immediate data loss from a crash is acceptable. PRAGMA journal_mode = MEMORY means the transaction journal is kept in RAM, which is faster to write to than disk.
A common misconception is that PRAGMA synchronous = OFF is always the fastest. However, for very large transactions, the overhead of managing the journal file itself can become a factor. Sometimes, PRAGMA journal_mode = TRUNCATE paired with PRAGMA synchronous = NORMAL offers a better balance of speed and safety than OFF and MEMORY, especially if the database file is already large. The TRUNCATE mode reuses the journal file, avoiding the cost of creating a new one for each transaction, which can be beneficial when you have many small transactions or a single large one.
The most surprising thing about optimizing SQLite batch inserts is how sensitive performance is to the interaction between transaction management and PRAGMA settings, and that the "best" settings aren’t always intuitive and can depend on the workload and hardware.
The next optimization you’ll likely explore is using executemany for even more efficient statement execution within your transaction.