The SQLite Online Backup API lets you copy a live database without blocking its readers and writers for the duration of the copy, which is pretty wild when you think about it.

Imagine you have a web server that’s constantly reading and writing to an SQLite database. Normally, if you wanted to back up that database, you’d have to stop all writes, maybe even all reads, to ensure data integrity. This means downtime, which is a no-go for most applications. The Online Backup API, however, allows you to create a consistent snapshot of the database while it’s still actively being modified.

Let’s see it in action. Here’s a Python script that demonstrates backing up a live database. We’ll simulate some writes happening concurrently.

import os
import sqlite3
import threading
import time

db_file = "mydatabase.db"
backup_file = "mydatabase.db.bak"

# Clean up previous runs
for path in (db_file, backup_file):
    if os.path.exists(path):
        os.remove(path)

conn = sqlite3.connect(db_file)
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, message TEXT)")
conn.commit()
print("Database created and table initialized.")

# Writer thread: keeps committing inserts while the backup runs
def writer_task(db_path):
    conn_w = sqlite3.connect(db_path)
    for i in range(50):
        try:
            conn_w.execute("INSERT INTO logs (message) VALUES (?)", (f"Log entry {i}",))
            conn_w.commit()
            time.sleep(0.01)
        except sqlite3.Error as e:
            print(f"Writer error: {e}")
            conn_w.rollback()
    conn_w.close()

writer_thread = threading.Thread(target=writer_task, args=(db_file,))
writer_thread.start()

# --- The Backup Process ---
# Since Python 3.7, the sqlite3 module exposes the Online Backup API
# directly as Connection.backup(). Under the hood it calls
# sqlite3_backup_init() once, then sqlite3_backup_step() repeatedly
# until every page has been copied.
print("Starting backup...")

def progress(status, remaining, total):
    print(f"Copied {total - remaining} of {total} pages...")

target_conn = sqlite3.connect(backup_file)
# pages=5 copies five pages per sqlite3_backup_step() call. Writes
# committed by the other thread (a different connection) cause the
# backup to restart, so some pages may be reported more than once.
conn.backup(target_conn, pages=5, progress=progress)
target_conn.close()
conn.close()

writer_thread.join()
print(f"Backup completed to {backup_file}")

# Verify the backup -- it is a complete, consistent SQLite database.
# The count reflects whatever had been committed at the moment the
# backup finished; it is a point-in-time snapshot, not a moving target.
conn_bak = sqlite3.connect(backup_file)
log_count_bak = conn_bak.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
print(f"Number of log entries in backup: {log_count_bak}")
conn_bak.close()

# Note: conn.iterdump() is sometimes used as a poor man's hot backup.
# It produces a text SQL dump rather than a binary copy, and it does
# not use the Online Backup API; Connection.backup() is the mechanism
# this article is about.

The core idea behind the SQLite Online Backup API is that it copies the database page by page. When you initiate a backup, it starts working toward a snapshot of the current state. As writes occur, they may modify pages that have already been copied. The API handles this not by blocking the writers, but by re-copying affected pages (or, for writes from another connection, restarting the copy), so the finished backup is always a consistent point-in-time snapshot of the source.

The process involves:

  1. Initialization: You create a backup object, specifying the source and destination database handles.
  2. Stepping: You repeatedly call a step function. Each call copies a certain number of pages (or a single page).
  3. Progress Monitoring: You can query the backup object for how many pages remain (sqlite3_backup_remaining) and the total page count of the source (sqlite3_backup_pagecount); the difference between the two tells you how far along you are.
  4. Completion: You continue stepping until no pages remain.

This allows you to have a consistent backup even if the database is being actively written to. Writes that land during the backup are either folded into the copy or force the copy to start over from the beginning; either way, the finished file represents a coherent state.
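In Python (3.7+), `Connection.backup()` drives this loop for you, and its progress callback is invoked once per step, which makes the step-3 counters visible. A minimal sketch, using in-memory databases for brevity:

```python
import sqlite3

# Build a small source database (a file-backed one works the same way).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (data TEXT)")
src.executemany("INSERT INTO t VALUES (?)", [("row %d" % i,) for i in range(1000)])
src.commit()

dst = sqlite3.connect(":memory:")
progress_log = []

def progress(status, remaining, total):
    # Invoked after each underlying sqlite3_backup_step() call (step 2);
    # `remaining` and `total` are the step-3 counters.
    progress_log.append((remaining, total))

# pages=1 forces one page per step so the loop is visible.
src.backup(dst, pages=1, progress=progress)

print(f"steps taken: {len(progress_log)}")
print(f"last callback: {progress_log[-1]}")  # remaining reaches 0 at completion
```

The backup is complete (step 4) when `remaining` hits zero; the destination then holds every row the source had.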

The most surprising aspect of the Online Backup API is that it achieves this consistency without requiring an EXCLUSIVE lock on the entire database for the duration of the backup. This is crucial for applications that need high availability. Instead of locking, it relies on a clever page-level copying mechanism that accounts for concurrent writes.

Internally, sqlite3_backup_init doesn’t copy anything; it simply registers the backup and associates the source and destination handles. Each subsequent sqlite3_backup_step reads a batch of pages from the live database under a shared read lock that is held only for the duration of that step, so writers are never blocked for long. If the source database is modified through the same connection that is performing the backup, the corresponding destination pages are updated automatically and the backup carries on; if it is modified by a different connection, the next step restarts the backup from the beginning. It’s not just a simple file copy; it’s an intelligent, transactional snapshot.
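The same-connection behavior is easy to observe through the stdlib sqlite3 module, whose `Connection.backup()` wraps this C API: a write made through the source connection in the middle of a backup shows up in the finished copy, with no restart. A sketch:

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (data TEXT)")
src.executemany("INSERT INTO t VALUES (?)", [("x" * 100,) for _ in range(500)])
src.commit()

dst = sqlite3.connect(":memory:")
injected = False

def progress(status, remaining, total):
    global injected
    # Mid-backup, write through the *same* connection being backed up.
    # SQLite updates the destination automatically; no restart occurs.
    if not injected and remaining > 0:
        src.execute("INSERT INTO t VALUES ('mid-backup row')")
        src.commit()
        injected = True

src.backup(dst, pages=1, progress=progress)
count = dst.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 501 -- the mid-backup insert is included
```

Had the insert come from a second connection instead, the backup would have restarted and still finished with a consistent 501-row copy; the difference is only in how much work SQLite redoes.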

One thing most people don’t know is that the backup process itself can be throttled. Using sqlite3_backup_step(backup, n_pages) where n_pages is a positive integer, you control how many pages are copied per step. This is invaluable for preventing the backup operation from consuming excessive I/O resources and impacting the performance of the live application. By copying pages in smaller chunks and yielding control back to the application between steps, you can ensure that the backup runs smoothly in the background without causing noticeable delays for users. Passing a negative n_pages copies all remaining pages in a single step.
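In Python, the `pages` argument to `Connection.backup()` becomes the N passed to sqlite3_backup_step(), and the `sleep` argument (seconds) is how long the module waits before retrying a step when the source is busy. A sketch contrasting a throttled copy with a single-shot one:

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE big (data TEXT)")
src.executemany("INSERT INTO big VALUES (?)", [("x" * 1024,) for _ in range(200)])
src.commit()

steps = []

# Throttled: two pages per step, so a large backup yields control often
# and the live application keeps most of the I/O bandwidth.
dst_throttled = sqlite3.connect(":memory:")
src.backup(dst_throttled, pages=2, progress=lambda s, rem, tot: steps.append(rem))
print(f"throttled backup took {len(steps)} steps")

# Single shot: a negative page count copies everything in one step.
dst_oneshot = sqlite3.connect(":memory:")
src.backup(dst_oneshot, pages=-1)
```

Smaller `pages` values mean more steps and a longer total backup, but a gentler footprint on the live database.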

The next concept you’ll likely encounter is how to handle schema changes or data corruption detection after performing a hot backup.
