The most surprising thing about SQLite indexes is that they don’t just speed up lookups; they actively prevent certain types of expensive operations from even being considered by the query planner.

Let’s say you have a table users with id (INTEGER PRIMARY KEY), username (TEXT), and email (TEXT).

CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    username TEXT,
    email TEXT
);

INSERT INTO users (username, email) VALUES
('alice', 'alice@example.com'),
('bob', 'bob@example.com'),
('charlie', 'charlie@example.com'),
('alice', 'alice.new@example.com');

If you run a simple query like:

SELECT * FROM users WHERE username = 'alice';

Without an index on username, SQLite has to do a full table scan. It reads every single row, checks if the username column matches 'alice', and if so, returns the row. For a small table, this is fine. For millions of rows, it’s a performance killer.

Now, let’s add an index:

CREATE INDEX idx_users_username ON users (username);

When you run the same query again:

SELECT * FROM users WHERE username = 'alice';

SQLite’s query planner sees idx_users_username. It knows that this index is sorted by username. Instead of scanning the whole table, it can perform a binary search on the index to find the entries where username is 'alice'. This is orders of magnitude faster. It might find the index entry for 'alice' in milliseconds, then use the rowid stored in the index to directly fetch the corresponding rows from the users table.

The key here is that the index contains the data needed for the WHERE clause. This is called a covering index. If your SELECT list only contains columns that are also present in the index definition, SQLite can satisfy the entire query just by reading the index, without ever touching the main table data.

Consider this query:

SELECT username, email FROM users WHERE username = 'bob';

If you have the idx_users_username index, SQLite can find 'bob' in the index. However, the index only stores the username column (and the rowid). To get the email column, it still has to go to the main table.

But if you define a covering index like this:

CREATE INDEX idx_users_username_email ON users (username, email);

Now, when you run:

SELECT username, email FROM users WHERE username = 'bob';

SQLite can find 'bob' in idx_users_username_email. Crucially, the email column is also stored within this index. SQLite can read the username and email directly from the index entry for 'bob' without ever needing to access the users table data at all. This is the ultimate optimization.

The query planner’s decision-making is surprisingly nuanced. It doesn’t just look for an index; it looks for the best index for the specific query. This involves considering:

  1. Equality matches: WHERE column = value is the most efficient for indexes.
  2. Range queries: WHERE column > value or WHERE column BETWEEN x AND y can also use indexes, but might not be as fast as equality.
  3. Order of columns: For multi-column indexes, the order matters. An index on (a, b) can efficiently serve queries filtered by a or by a AND b, but not typically by b alone.
  4. Covering capability: Can the index provide all the data requested in the SELECT list?

Let’s look at a more complex scenario. Suppose you have a table orders with customer_id, order_date, and amount.

CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    order_date DATE,
    amount REAL
);

-- Populate with some data...

If you frequently run queries like:

SELECT order_id, order_date FROM orders WHERE customer_id = 123 AND order_date BETWEEN '2023-01-01' AND '2023-12-31';

A single index on customer_id would help, but it would still require scanning all orders for customer_id = 123 and then filtering by order_date.

The optimal index for this would be a composite index that matches the WHERE clause as closely as possible, and ideally covers the SELECT list:

CREATE INDEX idx_orders_cust_date_id ON orders (customer_id, order_date, order_id);

With this index, SQLite can:

  1. Quickly find all entries for customer_id = 123.
  2. Within those entries, efficiently find the ones where order_date falls within the specified range because order_date is the second column in the index.
  3. Since order_id is the third column in the index, SQLite can retrieve both order_date and order_id directly from the index itself, without touching the orders table. This is a fully covering index for this specific query.

The most common mistake is to create single-column indexes for queries that involve multiple columns in their WHERE clauses. While an index on customer_id will be used, it won’t be as effective as an index on (customer_id, order_date). SQLite’s query planner is smart, but it can only work with the tools you give it. If you ask it to filter by two columns but only provide an index on one, it has to do extra work.

The "magic" of covering indexes is that they reduce the total number of I/O operations dramatically. Instead of reading index pages to find rowids, then reading table pages for each rowid, it reads only index pages. This is especially impactful on large datasets or on systems with slow disk I/O.

One thing people often overlook is that SQLite’s indexer can also be used to satisfy ORDER BY clauses. If you have CREATE INDEX idx_users_username ON users (username); and run SELECT username FROM users ORDER BY username;, SQLite can just read the index in its natural order, avoiding a separate sorting step.

The next logical step after mastering covering indexes is understanding how to use EXPLAIN QUERY PLAN to analyze your query performance and identify where indexes are being used (or not used).

Want structured learning?

Take the full Sqlite course →