FAISS stores vectors inside index structures optimized for fast lookup: flat indexes keep the raw vectors, while quantizing indexes store compressed representations of them.
Let’s see FAISS in action. Imagine you have a massive collection of images, and for each image, you’ve extracted a 128-dimensional vector representing its visual features. You want to be able to quickly find images that are visually similar to a given query image. This is where FAISS shines.
import faiss
import numpy as np
# Generate some random 128-dimensional vectors
d = 128 # dimension
nb = 100000 # database size
np.random.seed(1234) # make reproducible
xb = np.random.random((nb, d)).astype('float32')
xb[:, 0] += np.arange(nb) / 1000. # add some structure
# Build a simple index
index = faiss.IndexFlatL2(d) # build the index
print(index.is_trained) # True, IndexFlatL2 doesn't need training
# Add the vectors to the index
index.add(xb)
print(index.ntotal) # 100000
# Generate a query vector
xq = np.random.random((1, d)).astype('float32')
# Search for the nearest neighbors
k = 4 # we want to search 4 nearest neighbors
D, I = index.search(xq, k) # distances and indices of the k nearest neighbors
print(I)
print(D)
This code sets up a basic L2 distance index (IndexFlatL2), adds 100,000 vectors to it, and then searches for the 4 nearest neighbors to a random query vector. The output I will show the indices of the nearest vectors in xb, and D will show their corresponding L2 distances.
The core problem FAISS solves is efficient similarity search in high-dimensional spaces. Traditional methods like brute-force comparison (calculating the distance between the query vector and every single vector in the database) become computationally prohibitive as the database size grows. FAISS employs various indexing strategies to drastically reduce the number of distance calculations required.
Internally, FAISS offers a rich set of index types. IndexFlatL2 and IndexFlatIP are the simplest, performing brute-force search but still benefiting from FAISS’s optimized distance computations. For larger datasets, you’d move to more complex indexes like IndexIVFFlat (Inverted File) or IndexHNSW (Hierarchical Navigable Small World).
IndexIVFFlat partitions the vector space into clusters (Voronoi cells) using a k-means step. When searching, it first identifies the closest cluster(s) to the query vector and then only searches within those clusters. The nlist parameter controls the number of clusters, and nprobe controls how many clusters are searched.
IndexHNSW builds a graph where vectors are nodes, and edges connect "close" vectors. The graph has multiple layers, allowing for efficient multi-level traversal from a high-level coarse search to a fine-grained local search. The efConstruction and efSearch parameters control the trade-off between index build time/memory usage and search speed/accuracy.
The most surprising thing about FAISS is how aggressively it can quantize and compress vector representations without sacrificing too much search accuracy, enabling massive memory savings and faster searches on standard hardware. For instance, IndexIVFPQ (Product Quantization) breaks down each vector into smaller sub-vectors, quantizes each sub-vector independently using a k-means codebook, and stores the resulting codes. This can reduce memory usage by orders of magnitude.
The key levers you control are:
- Index Type: IndexFlat, IndexIVF, IndexHNSW, IndexPQ, IndexLSH, etc. Each has different performance characteristics and memory footprints.
- Parameters for Index Type: IndexIVF: nlist (number of inverted lists/clusters), nprobe (number of lists to search). IndexPQ: m (number of subquantizers), nbits (bits per subquantizer, usually 8). IndexHNSW: M (number of neighbors per node), efConstruction, efSearch.
- Distance Metric: L2 (Euclidean) or IP (Inner Product).
- Data Type: FAISS primarily uses float32, but float16 and int8 are supported for quantized indexes.
The next concept you’ll likely grapple with is how to choose the right index type and parameters for your specific workload, balancing accuracy, speed, and memory constraints.