Supabase Vector Search lets you treat your text embeddings like any other structured data, making similarity search as straightforward as a SQL SELECT.
Let’s see it in action. Imagine you have a collection of product descriptions, and you want to find products similar to a given one.
First, you need to generate embeddings for your product descriptions. This usually involves an external service or a locally hosted model. For this example, assume you have a table named products with a description column, plus an embedding column of type vector that you have already populated.
-- Enable the pgvector extension, then create the products table
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name TEXT NOT NULL,
  description TEXT NOT NULL,
  embedding vector(1536) -- dimensionality of OpenAI's text-embedding-ada-002
);
-- Insert some sample data.
-- 'ai_embedding_function' is a hypothetical helper that returns an embedding for a
-- piece of text; in practice you would generate embeddings in application code and
-- pass them in as parameters.
INSERT INTO products (name, description, embedding) VALUES
  ('Awesome T-Shirt', 'A comfortable and stylish t-shirt made from 100% cotton.', ai_embedding_function('A comfortable and stylish t-shirt made from 100% cotton.')),
  ('Cool Hoodie', 'Stay warm and fashionable with this fleece-lined hoodie.', ai_embedding_function('Stay warm and fashionable with this fleece-lined hoodie.')),
  ('Running Shoes', 'Lightweight and breathable shoes designed for optimal running performance.', ai_embedding_function('Lightweight and breathable shoes designed for optimal running performance.')),
  ('Cozy Blanket', 'A soft and warm blanket perfect for a relaxing evening.', ai_embedding_function('A soft and warm blanket perfect for a relaxing evening.'));
Now, to find products similar to "A comfortable and stylish t-shirt made from 100% cotton.", you’d generate an embedding for this query and then perform a similarity search. Supabase uses the pgvector extension, which provides the vector data type and similarity operators. The most common operator is <=> (cosine distance). A smaller cosine distance means higher similarity.
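To make the <=> semantics concrete, here is a small pure-Python sketch of cosine distance; the two-dimensional vectors are illustrative stand-ins for real embeddings:

```python
import math

def cosine_distance(a, b):
    """1 - cos(angle between a and b), the quantity pgvector's <=> returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 2.0], [2.0, 4.0]))   # ~0.0: same direction, most similar
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # 1.0: orthogonal, unrelated
print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # 2.0: opposite direction, least similar
```

The range is 0 (identical direction) to 2 (opposite direction), which is why the search query below sorts ascending.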
-- Generate the embedding for the query.
-- Again, 'ai_embedding_function' is a hypothetical stand-in; in a real scenario
-- you'd call your embedding model here.
WITH query_embedding AS (
  SELECT ai_embedding_function('A comfortable and stylish t-shirt made from 100% cotton.') AS embedding
)
-- Perform the similarity search
SELECT
  p.id,
  p.name,
  p.description,
  p.embedding <=> q.embedding AS distance
FROM
  products p
CROSS JOIN
  query_embedding q
ORDER BY
  distance
LIMIT 5;
This query computes the cosine distance between each product's embedding and the query embedding, orders the results from smallest distance (most similar) to largest, and returns the top 5. Note that the value is a distance, not a similarity score, which is why the sort is ascending.
The core problem Supabase Vector Search solves is enabling efficient similarity search over high-dimensional vector data directly within your PostgreSQL database. Traditional databases are optimized for exact matches or range queries on structured data, not for finding "close" items in a vast, abstract vector space. By integrating pgvector, Supabase allows you to index and query these vectors using specialized algorithms, treating them as first-class citizens alongside your relational data. This means you can combine semantic search with your existing relational queries, filtering by price, category, or any other attribute, all in a single database call.
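As a rough illustration of that combination, here is a brute-force Python sketch of what a single query with a WHERE filter plus ORDER BY embedding <=> ... LIMIT does; the rows, categories, and three-dimensional embeddings are all made up for the example:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity, matching pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy stand-ins for rows of the products table (3-D embeddings for brevity)
rows = [
    {"name": "Awesome T-Shirt", "category": "apparel",  "embedding": [0.9, 0.1, 0.0]},
    {"name": "Cool Hoodie",     "category": "apparel",  "embedding": [0.8, 0.2, 0.1]},
    {"name": "Running Shoes",   "category": "footwear", "embedding": [0.1, 0.9, 0.2]},
    {"name": "Cozy Blanket",    "category": "home",     "embedding": [0.2, 0.1, 0.9]},
]

query = [0.85, 0.15, 0.05]  # pretend this came from the embedding model

# Equivalent of: WHERE category = 'apparel' ORDER BY embedding <=> query LIMIT 2
apparel = [r for r in rows if r["category"] == "apparel"]
ranked = sorted(apparel, key=lambda r: cosine_distance(query, r["embedding"]))[:2]
print([r["name"] for r in ranked])  # ['Awesome T-Shirt', 'Cool Hoodie']
```

In the database, the relational filter and the vector ranking happen in one pass, and an index can prune most of the work.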
The vector data type in pgvector stores vectors as arrays of single-precision floating-point numbers. The operators <=> (cosine distance), <-> (Euclidean distance), and <#> (negative inner product) perform these calculations efficiently. For large datasets, pgvector supports indexing, most notably using Hierarchical Navigable Small World (HNSW) graphs. An HNSW index allows the database to quickly find approximate nearest neighbors without scanning the entire table, drastically improving query performance from O(N) to roughly O(log N). When you create an index like CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops), you're telling PostgreSQL to build a specialized index that can rapidly prune the search space for cosine-distance queries. (pgvector also offers a simpler, cluster-based IVFFlat index via USING ivfflat, with different build-time and recall trade-offs.)
Here’s a crucial detail about how similarity is often perceived versus calculated: cosine distance measures only the angle between vectors, not their magnitude. Two vectors pointing in the same direction but with different lengths have a cosine distance of 0 (maximal similarity). If the magnitude of your embeddings is meaningful (e.g., it represents confidence or importance), consider normalizing your vectors before storing them, using a different distance metric, or taking a hybrid approach that combines vector similarity with other data attributes.
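One common way to take magnitude out of the picture is to L2-normalize embeddings before storing them; after normalization every vector has unit length, so cosine distance and Euclidean distance produce the same ranking. A minimal sketch with illustrative vectors:

```python
import math

def l2_normalize(v):
    """Scale v to unit length so only its direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

a = [3.0, 4.0]  # magnitude 5.0
b = [1.5, 2.0]  # same direction, magnitude 2.5

na = l2_normalize(a)
nb = l2_normalize(b)
print(na)  # [0.6, 0.8]
print(nb)  # [0.6, 0.8] -- identical after normalization
```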
The next step is often optimizing these queries for very large datasets using advanced indexing techniques or exploring different embedding models to improve the relevance of your search results.