The Fundamental Difference
SQL is optimized for exact matches:
-- Find user with id = 42
SELECT * FROM users WHERE id = 42;
-- Find all orders with status = 'shipped'
SELECT * FROM orders WHERE status = 'shipped';
Vector search is optimized for similarity:
-- Find the 10 products most similar to product 42
SELECT * FROM products
ORDER BY embedding <-> (SELECT embedding FROM products WHERE id = 42)
LIMIT 10;
The <-> operator computes L2 distance between vectors. The query asks: "find me things that are close in embedding space" — which means "semantically similar."
Why You Can't Do Semantic Search with SQL Indexes
Traditional database indexes (B-trees, hash indexes) enable exact and range lookups in O(log n). They don't help with nearest-neighbor queries in high-dimensional space.
The naive approach — compute distance from query to every row — is O(n * d) where d is the vector dimension. For 10 million 1536-dimensional vectors: ~15 billion floating-point operations per query. Too slow.
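The naive scan is easy to see in code. A minimal NumPy sketch with a small n and d and synthetic data — it computes every distance, then sorts:

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(10_000, 64)).astype(np.float32)  # n=10k rows, d=64
query = rng.normal(size=64).astype(np.float32)

# O(n * d): compute the L2 distance from the query to every row...
distances = np.linalg.norm(vectors - query, axis=1)

# ...then take the 10 closest. At n=10M and d=1536 this
# balloons to billions of floating-point operations per query.
top10 = np.argsort(distances)[:10]
```

This is exactly what the database does without an ANN index; the work grows linearly with the number of rows.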
The solution: Approximate Nearest Neighbor (ANN) indexes. They sacrifice a small amount of accuracy for orders-of-magnitude speedup.
How ANN Indexes Work (Intuition)
HNSW (Hierarchical Navigable Small World)
The most widely used ANN algorithm. Think of it like a skip list for similarity:
Level 2 (sparse): [A] ------------- [E] ------------- [I]
                   |                 |                 |
Level 1 (medium): [A] ---- [C] ---- [E] ---- [G] ---- [I]
                   |        |        |        |        |
Level 0 (dense):  [A][B]   [C][D]   [E][F]   [G][H]   [I][J]
To find nearest neighbor of query Q:
- Enter at top level, greedily move to closest node
- Drop to next level at current position
- Repeat until bottom layer
- Search neighborhood at bottom layer
Time complexity: O(log n) average. Recall: ~95-99% at common settings.
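The greedy step at the heart of HNSW can be sketched on a single layer. This is a toy graph, not a real HNSW implementation; a real index repeats this walk once per layer, dropping down each time so the entry point is already close before the dense bottom-layer search:

```python
import numpy as np

def greedy_search(points: np.ndarray, neighbors: dict[int, list[int]],
                  query: np.ndarray, entry: int) -> int:
    """Greedy walk on one layer: repeatedly move to whichever neighbor
    is closest to the query, until no neighbor improves the distance."""
    current = entry
    best = np.linalg.norm(points[current] - query)
    improved = True
    while improved:
        improved = False
        for n in neighbors[current]:
            d = np.linalg.norm(points[n] - query)
            if d < best:
                best, current, improved = d, n, True
    return current

# Toy layer: 1-D points on a line, each node linked to its immediate neighbors
points = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
nearest = greedy_search(points, neighbors, np.array([3.2]), entry=0)  # → 3
```

Because greedy search can get stuck in local minima, real HNSW keeps a small candidate beam (the `ef_search` parameter) rather than a single current node.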
IVF (Inverted File Index)
Used in FAISS:
- Cluster all vectors into K clusters during indexing
- At query time, find the M nearest cluster centroids
- Only search within those M clusters
Simpler and faster to build than HNSW, but it requires more tuning (K and M) and does worse on highly non-uniform distributions.
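A minimal NumPy sketch of the IVF idea, with a hand-rolled k-means standing in for FAISS's trained quantizer. The data is synthetic and the K and M values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(2_000, 32)).astype(np.float32)

# Indexing: cluster all vectors into K cells (a few plain k-means steps)
K = 16
centroids = data[rng.choice(len(data), K, replace=False)]
for _ in range(5):
    assign = np.argmin(((data[:, None] - centroids) ** 2).sum(-1), axis=1)
    centroids = np.array([data[assign == k].mean(0) if (assign == k).any()
                          else centroids[k] for k in range(K)])
assign = np.argmin(((data[:, None] - centroids) ** 2).sum(-1), axis=1)
cells = {k: np.where(assign == k)[0] for k in range(K)}

# Query: probe only the M nearest cells instead of scanning all vectors
def ivf_search(query: np.ndarray, M: int = 4) -> int:
    nearest_cells = np.argsort(((centroids - query) ** 2).sum(-1))[:M]
    candidates = np.concatenate([cells[k] for k in nearest_cells])
    dists = ((data[candidates] - query) ** 2).sum(-1)
    return int(candidates[np.argmin(dists)])

hit = ivf_search(data[123])  # searching for a stored vector finds it: 123
```

The approximation error comes from vectors whose true nearest neighbors sit in cells that were not probed; raising M recovers them at the cost of more distance computations.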
pgvector: Vector Search in PostgreSQL
If you already use Postgres, pgvector adds vector types and indexes:
-- Install extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Add vector column to existing table
ALTER TABLE documents ADD COLUMN embedding vector(1536);
-- Create HNSW index for fast approximate search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Insert documents with embeddings
INSERT INTO documents (content, embedding)
VALUES ('The cat sat on the mat', '[0.1, -0.3, 0.7, ...]');
-- Semantic search
SELECT id, content,
       1 - (embedding <=> '[0.1, -0.2, ...]'::vector) AS cosine_similarity
FROM documents
ORDER BY embedding <=> '[0.1, -0.2, ...]'::vector
LIMIT 10;
Distance operators:
- <-> — L2 (Euclidean) distance
- <=> — Cosine distance
- <#> — Negative inner product
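The three operators correspond to the following NumPy expressions — an illustrative check on a toy pair of vectors, not pgvector's implementation:

```python
import numpy as np

a = np.array([1.0, 0.0, 2.0])
b = np.array([0.0, 1.0, 2.0])

l2 = np.linalg.norm(a - b)                  # <->  L2 (Euclidean) distance
cosine_dist = 1 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))  # <=>
neg_inner = -(a @ b)                        # <#>  negative inner product
```

The inner product is negated so that `ORDER BY ... ASC` still puts the best match first, consistent with the other two operators where smaller means closer.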
pgvector with Python
import psycopg2
from pgvector.psycopg2 import register_vector
from sentence_transformers import SentenceTransformer
import numpy as np
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim output: the column must be declared vector(384) to match
# Connect
conn = psycopg2.connect("postgresql://user:pass@localhost/db")
register_vector(conn)
cursor = conn.cursor()
# Index documents
def index_documents(docs: list[dict]):
    embeddings = model.encode([d["content"] for d in docs])
    cursor.executemany(
        "INSERT INTO documents (id, content, embedding) VALUES (%s, %s, %s)",
        [(d["id"], d["content"], emb.tolist()) for d, emb in zip(docs, embeddings)]
    )
    conn.commit()

# Search
def semantic_search(query: str, top_k: int = 10) -> list[dict]:
    query_embedding = model.encode(query).tolist()
    cursor.execute("""
        SELECT id, content, 1 - (embedding <=> %s::vector) AS similarity
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """, (query_embedding, query_embedding, top_k))
    return [
        {"id": row[0], "content": row[1], "similarity": float(row[2])}
        for row in cursor.fetchall()
    ]
Hybrid Search: Combining Keyword and Semantic
Pure semantic search misses exact keyword matches. Pure keyword search (BM25) misses paraphrases. Hybrid search combines both:
-- Hybrid search: BM25 + vector similarity
-- Reciprocal Rank Fusion (RRF) for combining scores
WITH semantic AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS rank
    FROM documents
    ORDER BY embedding <=> $1
    LIMIT 50
),
keyword AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY ts_rank(content_tsvector, query) DESC) AS rank
    FROM documents, to_tsquery('english', $2) query
    WHERE content_tsvector @@ query
    ORDER BY rank
    LIMIT 50
)
SELECT
    COALESCE(s.id, k.id) AS id,
    1.0 / (60 + COALESCE(s.rank, 1000)) + 1.0 / (60 + COALESCE(k.rank, 1000)) AS rrf_score
FROM semantic s
FULL OUTER JOIN keyword k ON s.id = k.id
ORDER BY rrf_score DESC
LIMIT 10;
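The same RRF formula is easy to apply in application code when the two result lists come from different systems (say, a vector store and a separate BM25 engine). A minimal sketch; the document ids are made up:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank).
    k=60 is the conventional constant; it damps the influence of top ranks."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: -kv[1])

semantic = ["d3", "d1", "d7"]   # ids ranked by vector distance
keyword = ["d1", "d9", "d3"]    # ids ranked by BM25 / ts_rank
fused = rrf_fuse([semantic, keyword])  # d1 wins: ranked high in both lists
```

Because RRF only uses ranks, it needs no score normalization between the two systems, which is why it is the default fusion method in most hybrid-search setups.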
Dedicated Vector Databases
For large-scale vector search, dedicated databases offer more features:
Pinecone
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="your-api-key")
# Create index
pc.create_index(
    name="documents",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
index = pc.Index("documents")

# Upsert vectors
vectors = [(doc["id"], embedding.tolist(), {"content": doc["content"]})
           for doc, embedding in zip(docs, embeddings)]
index.upsert(vectors=vectors)

# Query
results = index.query(
    vector=query_embedding.tolist(),
    top_k=10,
    include_metadata=True
)
Choosing Your Vector Store
| Option | When to use |
|---|---|
| pgvector | You already use PostgreSQL; small-medium scale (<10M vectors) |
| FAISS | Local/batch use; you want control over the index |
| Pinecone | Fully managed; you don't want to operate infrastructure |
| Weaviate | You need hybrid search + GraphQL API |
| Qdrant | Open source; good filtering + payload storage |
| Milvus | Very large scale (1B+ vectors); self-hosted |
For most production applications under roughly 10M vectors, pgvector is sufficient and keeps your stack simpler. Graduate to a dedicated store when you hit scaling or feature limitations.
Index Tuning
-- HNSW parameters
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (
    m = 16,               -- Max connections per layer (higher = better recall, more memory)
    ef_construction = 64  -- Search width during construction (higher = better index, slower build)
);

-- At query time
SET hnsw.ef_search = 100;  -- Higher = better recall, slower query
# FAISS IVF tuning
import faiss

dimension = 1536  # must match your embedding model's output size
nlist = 1000      # Number of clusters (√N is a good starting point)

quantizer = faiss.IndexFlatL2(dimension)
index = faiss.IndexIVFFlat(quantizer, dimension, nlist)
index.train(training_vectors)  # IVF requires a training step before adding vectors
index.add(all_vectors)
index.nprobe = 50  # Check 50 clusters per query (higher = better recall, slower)
Recall vs. latency tradeoff: most applications target 95%+ recall at <50ms latency. Test with your actual data distribution.
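Recall is measured against exact brute-force results. A sketch with a stand-in for the ANN output; in practice `ann` would come from your index:

```python
import numpy as np

def recall_at_k(exact_ids: np.ndarray, ann_ids: np.ndarray) -> float:
    """Fraction of the true top-k that the ANN index actually returned."""
    return len(set(exact_ids) & set(ann_ids)) / len(exact_ids)

rng = np.random.default_rng(0)
vectors = rng.normal(size=(5_000, 32)).astype(np.float32)
query = rng.normal(size=32).astype(np.float32)

# Ground truth: exact top-10 by brute-force scan
order = np.argsort(np.linalg.norm(vectors - query, axis=1))
exact = order[:10]

# Stand-in for ANN output: the exact result with one miss injected
ann = exact.copy()
ann[-1] = order[-1]  # replace one true hit with the farthest vector
r = recall_at_k(exact, ann)  # → 0.9
```

Run this over a few hundred sampled queries while sweeping `ef_search` or `nprobe`, and plot recall against p95 latency to pick your operating point.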
For a deeper dive into production search systems, see our guide to search in the AI era.