The Fundamental Difference
SQL is optimized for exact matches:
-- Find user with id = 42
SELECT * FROM users WHERE id = 42;
-- Find all orders with status = 'shipped'
SELECT * FROM orders WHERE status = 'shipped';
Vector search is optimized for similarity:
-- Find the 10 products most similar to product 42
SELECT * FROM products
ORDER BY embedding <-> (SELECT embedding FROM products WHERE id = 42)
LIMIT 10;
The <-> operator computes L2 distance between vectors. The query asks: "find me things that are close in embedding space" — which means "semantically similar."
Why You Can't Do Semantic Search with SQL Indexes
Traditional database indexes (B-trees, hash indexes) enable exact and range lookups in O(log n). They don't help with nearest-neighbor queries in high-dimensional space.
The naive approach — compute distance from query to every row — is O(n * d) where d is the vector dimension. For 10 million 1536-dimensional vectors: ~15 billion floating-point operations per query. Too slow.
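The naive scan is easy to see in code. A minimal NumPy sketch with a small n and d and synthetic data — it computes every distance, then sorts:

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(10_000, 64)).astype(np.float32)  # n=10k rows, d=64
query = rng.normal(size=64).astype(np.float32)

# O(n * d): compute the L2 distance from the query to every row...
distances = np.linalg.norm(vectors - query, axis=1)

# ...then take the 10 closest. At n=10M and d=1536 this
# balloons to billions of floating-point operations per query.
top10 = np.argsort(distances)[:10]
```

This is exactly what the database does without an ANN index; the work grows linearly with the number of rows.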
The solution: Approximate Nearest Neighbor (ANN) indexes. They sacrifice a small amount of accuracy for orders-of-magnitude speedup.
How ANN Indexes Work (Intuition)
HNSW (Hierarchical Navigable Small World)
The most widely used ANN algorithm. Think of it like a skip list for similarity:
Level 2 (sparse): [A] ------------- [E] ------------- [I]
                   |                 |                 |
Level 1 (medium): [A] ---- [C] ---- [E] ---- [G] ---- [I]
                   |        |        |        |        |
Level 0 (dense):  [A][B]   [C][D]   [E][F]   [G][H]   [I][J]
To find nearest neighbor of query Q:
- Enter at top level, greedily move to closest node
- Drop to next level at current position
- Repeat until bottom layer
- Search neighborhood at bottom layer
Time complexity: O(log n) average. Recall: ~95-99% at common settings.
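The greedy step at the heart of HNSW can be sketched on a single layer. This is a toy graph, not a real HNSW implementation; a real index repeats this walk once per layer, dropping down each time so the entry point is already close before the dense bottom-layer search:

```python
import numpy as np

def greedy_search(points: np.ndarray, neighbors: dict[int, list[int]],
                  query: np.ndarray, entry: int) -> int:
    """Greedy walk on one layer: repeatedly move to whichever neighbor
    is closest to the query, until no neighbor improves the distance."""
    current = entry
    best = np.linalg.norm(points[current] - query)
    improved = True
    while improved:
        improved = False
        for n in neighbors[current]:
            d = np.linalg.norm(points[n] - query)
            if d < best:
                best, current, improved = d, n, True
    return current

# Toy layer: 1-D points on a line, each node linked to its immediate neighbors
points = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
nearest = greedy_search(points, neighbors, np.array([3.2]), entry=0)  # → 3
```

Because greedy search can get stuck in local minima, real HNSW keeps a small candidate beam (the `ef_search` parameter) rather than a single current node.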
IVF (Inverted File Index)
Used in FAISS:
- Cluster all vectors into K clusters during indexing
- At query time, find the M nearest cluster centroids
- Only search within those M clusters
Simpler and faster to build than HNSW, but it requires more tuning (K and M) and does worse on highly non-uniform distributions.
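A minimal NumPy sketch of the IVF idea, with a hand-rolled k-means standing in for FAISS's trained quantizer. The data is synthetic and the K and M values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(2_000, 32)).astype(np.float32)

# Indexing: cluster all vectors into K cells (a few plain k-means steps)
K = 16
centroids = data[rng.choice(len(data), K, replace=False)]
for _ in range(5):
    assign = np.argmin(((data[:, None] - centroids) ** 2).sum(-1), axis=1)
    centroids = np.array([data[assign == k].mean(0) if (assign == k).any()
                          else centroids[k] for k in range(K)])
assign = np.argmin(((data[:, None] - centroids) ** 2).sum(-1), axis=1)
cells = {k: np.where(assign == k)[0] for k in range(K)}

# Query: probe only the M nearest cells instead of scanning all vectors
def ivf_search(query: np.ndarray, M: int = 4) -> int:
    nearest_cells = np.argsort(((centroids - query) ** 2).sum(-1))[:M]
    candidates = np.concatenate([cells[k] for k in nearest_cells])
    dists = ((data[candidates] - query) ** 2).sum(-1)
    return int(candidates[np.argmin(dists)])

hit = ivf_search(data[123])  # searching for a stored vector finds it: 123
```

The approximation error comes from vectors whose true nearest neighbors sit in cells that were not probed; raising M recovers them at the cost of more distance computations.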
pgvector: Vector Search in PostgreSQL
If you already use Postgres, pgvector adds vector types and indexes:
-- Install extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Add vector column to existing table
ALTER TABLE documents ADD COLUMN embedding vector(1536);
-- Create HNSW index for fast approximate search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Insert documents with embeddings
INSERT INTO documents (content, embedding)
VALUES ('The cat sat on the mat', '[0.1, -0.3, 0.7, ...]');
-- Semantic search
SELECT id, content,
       1 - (embedding <=> '[0.1, -0.2, ...]'::vector) AS cosine_similarity
FROM documents
ORDER BY embedding <=> '[0.1, -0.2, ...]'::vector
LIMIT 10;
Distance operators:
- <-> — L2 (Euclidean) distance
- <=> — Cosine distance
- <#> — Negative inner product
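The three operators correspond to the following NumPy expressions — an illustrative check on a toy pair of vectors, not pgvector's implementation:

```python
import numpy as np

a = np.array([1.0, 0.0, 2.0])
b = np.array([0.0, 1.0, 2.0])

l2 = np.linalg.norm(a - b)                  # <->  L2 (Euclidean) distance
cosine_dist = 1 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))  # <=>
neg_inner = -(a @ b)                        # <#>  negative inner product
```

The inner product is negated so that `ORDER BY ... ASC` still puts the best match first, consistent with the other two operators where smaller means closer.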
pgvector with Python
import psycopg2
from pgvector.psycopg2 import register_vector
from sentence_transformers import SentenceTransformer
import numpy as np
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim output: the column must be declared vector(384) to match
# Connect
conn = psycopg2.connect("postgresql://user:pass@localhost/db")
register_vector(conn)
cursor = conn.cursor()
# Index documents
def index_documents(docs: list[dict]):
    embeddings = model.encode([d["content"] for d in docs])
    cursor.executemany(
        "INSERT INTO documents (id, content, embedding) VALUES (%s, %s, %s)",
        [(d["id"], d["content"], emb.tolist()) for d, emb in zip(docs, embeddings)]
    )
    conn.commit()

# Search
def semantic_search(query: str, top_k: int = 10) -> list[dict]:
    query_embedding = model.encode(query).tolist()
    cursor.execute("""
        SELECT id, content, 1 - (embedding <=> %s::vector) AS similarity
        FROM documents
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """, (query_embedding, query_embedding, top_k))
    return [
        {"id": row[0], "content": row[1], "similarity": float(row[2])}
        for row in cursor.fetchall()
    ]
Hybrid Search: Combining Keyword and Semantic
Pure semantic search misses exact keyword matches. Pure keyword search (BM25) misses paraphrases. Hybrid search combines both:
-- Hybrid search: BM25 + vector similarity
-- Reciprocal Rank Fusion (RRF) for combining scores
WITH semantic AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $1) AS rank
    FROM documents
    ORDER BY embedding <=> $1
    LIMIT 50
),
keyword AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY ts_rank(content_tsvector, query) DESC) AS rank
    FROM documents, to_tsquery('english', $2) query
    WHERE content_tsvector @@ query
    ORDER BY rank
    LIMIT 50
)
SELECT
    COALESCE(s.id, k.id) AS id,
    1.0 / (60 + COALESCE(s.rank, 1000)) + 1.0 / (60 + COALESCE(k.rank, 1000)) AS rrf_score
FROM semantic s
FULL OUTER JOIN keyword k ON s.id = k.id
ORDER BY rrf_score DESC
LIMIT 10;
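The same RRF formula is easy to apply in application code when the two result lists come from different systems (say, a vector store and a separate BM25 engine). A minimal sketch; the document ids are made up:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank).
    k=60 is the conventional constant; it damps the influence of top ranks."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: -kv[1])

semantic = ["d3", "d1", "d7"]   # ids ranked by vector distance
keyword = ["d1", "d9", "d3"]    # ids ranked by BM25 / ts_rank
fused = rrf_fuse([semantic, keyword])  # d1 wins: ranked high in both lists
```

Because RRF only uses ranks, it needs no score normalization between the two systems, which is why it is the default fusion method in most hybrid-search setups.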
Dedicated Vector Databases
For large-scale vector search, dedicated databases offer more features:
Pinecone
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="your-api-key")
# Create index
pc.create_index(
    name="documents",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
index = pc.Index("documents")

# Upsert vectors
vectors = [(doc["id"], embedding.tolist(), {"content": doc["content"]})
           for doc, embedding in zip(docs, embeddings)]
index.upsert(vectors=vectors)

# Query
results = index.query(
    vector=query_embedding.tolist(),
    top_k=10,
    include_metadata=True
)
Choosing Your Vector Store
| Option | When to use |
|---|---|
| pgvector | You already use PostgreSQL; small-medium scale (<10M vectors) |
| FAISS | Local/batch use; you want control over the index |
| Pinecone | Fully managed; you don't want to operate infrastructure |
| Weaviate | You need hybrid search + GraphQL API |
| Qdrant | Open source; good filtering + payload storage |
| Milvus | Very large scale (1B+ vectors); self-hosted |
For most production applications under roughly 10M vectors, pgvector is sufficient and keeps your stack simpler. Graduate to a dedicated store when you hit scaling or feature limitations.
Index Tuning
-- HNSW parameters
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (
    m = 16,               -- Max connections per layer (higher = better recall, more memory)
    ef_construction = 64  -- Search width during construction (higher = better index, slower build)
);

-- At query time
SET hnsw.ef_search = 100;  -- Higher = better recall, slower query
# FAISS IVF tuning
import faiss

dimension = 1536  # must match your embedding model's output size
nlist = 1000      # Number of clusters (√N is a good starting point)

quantizer = faiss.IndexFlatL2(dimension)
index = faiss.IndexIVFFlat(quantizer, dimension, nlist)
index.train(training_vectors)  # IVF requires a training step before adding vectors
index.add(all_vectors)
index.nprobe = 50  # Check 50 clusters per query (higher = better recall, slower)
Recall vs. latency tradeoff: most applications target 95%+ recall at <50ms latency. Test with your actual data distribution.
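Recall is measured against exact brute-force results. A sketch with a stand-in for the ANN output; in practice `ann` would come from your index:

```python
import numpy as np

def recall_at_k(exact_ids: np.ndarray, ann_ids: np.ndarray) -> float:
    """Fraction of the true top-k that the ANN index actually returned."""
    return len(set(exact_ids) & set(ann_ids)) / len(exact_ids)

rng = np.random.default_rng(0)
vectors = rng.normal(size=(5_000, 32)).astype(np.float32)
query = rng.normal(size=32).astype(np.float32)

# Ground truth: exact top-10 by brute-force scan
order = np.argsort(np.linalg.norm(vectors - query, axis=1))
exact = order[:10]

# Stand-in for ANN output: the exact result with one miss injected
ann = exact.copy()
ann[-1] = order[-1]  # replace one true hit with the farthest vector
r = recall_at_k(exact, ann)  # → 0.9
```

Run this over a few hundred sampled queries while sweeping `ef_search` or `nprobe`, and plot recall against p95 latency to pick your operating point.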
For a deeper dive into production search systems, see our guide to search in the AI era.