Candidate Retrieval
The first and most critical step: finding relevant ads from billions of candidates in milliseconds.
The Scale Problem: Billions of Ads, Milliseconds to Decide
The fundamental challenge:
- Billions of ads in the inventory
- Milliseconds to make decisions
- High recall needed to ensure good candidates aren't missed
- Low latency required for user experience
This requires sophisticated retrieval techniques that balance speed and accuracy.
Inverted Indexes and Targeting Constraints
Inverted Indexes
Traditional information retrieval technique:
- Index ads by targeting attributes (demographics, interests, keywords)
- Fast lookup for matching constraints
- Efficient for structured targeting
Targeting Constraints
Common constraints include:
- Geographic location
- Demographics (age, gender)
- Interests and behaviors
- Device type
- Time of day
Filtering early on these constraints dramatically reduces the candidate set before any scoring happens.
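The index-and-filter step above can be sketched in a few lines. This is a minimal illustration with hypothetical ad ids and attribute strings (a production index would shard posting lists and encode attributes compactly); intersecting the rarest posting list first keeps intermediate sets small:

```python
from collections import defaultdict

# Hypothetical inventory: ad ids mapped to their targeting attributes.
ADS = {
    "ad1": {"geo:US", "age:18-24", "device:mobile"},
    "ad2": {"geo:US", "age:25-34", "device:desktop"},
    "ad3": {"geo:UK", "age:18-24", "device:mobile"},
}

def build_index(ads):
    """Map each targeting attribute to the set of ad ids carrying it."""
    index = defaultdict(set)
    for ad_id, attrs in ads.items():
        for attr in attrs:
            index[attr].add(ad_id)
    return index

def retrieve(index, constraints):
    """Intersect posting lists, smallest first, to find ads matching all constraints."""
    postings = sorted((index.get(c, set()) for c in constraints), key=len)
    if not postings:
        return set()
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result

index = build_index(ADS)
matches = retrieve(index, {"geo:US", "device:mobile"})  # only ad1 satisfies both
```

Each additional constraint can only shrink the intersection, which is why early filtering is so effective.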
Embedding-Based Retrieval: Two-Tower Models
Two-Tower Architecture
- User Tower: Encodes user features into embedding
- Ad Tower: Encodes ad features into embedding
- Similarity: Cosine similarity or dot product between embeddings
This allows semantic matching beyond exact targeting constraints.
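At serving time, scoring reduces to a similarity computation between the two towers' outputs. A minimal sketch with hypothetical 3-dimensional embeddings (real systems use hundreds of dimensions and batch this on accelerators):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings produced by the user tower and the ad tower.
user_emb = [0.2, 0.9, -0.1]
ad_embs = {
    "ad1": [0.1, 0.8, 0.0],    # semantically close to the user
    "ad2": [-0.7, 0.1, 0.5],   # dissimilar
}

scores = {ad_id: cosine(user_emb, emb) for ad_id, emb in ad_embs.items()}
best = max(scores, key=scores.get)
```

Because the ad tower's outputs can be precomputed and indexed, only the user embedding is computed at request time.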
Training
- Positive pairs: User-ad pairs that resulted in clicks/conversions
- Negative sampling: Random or hard negatives
- Loss function: Typically contrastive loss or softmax cross-entropy
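The softmax cross-entropy objective above can be sketched for a single training example: one clicked (positive) ad and a handful of sampled negatives. The embeddings, temperature value, and negatives here are hypothetical; in practice negatives often come from other examples in the same batch:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax_ce_loss(user_emb, pos_emb, neg_embs, temperature=0.05):
    """Softmax cross-entropy over one positive and sampled negatives.
    The loss is small when the positive scores well above all negatives."""
    logits = [dot(user_emb, pos_emb) / temperature]
    logits += [dot(user_emb, n) / temperature for n in neg_embs]
    m = max(logits)  # subtract the max to keep the exponentials stable
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))

user = [0.2, 0.9, -0.1]
pos = [0.1, 0.8, 0.0]                          # ad the user clicked
negs = [[-0.7, 0.1, 0.5], [0.3, -0.6, 0.2]]    # sampled negatives
loss = softmax_ce_loss(user, pos, negs)
```

Hard negatives (ads that score high but were not clicked) produce larger gradients than random ones, which is why mining them tends to improve retrieval quality.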
Approximate Nearest Neighbor Search at Scale
The Challenge
Exact nearest neighbor search is too slow for billions of vectors. We need approximate methods.
Techniques
- Locality-Sensitive Hashing (LSH): Hash similar vectors to same buckets
- Product Quantization: Compress vectors for faster search
- Hierarchical Navigable Small World (HNSW): Graph-based approximate search
- IVF (Inverted File Index): Cluster-based indexing
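Of the four, random-hyperplane LSH is the simplest to sketch: each hyperplane contributes one bit (which side of the plane the vector falls on), and vectors sharing a bit pattern land in the same bucket. A toy version with hypothetical dimensions and bucket counts (libraries like FAISS implement production-grade variants of all four techniques):

```python
import random
from collections import defaultdict

random.seed(0)
DIM, NUM_PLANES = 8, 6  # 6 planes -> up to 2^6 = 64 buckets

# Random hyperplanes through the origin define the hash.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PLANES)]

def lsh_hash(vec):
    """Bucket key: one bit per hyperplane, set by the sign of the projection."""
    return tuple(
        int(sum(p * x for p, x in zip(plane, vec)) >= 0) for plane in planes
    )

# Index hypothetical ad embeddings by bucket.
ads = {f"ad{i}": [random.gauss(0, 1) for _ in range(DIM)] for i in range(1000)}
buckets = defaultdict(list)
for ad_id, emb in ads.items():
    buckets[lsh_hash(emb)].append(ad_id)

# A query only scans its own bucket instead of all 1000 ads.
query = ads["ad0"]
candidates = buckets[lsh_hash(query)]
```

The approximation is visible here: nearby vectors usually hash to the same bucket, but some true neighbors fall across a plane boundary, which is exactly the recall loss traded for speed.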
Tradeoffs
- Recall vs. Latency: More accurate methods are slower
- Index Size: Larger indices enable better recall but use more memory
- Update Frequency: How often the index is rebuilt as new ads are added
Balancing Recall and Latency
The Tradeoff
- Higher recall: More relevant candidates, better final results, but slower
- Lower latency: Faster response, but may miss good candidates
Strategies
- Multi-stage Retrieval: Coarse retrieval followed by fine-grained ranking
- Caching: Cache frequent queries and user embeddings
- Parallel Processing: Retrieve from multiple indexes in parallel
- Early Termination: Stop searching once enough candidates found
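Two of these strategies compose naturally: a coarse stage hands over candidate ids, and a fine stage scores them under an explicit budget, terminating early once the budget is spent. A minimal sketch with hypothetical embeddings and budget values:

```python
import heapq

def coarse_then_fine(candidate_ids, ad_embs, user_emb, k=2, budget=100):
    """Score at most `budget` coarse candidates exactly, keep the top-k.

    Uses a min-heap of (score, ad_id) so the weakest kept candidate is
    evicted in O(log k) when a better one arrives.
    """
    heap = []
    for i, ad_id in enumerate(candidate_ids):
        if i >= budget:
            break  # early termination: latency budget exhausted
        score = sum(a * b for a, b in zip(user_emb, ad_embs[ad_id]))
        if len(heap) < k:
            heapq.heappush(heap, (score, ad_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, ad_id))
    # Highest-scoring first for the downstream ranker.
    return [ad_id for _, ad_id in sorted(heap, reverse=True)]

# Hypothetical output of a coarse stage (e.g., an ANN or inverted-index lookup).
ad_embs = {"ad1": [1.0, 0.0], "ad2": [0.0, 1.0],
           "ad3": [0.9, 0.1], "ad4": [-1.0, 0.0]}
user_emb = [1.0, 0.0]
top = coarse_then_fine(list(ad_embs), ad_embs, user_emb, k=2)
```

The budget caps worst-case latency regardless of how many candidates the coarse stage returns, while the heap keeps memory bounded at k entries.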
The goal is to retrieve enough high-quality candidates (typically 100-1000) for downstream ranking while staying within latency budgets.