Candidate Retrieval
The first and most critical step: finding relevant ads from billions of candidates in milliseconds.
The Scale Problem: Billions of Ads, Milliseconds to Decide
The fundamental challenge:
- Billions of ads in the inventory
- Milliseconds to make decisions
- High recall needed to ensure good candidates aren't missed
- Low latency required for user experience
This requires sophisticated retrieval techniques that balance speed and accuracy.
Inverted Indexes and Targeting Constraints
Inverted Indexes
Traditional information retrieval technique:
- Index ads by targeting attributes (demographics, interests, keywords)
- Fast lookup for matching constraints
- Efficient for structured targeting
Targeting Constraints
Common constraints include:
- Geographic location
- Demographics (age, gender)
- Interests and behaviors
- Device type
- Time of day
Filtering early on these constraints dramatically reduces the candidate set before any scoring happens.
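The index-and-filter step above can be sketched in a few lines. This is a minimal illustration with hypothetical ad ids and attribute strings (a production index would shard posting lists and encode attributes compactly); intersecting the rarest posting list first keeps intermediate sets small:

```python
from collections import defaultdict

# Hypothetical inventory: ad ids mapped to their targeting attributes.
ADS = {
    "ad1": {"geo:US", "age:18-24", "device:mobile"},
    "ad2": {"geo:US", "age:25-34", "device:desktop"},
    "ad3": {"geo:UK", "age:18-24", "device:mobile"},
}

def build_index(ads):
    """Map each targeting attribute to the set of ad ids carrying it."""
    index = defaultdict(set)
    for ad_id, attrs in ads.items():
        for attr in attrs:
            index[attr].add(ad_id)
    return index

def retrieve(index, constraints):
    """Intersect posting lists, smallest first, to find ads matching all constraints."""
    postings = sorted((index.get(c, set()) for c in constraints), key=len)
    if not postings:
        return set()
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result

index = build_index(ADS)
matches = retrieve(index, {"geo:US", "device:mobile"})  # only ad1 satisfies both
```

Each additional constraint can only shrink the intersection, which is why early filtering is so effective.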
Embedding-Based Retrieval: Two-Tower Models
Two-Tower Architecture
- User Tower: Encodes user features into embedding
- Ad Tower: Encodes ad features into embedding
- Similarity: Cosine similarity or dot product between embeddings
This allows semantic matching beyond exact targeting constraints.
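At serving time, scoring reduces to a similarity computation between the two towers' outputs. A minimal sketch with hypothetical 3-dimensional embeddings (real systems use hundreds of dimensions and batch this on accelerators):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings produced by the user tower and the ad tower.
user_emb = [0.2, 0.9, -0.1]
ad_embs = {
    "ad1": [0.1, 0.8, 0.0],    # semantically close to the user
    "ad2": [-0.7, 0.1, 0.5],   # dissimilar
}

scores = {ad_id: cosine(user_emb, emb) for ad_id, emb in ad_embs.items()}
best = max(scores, key=scores.get)
```

Because the ad tower's outputs can be precomputed and indexed, only the user embedding is computed at request time.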
Training
- Positive pairs: User-ad pairs that resulted in clicks/conversions
- Negative sampling: Random or hard negatives
- Loss function: Typically contrastive loss or softmax cross-entropy
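The softmax cross-entropy objective above can be sketched for a single training example: one clicked (positive) ad and a handful of sampled negatives. The embeddings, temperature value, and negatives here are hypothetical; in practice negatives often come from other examples in the same batch:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax_ce_loss(user_emb, pos_emb, neg_embs, temperature=0.05):
    """Softmax cross-entropy over one positive and sampled negatives.
    The loss is small when the positive scores well above all negatives."""
    logits = [dot(user_emb, pos_emb) / temperature]
    logits += [dot(user_emb, n) / temperature for n in neg_embs]
    m = max(logits)  # subtract the max to keep the exponentials stable
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))

user = [0.2, 0.9, -0.1]
pos = [0.1, 0.8, 0.0]                          # ad the user clicked
negs = [[-0.7, 0.1, 0.5], [0.3, -0.6, 0.2]]    # sampled negatives
loss = softmax_ce_loss(user, pos, negs)
```

Hard negatives (ads that score high but were not clicked) produce larger gradients than random ones, which is why mining them tends to improve retrieval quality.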
Approximate Nearest Neighbor Search at Scale
The Challenge
Exact nearest neighbor search is too slow for billions of vectors. We need approximate methods.
Techniques
- Locality-Sensitive Hashing (LSH): Hash similar vectors to same buckets
- Product Quantization: Compress vectors for faster search
- Hierarchical Navigable Small World (HNSW): Graph-based approximate search
- IVF (Inverted File Index): Cluster-based indexing
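Of the four, random-hyperplane LSH is the simplest to sketch: each hyperplane contributes one bit (which side of the plane the vector falls on), and vectors sharing a bit pattern land in the same bucket. A toy version with hypothetical dimensions and bucket counts (libraries like FAISS implement production-grade variants of all four techniques):

```python
import random
from collections import defaultdict

random.seed(0)
DIM, NUM_PLANES = 8, 6  # 6 planes -> up to 2^6 = 64 buckets

# Random hyperplanes through the origin define the hash.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_PLANES)]

def lsh_hash(vec):
    """Bucket key: one bit per hyperplane, set by the sign of the projection."""
    return tuple(
        int(sum(p * x for p, x in zip(plane, vec)) >= 0) for plane in planes
    )

# Index hypothetical ad embeddings by bucket.
ads = {f"ad{i}": [random.gauss(0, 1) for _ in range(DIM)] for i in range(1000)}
buckets = defaultdict(list)
for ad_id, emb in ads.items():
    buckets[lsh_hash(emb)].append(ad_id)

# A query only scans its own bucket instead of all 1000 ads.
query = ads["ad0"]
candidates = buckets[lsh_hash(query)]
```

The approximation is visible here: nearby vectors usually hash to the same bucket, but some true neighbors fall across a plane boundary, which is exactly the recall loss traded for speed.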
Tradeoffs
- Recall vs. Latency: More accurate methods are slower
- Index Size: Larger indices enable better recall but use more memory
- Update Frequency: How often the index is rebuilt as new ads are added
Balancing Recall and Latency
The Tradeoff
- Higher recall: More relevant candidates, better final results, but slower
- Lower latency: Faster response, but may miss good candidates
Strategies
- Multi-stage Retrieval: Coarse retrieval followed by fine-grained ranking
- Caching: Cache frequent queries and user embeddings
- Parallel Processing: Retrieve from multiple indexes in parallel
- Early Termination: Stop searching once enough candidates found
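Two of these strategies compose naturally: a coarse stage hands over candidate ids, and a fine stage scores them under an explicit budget, terminating early once the budget is spent. A minimal sketch with hypothetical embeddings and budget values:

```python
import heapq

def coarse_then_fine(candidate_ids, ad_embs, user_emb, k=2, budget=100):
    """Score at most `budget` coarse candidates exactly, keep the top-k.

    Uses a min-heap of (score, ad_id) so the weakest kept candidate is
    evicted in O(log k) when a better one arrives.
    """
    heap = []
    for i, ad_id in enumerate(candidate_ids):
        if i >= budget:
            break  # early termination: latency budget exhausted
        score = sum(a * b for a, b in zip(user_emb, ad_embs[ad_id]))
        if len(heap) < k:
            heapq.heappush(heap, (score, ad_id))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, ad_id))
    # Highest-scoring first for the downstream ranker.
    return [ad_id for _, ad_id in sorted(heap, reverse=True)]

# Hypothetical output of a coarse stage (e.g., an ANN or inverted-index lookup).
ad_embs = {"ad1": [1.0, 0.0], "ad2": [0.0, 1.0],
           "ad3": [0.9, 0.1], "ad4": [-1.0, 0.0]}
user_emb = [1.0, 0.0]
top = coarse_then_fine(list(ad_embs), ad_embs, user_emb, k=2)
```

The budget caps worst-case latency regardless of how many candidates the coarse stage returns, while the heap keeps memory bounded at k entries.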
The goal is to retrieve enough high-quality candidates (typically 100-1000) for downstream ranking while staying within latency budgets.