Design Pattern · 2024-08-10 · 14 min read

Two Tower Models in Industry: Complete Implementation Guide

A comprehensive guide to implementing two-tower models for retrieval, including training, serving, and optimization.

two-tower · embeddings · retrieval · recommendations · architecture

Introduction

Two-tower models have become the standard architecture for large-scale retrieval systems. This guide covers everything from theory to production implementation.

Architecture Overview

Basic Structure

Query Tower              Item Tower
    |                        |
Query Features         Item Features
    |                        |
[Dense Layers]         [Dense Layers]
    |                        |
Query Embedding        Item Embedding
    |                        |
    +-------- dot(q, i) -----+
                |
            Similarity

Why Two Towers?

  1. Decomposability: Compute item embeddings offline
  2. Scalability: ANN search for retrieval
  3. Flexibility: Separate tower architectures
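The decomposed structure above can be sketched as two independent MLP towers that project different feature spaces into one shared embedding space, scored by a dot product (a minimal numpy sketch; the layer sizes and random weights are purely illustrative):

```python
import numpy as np

def mlp_tower(features, weights):
    """Apply a stack of dense layers with ReLU, then L2-normalize the output."""
    h = features
    for W in weights[:-1]:
        h = np.maximum(h @ W, 0.0)           # dense layer + ReLU
    h = h @ weights[-1]                      # final projection
    return h / np.linalg.norm(h, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
# Illustrative shapes: 32-dim query features, 48-dim item features,
# both towers projecting into a shared 16-dim embedding space.
query_weights = [rng.normal(size=(32, 64)), rng.normal(size=(64, 16))]
item_weights = [rng.normal(size=(48, 64)), rng.normal(size=(64, 16))]

q = mlp_tower(rng.normal(size=(1, 32)), query_weights)
items = mlp_tower(rng.normal(size=(5, 48)), item_weights)
scores = (q @ items.T).ravel()               # one dot-product score per item
print(scores.shape)                          # (5,)
```

Because the towers only interact through the final dot product, the item side can be precomputed for the whole catalog, which is exactly what makes offline indexing possible.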

Training

Loss Functions

Contrastive Loss (InfoNCE)

import numpy as np

def contrastive_loss(query_emb, pos_item_emb, neg_item_embs, temperature=0.1):
    pos_score = np.dot(query_emb, pos_item_emb) / temperature     # scalar
    neg_scores = neg_item_embs @ query_emb / temperature          # one per negative
    # InfoNCE: negative log-softmax probability of the positive
    scores = np.concatenate(([pos_score], neg_scores))
    return np.log(np.exp(scores).sum()) - pos_score

Batch Negatives

  • Use other batch items as negatives
  • Efficient GPU utilization
  • Need large batches (1000+)
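With in-batch negatives, every other item in the batch acts as a negative for each query, so the loss reduces to cross-entropy over the batch similarity matrix with the diagonal as labels (a sketch, assuming L2-normalized embeddings):

```python
import numpy as np

def in_batch_negatives_loss(query_embs, item_embs, temperature=0.1):
    """Cross-entropy where query k's matching item is item k (the diagonal)."""
    logits = query_embs @ item_embs.T / temperature     # (B, B) similarity matrix
    # Row-wise log-softmax; each row competes against all B items in the batch
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 16))
q /= np.linalg.norm(q, axis=1, keepdims=True)
# Sanity check: if item embeddings equal query embeddings, loss should be low
loss = in_batch_negatives_loss(q, q.copy())
print(loss)
```

One forward pass over a batch of B examples yields B positives and B(B-1) negatives for free, which is why large batches matter: they directly determine how many negatives each query sees.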

Hard Negative Mining

Easy (random) negatives provide only a weak training signal; mining negatives that the model currently scores highly forces it to learn finer distinctions:

import numpy as np

def mine_hard_negatives(query_emb, item_embs, positive_ids, num_hard=10):
    # Get the query's nearest items (brute force stands in for ANN search here)
    scores = item_embs @ query_emb
    candidates = np.argsort(-scores)[:100]
    # Filter out positives; high-scoring non-positives are the hard negatives
    negatives = [c for c in candidates if c not in positive_ids]
    return negatives[:num_hard]

Tower Design

Query Tower

Features:

  • Query text (embedded)
  • User history (aggregated)
  • Context (time, device)

Architecture:

  • BERT/transformer for text
  • Pooling layers for sequences
  • MLP for final projection

Item Tower

Features:

  • Item text/title
  • Item attributes
  • Engagement statistics

Architecture:

  • Similar to query tower
  • Can be asymmetric

Serving Architecture

Offline Pipeline

All Items -> Item Tower -> Item Embeddings -> ANN Index

Online Pipeline

User Query -> Query Tower -> Query Embedding -> ANN Search -> Results
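The two pipelines can be sketched end to end: the catalog is embedded and stored once offline, and each query pays only for a single tower forward pass plus a top-k lookup (brute-force search stands in for a real ANN index in this sketch; the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Offline: embed the full catalog once and store it as the index ---
item_embs = rng.normal(size=(10_000, 64))
item_embs /= np.linalg.norm(item_embs, axis=1, keepdims=True)

# --- Online: embed the incoming query, then run top-k search ---
def retrieve(query_emb, item_embs, k=10):
    scores = item_embs @ query_emb
    # argpartition finds the top-k in O(n) without fully sorting all items
    top = np.argpartition(-scores, k)[:k]
    return top[np.argsort(-scores[top])]     # sort only the k winners

query_emb = rng.normal(size=64)
query_emb /= np.linalg.norm(query_emb)
results = retrieve(query_emb, item_embs)
print(len(results))                          # 10
```

A production system would swap the brute-force scan for one of the index types below, trading exact results for sublinear query time.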

Index Types

Index Type     Build Time    Query Time    Memory
Brute Force    O(1)          O(n)          Low
IVF            O(n)          O(sqrt(n))    Medium
HNSW           O(n log n)    O(log n)      High

Optimization Techniques

Embedding Compression

  • Quantization: FP32 -> INT8
  • Dimensionality reduction: PCA
  • Product quantization: Split and quantize
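As a sketch of the first technique, symmetric per-vector INT8 quantization stores int8 codes plus one FP32 scale per embedding, cutting index memory roughly 4x at a small cost in dot-product accuracy (illustrative only, not a production quantizer):

```python
import numpy as np

def quantize_int8(embs):
    """Per-vector symmetric quantization: int8 codes + one fp32 scale each."""
    scales = np.abs(embs).max(axis=1, keepdims=True) / 127.0
    codes = np.round(embs / scales).astype(np.int8)
    return codes, scales

def dequantize(codes, scales):
    return codes.astype(np.float32) * scales

rng = np.random.default_rng(0)
embs = rng.normal(size=(100, 64)).astype(np.float32)
codes, scales = quantize_int8(embs)
# Worst-case rounding error is half the scale per coordinate
err = np.abs(dequantize(codes, scales) - embs).max()
print(err)
```

Product quantization goes further by splitting each vector into sub-vectors and quantizing each against a learned codebook, which is why it compresses far more aggressively than scalar quantization.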

Training Improvements

  • Temperature scheduling: Start high, anneal
  • Curriculum learning: Easy to hard negatives
  • Multi-task training: Multiple objectives
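The first idea above can be sketched as an exponential anneal from a high starting temperature toward a floor over the course of training (the schedule shape and constants here are illustrative assumptions):

```python
def temperature_schedule(step, total_steps, t_start=1.0, t_end=0.05):
    """Exponentially anneal the softmax temperature from t_start to t_end."""
    frac = min(step / total_steps, 1.0)
    return t_start * (t_end / t_start) ** frac

print(temperature_schedule(0, 1000))      # 1.0
print(temperature_schedule(1000, 1000))   # 0.05
```

A high early temperature keeps gradients spread across many negatives while the embeddings are still poor; annealing it sharpens the softmax so late training focuses on the hardest confusions.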

Production Considerations

Embedding Freshness

  • Real-time vs. batch updates
  • Incremental index building
  • Versioning strategy

Monitoring

  • Embedding drift
  • ANN recall vs. exact
  • Latency distributions
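The second metric above, ANN recall versus exact search, can be tracked offline by comparing the index's top-k against brute-force top-k on a sample of queries (a minimal sketch; the toy id lists below are fabricated to illustrate the computation):

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the ANN index also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

# Toy check: an index that returns 8 of the 10 true nearest neighbors
exact = list(range(10))
approx = list(range(8)) + [100, 101]
print(recall_at_k(approx, exact))   # 0.8
```

Tracking this number over time catches both index parameter drift (e.g. too few probes) and silent staleness, since a stale index scores old embeddings against fresh queries.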

Common Pitfalls

  1. Batch size too small: In-batch negatives need thousands of examples to be effective
  2. No hard negatives: The model never learns fine-grained distinctions
  3. Dimension mismatch: Both towers must output embeddings of the same size
  4. Stale index: Serving embeddings from an outdated model or catalog

Master two-tower models in our Recommendation Systems at Scale course.

Want to Go Deeper?

This article is part of our comprehensive curriculum on building ML systems at scale. Explore our full courses for hands-on learning.