Case study · 2024-09-30 · 10 min read

Reddit's ML Model Deployment and Serving Architecture

How Reddit deploys and serves machine learning models for content ranking, recommendations, and moderation.


Introduction

Reddit serves personalized content to hundreds of millions of users through ML-powered ranking and recommendations. This case study examines their model deployment and serving architecture.

Requirements

Scale

  • Millions of ranking requests per minute
  • Sub-100ms latency for feed generation
  • Thousands of models across use cases

Use Cases

  • Feed ranking
  • Community recommendations
  • Content moderation
  • Ad targeting

Architecture Overview

Deployment Pipeline

Model Training -> Validation -> Registry -> Deployment -> Serving
       |              |            |            |            |
   (offline)     (staging)    (versioned)  (canary)    (production)

Serving Infrastructure

Request -> Load Balancer -> Model Server Pool -> Response
                |                  |
           (routing)         (inference)

Technical Details

Model Registry

Features:

  • Versioning: Track all model versions
  • Metadata: Metrics, lineage, ownership
  • Promotion: Stage-to-prod workflows

Serving Stack

Reddit uses:

  • Custom model servers for latency-critical paths
  • TensorFlow Serving for standard models
  • Feature store integration for real-time features

Canary Deployment

deployment:
  stages:
    - name: canary
      traffic: 1%
      duration: 1h
      metrics:
        - latency_p99 < 50ms
        - error_rate < 0.1%
    - name: gradual
      traffic: [10%, 50%, 100%]
      duration: 4h each
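The gate logic implied by that config can be sketched as a simple threshold check. The metric keys and dict shapes below are assumptions for illustration:

```python
# Hypothetical canary gate: the rollout advances to the next stage only
# when every observed metric stays under its configured upper bound.
def canary_passes(observed: dict[str, float], limits: dict[str, float]) -> bool:
    return all(observed[name] < limit for name, limit in limits.items())

# Thresholds from the config above: latency_p99 < 50ms, error_rate < 0.1%.
LIMITS = {"latency_p99_ms": 50.0, "error_rate": 0.001}
```

For example, `canary_passes({"latency_p99_ms": 42.0, "error_rate": 0.0004}, LIMITS)` passes, while a 61ms p99 would halt the rollout at 1% traffic.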

Ranking System

Feed Ranking

Factors:

  • Post quality signals
  • User engagement history
  • Community affinity
  • Time decay
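One common way to combine factors like these is a weighted sum discounted by exponential time decay. The weights and half-life below are illustrative assumptions, not Reddit's actual formula:

```python
def rank_score(quality: float, engagement: float, affinity: float,
               age_hours: float, half_life_hours: float = 12.0) -> float:
    """Weighted blend of signals, discounted as the post ages."""
    decay = 0.5 ** (age_hours / half_life_hours)  # halves every half-life
    return (0.5 * quality + 0.3 * engagement + 0.2 * affinity) * decay
```

With a 12-hour half-life, a post's score drops by half every 12 hours, so fresh content can outrank older posts with stronger static signals.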

Model Updates

  • Daily retraining for trending content
  • Weekly full retraining for stable features
  • Continuous monitoring for drift
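The source does not specify how drift is measured; one widely used approach is the Population Stability Index (PSI) over binned feature distributions, sketched here:

```python
import math

def psi(expected: list[float], actual: list[float]) -> float:
    """Population Stability Index between two binned distributions.

    Both inputs are bin proportions that each sum to 1. A common rule of
    thumb: PSI > 0.2 suggests significant drift worth retraining for.
    """
    eps = 1e-6  # avoid log(0) for empty bins
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected, actual)
    )
```

Identical distributions yield a PSI near zero; a shift in where the probability mass sits drives it up quickly.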

Challenges and Solutions

Cold Start

  • Community-based defaults
  • Popularity fallback
  • Quick personalization from early signals
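Those three mitigations naturally form a fallback chain: early personalized signals first, then the community-based default, then global popularity. A hypothetical sketch:

```python
# Hypothetical fallback chain for a cold-start user. All three lookup
# tables are assumed inputs, not actual Reddit data structures.
def cold_start_score(post_id: str,
                     personalized: dict[str, float],
                     community_defaults: dict[str, float],
                     popularity: dict[str, float]) -> float:
    if post_id in personalized:          # early engagement signals, if any
        return personalized[post_id]
    if post_id in community_defaults:    # community-based default
        return community_defaults[post_id]
    return popularity.get(post_id, 0.0)  # popularity fallback
```

As the user generates engagement signals, entries appear in the personalized table and progressively override the defaults.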

Latency

  • Feature pre-computation
  • Model optimization (quantization, pruning)
  • Caching for common queries

Results

  • X% improvement in engagement metrics
  • Y% reduction in serving latency
  • Faster iteration cycles for ML teams

Learn deployment best practices in our Recommendation Systems at Scale course.

Want to Go Deeper?

This article is part of our comprehensive curriculum on building ML systems at scale. Explore our full courses for hands-on learning.