Introduction
Reddit serves personalized content to hundreds of millions of users through ML-powered ranking and recommendations. This case study examines Reddit's model deployment and serving architecture.
Requirements
Scale
- Millions of ranking requests per minute
- Sub-100ms latency for feed generation
- Thousands of models across use cases
Use Cases
- Feed ranking
- Community recommendations
- Content moderation
- Ad targeting
Architecture Overview
Deployment Pipeline
Model Training -> Validation -> Registry   -> Deployment -> Serving
   (offline)      (staging)    (versioned)    (canary)     (production)
Serving Infrastructure
Request -> Load Balancer -> Model Server Pool -> Response
             (routing)         (inference)
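The routing step above can be sketched as a round-robin router over a pool of model servers. This is an illustrative stand-in, not Reddit's actual code; the class and server names are made up.

```python
import itertools
from dataclasses import dataclass

@dataclass
class ModelServer:
    name: str

    def infer(self, features: dict) -> dict:
        # Placeholder: a real server would run model inference here.
        return {"server": self.name, "score": sum(features.values())}

class LoadBalancer:
    """Round-robin router over a pool of model servers (illustrative)."""
    def __init__(self, pool):
        self._cycle = itertools.cycle(pool)

    def route(self, features: dict) -> dict:
        return next(self._cycle).infer(features)

lb = LoadBalancer([ModelServer("srv-a"), ModelServer("srv-b")])
lb.route({"quality": 1, "affinity": 2})  # -> {'server': 'srv-a', 'score': 3}
```

In production the routing layer would also handle health checks and weighted traffic splits; round-robin is the simplest policy that shows the shape of the component.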
Technical Details
Model Registry
Features:
- Versioning: Track all model versions
- Metadata: Metrics, lineage, ownership
- Promotion: Stage-to-prod workflows
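A minimal in-memory sketch of the three registry features above (versioning, metadata, stage-to-prod promotion). The class and field names are assumptions for illustration, not Reddit's registry API.

```python
class ModelRegistry:
    """Illustrative registry: versioned entries with metadata and promotion."""

    def __init__(self):
        self._models = {}  # name -> {version -> entry}

    def register(self, name: str, version: str, metrics: dict, owner: str) -> dict:
        # New versions always enter the registry in the staging stage.
        entry = {"metrics": metrics, "owner": owner, "stage": "staging"}
        self._models.setdefault(name, {})[version] = entry
        return entry

    def promote(self, name: str, version: str) -> dict:
        # Stage-to-prod workflow: only staged models can be promoted.
        entry = self._models[name][version]
        if entry["stage"] != "staging":
            raise ValueError("only staged models can be promoted")
        entry["stage"] = "production"
        return entry

registry = ModelRegistry()
registry.register("feed_ranker", "v1", metrics={"auc": 0.81}, owner="ml-feeds")
registry.promote("feed_ranker", "v1")
```

A real registry would persist entries and record lineage (training data, code commit), but the promote gate is the part that enforces the workflow.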
Serving Stack
Reddit uses:
- Custom model servers for latency-critical paths
- TensorFlow Serving for standard models
- Feature store integration for real-time features
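For the standard-model path, TensorFlow Serving exposes a REST predict endpoint of the form `POST /v1/models/<name>:predict` with an `{"instances": [...]}` body. The host, port, model name, and feature fields below are placeholders; this only builds the request rather than sending it.

```python
import json

def build_predict_request(host: str, model_name: str, instances: list):
    """Build a TensorFlow Serving REST predict request (URL + JSON body)."""
    url = f"http://{host}:8501/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances})
    return url, body

url, body = build_predict_request(
    "tf-serving.internal",            # placeholder host
    "feed_ranker",                    # placeholder model name
    [{"post_age_h": 2.5, "upvotes": 120}],
)
```

Latency-critical paths bypass this HTTP hop, which is one reason custom servers coexist with TensorFlow Serving in the stack.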
Canary Deployment
deployment:
  stages:
    - name: canary
      traffic: 1%
      duration: 1h
      metrics:
        - latency_p99 < 50ms
        - error_rate < 0.1%
    - name: gradual
      traffic: 10%, 50%, 100%
      duration: 4h each
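The canary gate in the config can be expressed as a simple threshold check: ramp traffic only if the observed metrics satisfy both limits. Thresholds mirror the config above; the function name and metric keys are illustrative.

```python
# Thresholds from the canary stage: p99 latency < 50ms, error rate < 0.1%.
CANARY_GATES = {"latency_p99_ms": 50.0, "error_rate": 0.001}

def canary_passes(observed: dict) -> bool:
    """Return True if the canary may proceed to the gradual rollout stage."""
    return (observed["latency_p99_ms"] < CANARY_GATES["latency_p99_ms"]
            and observed["error_rate"] < CANARY_GATES["error_rate"])

canary_passes({"latency_p99_ms": 42.0, "error_rate": 0.0004})  # True -> ramp to 10%
```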
Ranking System
Feed Ranking
Factors:
- Post quality signals
- User engagement history
- Community affinity
- Time decay
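The four factors above can be combined as a weighted base score multiplied by an exponential time-decay term. The weights and half-life here are made-up illustrative values, not Reddit's actual parameters.

```python
import math

def score(quality: float, engagement: float, affinity: float,
          age_hours: float, half_life_h: float = 8.0) -> float:
    """Illustrative feed-ranking score: weighted signals x time decay."""
    decay = 0.5 ** (age_hours / half_life_h)        # halves every half_life_h
    base = 0.5 * quality + 0.3 * engagement + 0.2 * affinity
    return base * decay
```

Multiplicative decay keeps the relative ordering of posts with equal age while pushing stale content down the feed regardless of how strong its other signals are.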
Model Updates
- Daily retraining for trending content
- Weekly full retraining for stable features
- Continuous monitoring for drift
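One common way to implement the drift monitoring mentioned above is the Population Stability Index (PSI) over bucketed feature distributions; a frequently used heuristic flags drift when PSI exceeds 0.2. This is a generic sketch, not Reddit's stated method.

```python
import math

def psi(expected: list, actual: list) -> float:
    """Population Stability Index between two per-bucket distributions.

    expected/actual are per-bucket fractions that each sum to 1;
    buckets where either side is zero are skipped.
    """
    return sum((a - e) * math.log(a / e)
               for e, a in zip(expected, actual) if e > 0 and a > 0)

psi([0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25])  # 0.0, no drift
```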
Challenges and Solutions
Cold Start
- Community-based defaults
- Popularity fallback
- Quick personalization from early signals
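The three cold-start mitigations above form a natural fallback chain: serve a personalized feed when enough signals exist, else the user's community default, else globally popular content. All names and data shapes here are assumptions for illustration.

```python
def feed_for(user_id: str, community: str,
             personalized: dict, community_defaults: dict,
             popular_posts: list) -> list:
    """Return the first available ranking source for a user (sketch)."""
    if user_id in personalized:
        return personalized[user_id]          # enough early signals collected
    if community in community_defaults:
        return community_defaults[community]  # community-based default
    return popular_posts                      # popularity fallback

feed_for("new_user", "r/python", {}, {"r/python": ["p1", "p2"]}, ["pop1"])
```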
Latency
- Feature pre-computation
- Model optimization (quantization, pruning)
- Caching for common queries
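The caching mitigation can be sketched with an in-process memoizer keyed on (post, user-segment) pairs. A production system would use a shared cache with TTLs; `lru_cache` and the stand-in scoring function below are just illustrative.

```python
from functools import lru_cache

def expensive_model_score(post_id: str, segment: str) -> float:
    # Stand-in for a model-server call; real code would do remote inference.
    return (hash((post_id, segment)) % 1000) / 1000.0

@lru_cache(maxsize=4096)
def cached_score(post_id: str, segment: str) -> float:
    """Memoized score lookup for common (post, segment) queries."""
    return expensive_model_score(post_id, segment)
```

Segment-level keys trade a little personalization for a much higher hit rate, which is what makes caching viable for ranking workloads.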
Results
- X% improvement in engagement metrics
- Y% reduction in serving latency
- Faster iteration cycles for ML teams
Learn deployment best practices in our Recommendation Systems at Scale course.