End-to-End Flow
Understanding the complete journey from user request to served ad is critical for building efficient systems.
The 100-Millisecond Journey: From Page Load to Served Ad
The entire ad serving process must complete in roughly 100 milliseconds:
- User Request: Page load or content request triggers ad request
- Ad Request: Platform receives request with user context
- Candidate Retrieval: Billions of ads filtered to thousands of candidates
- Filtering: Hard constraints applied (targeting, policy, eligibility)
- Prediction: ML models predict CTR, CVR, and other signals
- Ranking: Candidates scored and ranked
- Auction: Winners selected and prices determined
- Serving: Ad creative retrieved and served to user
- Logging: All signals captured for model training and optimization
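The stages above can be sketched as a chain of small functions. This is a minimal illustration, not a production design: the `Ad` fields, the country-based targeting check, the lookup table standing in for the ML model, and the second-price auction rule are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    ad_id: str
    bid: float             # advertiser bid per click
    target_country: str    # hypothetical stand-in for targeting rules
    p_click: float = 0.0   # predicted CTR, set by the prediction stage


def retrieve(inventory, limit):
    """Candidate retrieval: a stand-in that just truncates the inventory."""
    return inventory[:limit]


def apply_filters(candidates, user_country):
    """Filtering: keep only ads whose hard constraint matches the user."""
    return [ad for ad in candidates if ad.target_country == user_country]


def predict(candidates, ctr_table, default_ctr=0.01):
    """Prediction: a lookup table stands in for the ML model."""
    for ad in candidates:
        ad.p_click = ctr_table.get(ad.ad_id, default_ctr)
    return candidates


def rank(candidates):
    """Ranking: order by expected value per impression (bid * predicted CTR)."""
    return sorted(candidates, key=lambda ad: ad.bid * ad.p_click, reverse=True)


def run_auction(ranked):
    """Auction: assumed second-price rule; the winner pays the per-click
    price at which its score would equal the runner-up's score."""
    if not ranked:
        return None, 0.0
    winner = ranked[0]
    if len(ranked) == 1 or winner.p_click <= 0:
        return winner, 0.0
    runner_up = ranked[1]
    return winner, runner_up.bid * runner_up.p_click / winner.p_click


def serve_ad(inventory, user_country, ctr_table, limit=1000):
    """Run retrieval, filtering, prediction, ranking, and auction in order."""
    candidates = retrieve(inventory, limit)
    eligible = apply_filters(candidates, user_country)
    scored = predict(eligible, ctr_table)
    return run_auction(rank(scored))


inventory = [Ad("a", 2.0, "US"), Ad("b", 1.0, "US"), Ad("c", 5.0, "DE")]
winner, price = serve_ad(inventory, "US", {"a": 0.02, "b": 0.05, "c": 0.10})
print(winner.ad_id, round(price, 2))
```

In this example ad "c" is removed by the targeting filter, and ad "b" wins over ad "a" despite its lower bid because its bid multiplied by predicted CTR is higher.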
Latency Budgets and the Critical Path
Every millisecond matters. Approximate budgets for each stage:
- Network Latency: 20-40ms for request/response
- Retrieval: 10-20ms to fetch candidates
- ML Inference: 20-40ms for predictions
- Auction: 5-10ms for ranking and selection
- Serving: 5-10ms for creative retrieval
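Summed end to end, these ranges total 60-120ms, so a request that hits the upper bound of every stage overshoots the 100ms target. Optimizing the critical path is therefore essential for meeting latency targets.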
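The budget arithmetic can be checked with a short script. This sketch assumes the five stages run strictly one after another; the dictionary keys and the `TARGET_MS` name are illustrative, while the numbers are the ranges listed above.

```python
# Per-stage latency ranges in milliseconds: (best case, worst case).
BUDGET_MS = {
    "network": (20, 40),
    "retrieval": (10, 20),
    "ml_inference": (20, 40),
    "auction": (5, 10),
    "serving": (5, 10),
}
TARGET_MS = 100


def total_range(budget):
    """Sum best-case and worst-case latencies, assuming sequential stages."""
    low = sum(best for best, _ in budget.values())
    high = sum(worst for _, worst in budget.values())
    return low, high


low, high = total_range(BUDGET_MS)
headroom_ms = TARGET_MS - high  # negative means the worst case misses the target

print(f"best case: {low}ms, worst case: {high}ms, headroom: {headroom_ms}ms")
```

A negative headroom means the stages cannot all run at their upper bounds at once, which is why the largest contributors (network and ML inference here) are the usual first places to look for savings.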
The Three Planes: Real-Time Serving, Near-Real-Time Streaming, Batch Processing
Real-Time Serving Plane
The request-response path that must complete in <100ms:
- Candidate retrieval
- Real-time predictions
- Auction execution
- Ad serving
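One common way to keep a stage inside its share of the budget is to enforce a deadline and fall back to a cheap default when it is missed. The sketch below is an assumption about how that could look, not a description of any particular platform: `score_with_model` simulates inference with a sleep, and the fallback CTR value is hypothetical.

```python
import asyncio


async def score_with_model(candidates, delay_s):
    """Simulated ML inference that takes delay_s seconds."""
    await asyncio.sleep(delay_s)
    return {ad_id: 0.05 for ad_id in candidates}


async def predict_with_deadline(candidates, delay_s, deadline_s, fallback_ctr=0.01):
    """Return model scores, or a flat fallback CTR if the deadline is missed."""
    try:
        return await asyncio.wait_for(
            score_with_model(candidates, delay_s), timeout=deadline_s
        )
    except asyncio.TimeoutError:
        # Degrade gracefully instead of blowing the request's latency budget.
        return {ad_id: fallback_ctr for ad_id in candidates}


fast = asyncio.run(predict_with_deadline(["a", "b"], delay_s=0.0, deadline_s=0.05))
slow = asyncio.run(predict_with_deadline(["a", "b"], delay_s=0.2, deadline_s=0.01))
print(fast, slow)
```

The fast call returns the simulated model scores, while the slow call exceeds its deadline and returns the fallback value for every candidate.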
Near-Real-Time Streaming Plane
Processing that happens within seconds to minutes:
- Feature updates (user behavior, recent clicks)
- Budget pacing adjustments
- Frequency cap updates
- Near-real-time updates to model scores
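As an illustration of the streaming plane, a frequency cap can be kept as a sliding window of impression timestamps that is updated as events arrive and consulted by the serving path. This is a minimal in-memory sketch; the `FrequencyCap` class, its parameters, and the per-(user, ad) keying are assumptions for the example, and a real system would typically keep this state in a shared store.

```python
from collections import defaultdict, deque


class FrequencyCap:
    """Caps impressions per (user, ad) pair within a sliding time window."""

    def __init__(self, max_impressions, window_s):
        self.max_impressions = max_impressions
        self.window_s = window_s
        self._events = defaultdict(deque)  # (user_id, ad_id) -> timestamps

    def record_impression(self, user_id, ad_id, ts):
        """Called by the streaming plane when an impression event arrives."""
        self._events[(user_id, ad_id)].append(ts)

    def is_capped(self, user_id, ad_id, now):
        """Called by the serving plane while filtering candidates."""
        events = self._events.get((user_id, ad_id))
        if events is None:
            return False
        # Drop impressions that have aged out of the window.
        while events and events[0] <= now - self.window_s:
            events.popleft()
        return len(events) >= self.max_impressions


cap = FrequencyCap(max_impressions=2, window_s=3600)
cap.record_impression("user-1", "ad-9", ts=0)
cap.record_impression("user-1", "ad-9", ts=10)
capped_now = cap.is_capped("user-1", "ad-9", now=20)
capped_later = cap.is_capped("user-1", "ad-9", now=3605)
print(capped_now, capped_later)
```

The user is capped shortly after the second impression and becomes eligible again once the first impression falls outside the one-hour window.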
Batch Processing Plane
Offline processing that happens hourly or daily:
- Model training
- Feature engineering
- Historical analysis
- Reporting and optimization
Understanding which operations belong in which plane is crucial for system design.
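One rough way to make this decision is to ask how stale an operation's output may be before it hurts the result. The helper below is only a sketch of that heuristic: the `assign_plane` name and the cutoffs of one second and one hour are assumptions chosen to match the "seconds to minutes" and "hourly or daily" descriptions above, not fixed rules.

```python
def assign_plane(max_staleness_s):
    """Map tolerated staleness (in seconds) to a processing plane.

    The thresholds are illustrative assumptions, not fixed rules.
    """
    if max_staleness_s < 1:
        return "real-time serving"
    if max_staleness_s < 3600:
        return "near-real-time streaming"
    return "batch processing"


examples = {
    "auction execution": assign_plane(0),
    "budget pacing adjustments": assign_plane(60),
    "model training": assign_plane(24 * 3600),
}
print(examples)
```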
Content to be expanded...