Monitoring, Debugging, and Closing the Loop

How to monitor production systems, detect issues, and continuously improve.

Metrics to Monitor in Production

Revenue Metrics

  • RPM (Revenue Per Mille): Overall revenue per 1000 impressions
  • Revenue per query: Average revenue per user request
  • Fill rate: Percentage of requests that result in served ads
  • eCPM: Effective cost per mille (what advertisers effectively pay per 1000 impressions)
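The revenue metrics above can be sketched as simple ratios over aggregate serving counters. This is a minimal illustration; the counter names (requests, served, impressions, revenue) are assumptions, not a real schema.

```python
# Minimal sketch of revenue-metric computation from raw serving counters.
# Counter names are illustrative; a real system would aggregate these
# from serving logs over a time window.

def revenue_metrics(requests: int, served: int,
                    impressions: int, revenue: float) -> dict:
    """Compute RPM, fill rate, and revenue per query from aggregate counters."""
    return {
        # Revenue per 1000 impressions
        "rpm": 1000.0 * revenue / impressions if impressions else 0.0,
        # Share of ad requests that resulted in at least one served ad
        "fill_rate": served / requests if requests else 0.0,
        # Average revenue per user request
        "revenue_per_query": revenue / requests if requests else 0.0,
    }
```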

User Experience Metrics

  • CTR: Click-through rate (engagement indicator)
  • Ad load: Number of ads per page
  • User satisfaction: Surveys, negative feedback rates
  • Page load time: Impact of ads on page performance

Advertiser Metrics

  • ROAS: Return on ad spend for advertisers
  • Conversion rates: Clicks to conversions
  • Budget delivery: How smoothly budgets are spent
  • Campaign performance: Overall advertiser satisfaction

System Health Metrics

  • Latency: P50, P95, P99 response times
  • Error rates: Failed requests, timeouts
  • Throughput: Requests per second
  • Resource utilization: CPU, memory, network
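The latency percentiles above (P50, P95, P99) can be computed from a window of samples with the nearest-rank method, as a rough sketch. Production systems typically use streaming sketches (e.g. t-digest) rather than sorting raw samples; the sample values here are invented.

```python
# Nearest-rank percentile over a window of latency samples (milliseconds).
# A sketch only: real monitoring pipelines use streaming approximations.
import math

def percentile(samples: list[float], p: float) -> float:
    """Smallest sample such that at least p% of samples are <= it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100.0 * len(ordered))
    return ordered[max(rank - 1, 0)]

latencies = [12.0, 15.0, 11.0, 90.0, 14.0, 13.0, 250.0, 16.0, 12.0, 18.0]
p50 = percentile(latencies, 50)   # typical request
p99 = percentile(latencies, 99)   # tail latency, dominated by outliers
```

Note how a single slow outlier dominates P99 while leaving P50 untouched, which is why tail percentiles are monitored separately.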

Detecting Model Degradation and Drift

Model Degradation

Performance decline over time:

  • Accuracy: Predictions become less accurate
  • Calibration: Probabilities drift from actual rates
  • Revenue impact: System generates less revenue
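Calibration drift in particular is cheap to monitor: compare the mean predicted probability against the observed rate. The sketch below assumes logged (predicted CTR, clicked) pairs; the 10% tolerance is an illustrative threshold, not a standard.

```python
# Minimal calibration check over logged predictions and outcomes.
# A ratio near 1.0 means the model's average predicted CTR matches
# the observed click rate; drift away from 1.0 signals miscalibration.

def calibration_ratio(predicted: list[float], clicked: list[int]) -> float:
    """Mean predicted CTR divided by observed CTR (~1.0 = well calibrated)."""
    observed = sum(clicked) / len(clicked)
    mean_pred = sum(predicted) / len(predicted)
    return mean_pred / observed

def is_miscalibrated(ratio: float, tolerance: float = 0.1) -> bool:
    """Flag when predictions are more than `tolerance` away from observed rates."""
    return abs(ratio - 1.0) > tolerance
```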

Drift Detection

Data Drift

  • Feature distributions: User behavior changes
  • Ad inventory: New ads, new advertisers
  • Market conditions: Economic changes affect behavior

Concept Drift

  • CTR patterns: User clicking behavior changes
  • Conversion patterns: What drives conversions shifts
  • Quality signals: Relevance standards evolve

Detection Methods

  • Statistical tests: Compare current vs. historical distributions
  • Model performance: Track accuracy on holdout data
  • A/B testing: Compare new models to current
  • Anomaly detection: Identify unusual patterns
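One common statistical test for the "compare current vs. historical distributions" method is the Population Stability Index (PSI) over binned feature values. This is a sketch; the conventional alert threshold of 0.2 is a rule of thumb, not a universal constant.

```python
# Population Stability Index between a historical (expected) and a
# current (actual) feature distribution, given pre-binned counts.
# PSI > 0.2 is often treated as significant drift; this threshold
# is a common heuristic, not a guarantee.
import math

def psi(expected: list[int], actual: list[int], eps: float = 1e-6) -> float:
    """PSI over aligned histogram bins; 0.0 means identical distributions."""
    e_total, a_total = sum(expected), sum(actual)
    score = 0.0
    for e, a in zip(expected, actual):
        e_pct = max(e / e_total, eps)  # clamp to avoid log(0)
        a_pct = max(a / a_total, eps)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score
```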

Diagnosing Revenue Drops: Model, Market, or Bug?

Model Issues

  • Stale models: Not retrained with recent data
  • Overfitting: Model doesn't generalize
  • Feature bugs: Incorrect feature computation
  • Calibration drift: Predictions no longer calibrated

Market Changes

  • Advertiser behavior: Bids change, budgets shift
  • User behavior: Clicking patterns change
  • Competition: New platforms, market saturation
  • Seasonality: Expected patterns (holidays, events)

Bugs

  • Code bugs: Logic errors in serving pipeline
  • Data bugs: Incorrect data in features or logs
  • Infrastructure bugs: System failures, network issues
  • Configuration bugs: Wrong settings, thresholds

Diagnosis Process

  1. Check system health: Is infrastructure working?
  2. Review recent changes: What was deployed recently?
  3. Analyze metrics: Which metrics changed and when?
  4. Compare segments: Is issue global or specific?
  5. Trace examples: Follow specific requests through system
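Step 4 of the process above can be sketched as a per-segment comparison: if only some segments regressed against baseline, the issue is likely localized (a feature or targeting bug) rather than global (a market shift). The segment names and the 10% drop threshold here are illustrative.

```python
# Compare a metric (e.g. RPM) per segment against a baseline window
# to decide whether a revenue drop is global or segment-specific.

def regressed_segments(baseline: dict[str, float],
                       current: dict[str, float],
                       drop_threshold: float = 0.10) -> list[str]:
    """Return segments whose metric dropped by more than drop_threshold."""
    return [
        seg for seg, base in baseline.items()
        if base > 0 and (base - current.get(seg, 0.0)) / base > drop_threshold
    ]
```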

Tracing a Bad Ad Through the System

The Problem

An ad that shouldn't have been shown (low quality, wrong targeting, etc.) was served. Why?

Tracing Steps

  1. Retrieval: Was ad in candidate set? Why?
  2. Filtering: Did it pass all filters? Should it have?
  3. Prediction: What were model predictions? Were they correct?
  4. Ranking: What was the score? Why did it rank high?
  5. Auction: Did it win fairly? Was price correct?
  6. Serving: Was correct ad served? Any last-minute changes?

Tools Needed

  • Request IDs: Track single request through entire pipeline
  • Distributed tracing: See all service calls for a request
  • Feature logs: See exact features used in predictions
  • Decision logs: See all filtering and ranking decisions
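The decision logs above can be sketched as structured records keyed by request ID, which is what makes the six tracing steps answerable after the fact. The field names are a hypothetical schema, not a real logging format.

```python
# Hypothetical decision log keyed by request ID: every pipeline stage
# (retrieval, filtering, prediction, ranking, auction, serving) appends
# one record, so a single request can be replayed end to end.
import json

def log_decision(log: list[str], request_id: str, stage: str,
                 ad_id: str, outcome: str, **details) -> None:
    """Append one pipeline decision as a JSON line."""
    log.append(json.dumps({"request_id": request_id, "stage": stage,
                           "ad_id": ad_id, "outcome": outcome,
                           "details": details}))

def trace(log: list[str], request_id: str) -> list[dict]:
    """Reconstruct every decision made for one request, in order."""
    return [r for r in map(json.loads, log) if r["request_id"] == request_id]
```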

Case Studies: Real Production Incidents

Case Study 1: Model Calibration Drift

Symptom: Revenue dropped 5% over 2 weeks
Investigation: Found CTR predictions were overconfident
Root cause: Model not retrained, user behavior shifted
Fix: Retrained model with recent data, improved calibration
Prevention: Automated retraining pipeline, calibration monitoring

Case Study 2: Feature Bug

Symptom: Certain user segments had unusually low CTR
Investigation: Traced to user feature computation
Root cause: Bug in feature engineering pipeline
Fix: Corrected feature computation, backfilled historical data
Prevention: Feature validation tests, monitoring feature distributions

Case Study 3: Auction Mechanism Issue

Symptom: Fill rate dropped, many auctions had no winners
Investigation: Found reserve prices too high
Root cause: Recent change to reserve price algorithm
Fix: Rolled back change, fixed algorithm
Prevention: Gradual rollouts, A/B testing for revenue changes

These case studies illustrate the importance of comprehensive monitoring and debugging capabilities.
