Prediction Models — CTR, CVR, and Beyond
Understanding what we predict, why it matters, and how to build effective models.
What We're Predicting and Why
Click-Through Rate (CTR)
The probability a user will click an ad. Critical for:
- Ranking: Higher-CTR ads should rank higher, all else being equal
- Pricing: CPC auctions need predicted CTR to compute expected revenue per impression
- Quality: Persistently low CTR signals poor relevance
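For CPC pricing, the auction typically ranks by expected value per impression (bid × pCTR) rather than by raw bid. A minimal sketch (function names and numbers are illustrative, not a real auction implementation):

```python
def expected_cpm(bid_cpc, p_ctr):
    """Expected revenue per 1000 impressions for a CPC bid."""
    return bid_cpc * p_ctr * 1000

# (ad_id, cpc_bid, predicted_ctr) -- illustrative values.
ads = [("ad_a", 2.00, 0.01), ("ad_b", 1.00, 0.05)]

# Rank by expected value, not raw bid: the lower bid wins here
# because its predicted CTR is 5x higher.
ranked = sorted(ads, key=lambda a: expected_cpm(a[1], a[2]), reverse=True)
```

Note how ad_b outranks ad_a despite bidding half as much, which is exactly why CTR prediction sits at the heart of CPC pricing.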
Conversion Rate (CVR)
The probability a click will result in a conversion. Important for:
- CPA campaigns: Directly affects advertiser ROI
- Ranking: For conversion-optimized campaigns
- Budget efficiency: Better targeting improves delivery
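The click and conversion predictions compose: P(conversion | impression) = pCTR × pCVR, and an advertiser's CPA target converts directly into a maximum CPC bid. A hedged sketch with made-up numbers:

```python
def conversion_prob(p_ctr, p_cvr):
    """P(conversion | impression) = P(click) * P(conversion | click)."""
    return p_ctr * p_cvr

def target_cpc(target_cpa, p_cvr):
    """Max CPC bid that keeps expected cost per acquisition at the target:
    paying this per click costs target_cpa per expected conversion."""
    return target_cpa * p_cvr
```

For example, a $50 CPA target with a 2% predicted CVR supports at most a $1 CPC bid, so a biased pCVR estimate directly distorts advertiser ROI.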
Beyond CTR and CVR
- Engagement: Time spent, video views, interactions
- Quality: User satisfaction, ad relevance
- Long-term value: Lifetime value, retention
Feature Engineering: User, Context, Ad, and Cross Features
User Features
- Demographics (age, gender, location)
- Historical behavior (past clicks, purchases, interests)
- Device and browser information
- Time-based patterns (time of day, day of week)
Context Features
- Page content and category
- Time and date
- Geographic context
- Device type and capabilities
Ad Features
- Creative attributes (image, text, format)
- Advertiser information
- Historical performance (CTR, CVR for similar users)
- Targeting settings
Cross Features
- User-ad interactions (has user seen this ad before?)
- User-advertiser history (past interactions with this advertiser)
- User-category affinity (user's interest in ad category)
- Contextual matching (ad relevance to page content)
Feature engineering is often more important than model architecture: a well-designed cross feature frequently lifts metrics more than swapping in a larger model.
Model Architectures: From Logistic Regression to Deep Learning
Logistic Regression
Simple, interpretable baseline. Good for:
- Understanding feature importance
- Fast inference
- Sparse feature spaces
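As a concrete baseline, logistic regression trained by gradient descent fits in a few lines of NumPy. This is a toy sketch on dense features, not a production trainer (real CTR systems use sparse solvers and online updates):

```python
import numpy as np

def train_lr(X, y, lr=0.1, epochs=200):
    """Logistic regression via batch gradient descent on log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted click prob
        g = p - y                               # gradient of log loss wrt logits
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b
```

Each learned weight is directly interpretable as a log-odds contribution, which is the feature-importance benefit noted above.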
Gradient Boosting (XGBoost, LightGBM)
Strong performance on tabular data:
- Handles non-linear interactions
- Feature importance insights
- Fast training and inference
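To make the boosting idea concrete, here is a toy sketch using depth-1 regression stumps on squared loss, where each round fits the current residual; real systems would use XGBoost or LightGBM rather than this illustration:

```python
import numpy as np

def fit_stump(X, r):
    """Depth-1 regression tree fit to residuals r (best squared-error split)."""
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # thresholds leaving both sides nonempty
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            err = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if err < best_err:
                best_err, best = err, (j, t, lv, rv)
    return best

def stump_predict(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

def boost(X, y, n_rounds=20, lr=0.5):
    """Gradient boosting on squared loss: each round fits the residual."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        s = fit_stump(X, y - pred)
        pred += lr * stump_predict(s, X)
        stumps.append(s)
    return base, lr, stumps

def boost_predict(model, X):
    base, lr, stumps = model
    out = np.full(len(X), base)
    for s in stumps:
        out += lr * stump_predict(s, X)
    return out
```

The non-linear interactions mentioned above arise because deeper trees can split on one feature conditional on another, something this depth-1 toy cannot do.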
Deep Learning Models
Wide & Deep
- Wide component: Memorizes feature interactions
- Deep component: Generalizes to unseen combinations
DeepFM
Factorization machines with deep learning:
- Captures low and high-order feature interactions
- Efficient for sparse features
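The factorization-machine trick underlying DeepFM computes all pairwise interactions in O(nk) rather than O(n^2), via the identity sum_{i<j} <v_i, v_j> x_i x_j = 1/2 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]. A NumPy sketch of the second-order term:

```python
import numpy as np

def fm_second_order(x, V):
    """FM pairwise interaction term, computed in O(n*k).

    x: feature vector of shape (n,); V: factor matrix of shape (n, k),
    row v_i embedding feature i.
    """
    xv = x @ V                      # (k,): sum_i v_if * x_i
    x2v2 = (x ** 2) @ (V ** 2)      # (k,): sum_i v_if^2 * x_i^2
    return 0.5 * (xv ** 2 - x2v2).sum()
```

Because each feature only contributes through its embedding row, interactions between feature pairs never seen together in training can still be estimated, which is the sparse-feature efficiency noted above.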
Transformer-based Models
For sequential and contextual understanding:
- User behavior sequences
- Ad creative understanding
- Cross-modal features
Calibration: Why It Matters More Than Accuracy
The Problem
Models can have good ranking (AUC) but poor calibration. For auctions, we need accurate probability estimates, not just relative rankings.
Why Calibration Matters
- Auction pricing: Incorrect probabilities lead to wrong prices
- Budget planning: Advertisers need accurate conversion estimates
- Revenue optimization: Platform needs accurate expected value
Calibration Techniques
- Platt scaling: Logistic regression on model outputs
- Isotonic regression: Non-parametric calibration
- Temperature scaling: Single parameter adjustment
- Calibrated training: Incorporate calibration into loss function
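Platt scaling, the first technique above, fits p = sigmoid(a*s + b) on held-out (score, label) pairs. The gradient-descent loop below is an illustrative sketch, not a production calibrator:

```python
import numpy as np

def platt_scale(scores, labels, lr=0.05, epochs=2000):
    """Fit a, b so that sigmoid(a*score + b) matches held-out labels."""
    a, b = 1.0, 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        g = p - labels                   # log-loss gradient wrt logits
        a -= lr * (g * scores).mean()
        b -= lr * g.mean()
    return a, b

def calibrated(scores, a, b):
    return 1.0 / (1.0 + np.exp(-(a * scores + b)))
```

A useful property: at the optimum the intercept gradient vanishes, so the mean calibrated probability matches the observed positive rate, which is exactly what auction pricing and budget planning need.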
Multi-Task Learning: Clicks, Conversions, and Engagement Together
Benefits
- Shared representations: Learn common patterns across tasks
- Data efficiency: Leverage signals from related tasks
- Consistency: Predictions align across tasks
Architecture
- Shared bottom: Common feature processing
- Task-specific towers: Separate heads for CTR, CVR, engagement
- Loss weighting: Balance importance of different tasks
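The shared-bottom layout can be sketched as one shared projection feeding per-task heads. The NumPy forward pass below shows shapes only (untrained random weights; layer sizes are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Shared bottom: one projection every task reads from.
W_shared = 0.1 * rng.normal(size=(16, 32))
# Task-specific towers: one output head per objective.
towers = {task: 0.1 * rng.normal(size=(32, 1))
          for task in ("ctr", "cvr", "engagement")}

def forward(x):
    h = relu(x @ W_shared)  # shared representation
    return {task: sigmoid(h @ W)[0] for task, W in towers.items()}
```

In training, each head gets its own loss (e.g. log loss on clicks and conversions) and the per-task losses are combined with weights, which is where the loss-weighting knob above lives.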
Challenges
- Task imbalance: Clicks are much more common than conversions
- Label quality: Different tasks have different label reliability
- Optimization: Balancing multiple objectives
Content to be expanded...