Prediction Models — CTR, CVR, and Beyond
Understanding what we predict, why it matters, and how to build effective models.
What We're Predicting and Why
Click-Through Rate (CTR)
The probability a user will click an ad. Critical for:
- Ranking: Higher-CTR ads should rank higher, all else being equal
- Pricing: CPC auctions need predicted CTR to compute expected revenue per impression
- Quality: Persistently low CTR signals poor relevance
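For CPC pricing, the auction typically ranks by expected value per impression (bid × pCTR) rather than by raw bid. A minimal sketch (function names and numbers are illustrative, not a real auction implementation):

```python
def expected_cpm(bid_cpc, p_ctr):
    """Expected revenue per 1000 impressions for a CPC bid."""
    return bid_cpc * p_ctr * 1000

# (ad_id, cpc_bid, predicted_ctr) -- illustrative values.
ads = [("ad_a", 2.00, 0.01), ("ad_b", 1.00, 0.05)]

# Rank by expected value, not raw bid: the lower bid wins here
# because its predicted CTR is 5x higher.
ranked = sorted(ads, key=lambda a: expected_cpm(a[1], a[2]), reverse=True)
```

Note how ad_b outranks ad_a despite bidding half as much, which is exactly why CTR prediction sits at the heart of CPC pricing.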
Conversion Rate (CVR)
The probability a click will result in a conversion. Important for:
- CPA campaigns: Directly affects advertiser ROI
- Ranking: For conversion-optimized campaigns
- Budget efficiency: Better targeting improves delivery
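The click and conversion predictions compose: P(conversion | impression) = pCTR × pCVR, and an advertiser's CPA target converts directly into a maximum CPC bid. A hedged sketch with made-up numbers:

```python
def conversion_prob(p_ctr, p_cvr):
    """P(conversion | impression) = P(click) * P(conversion | click)."""
    return p_ctr * p_cvr

def target_cpc(target_cpa, p_cvr):
    """Max CPC bid that keeps expected cost per acquisition at the target:
    paying this per click costs target_cpa per expected conversion."""
    return target_cpa * p_cvr
```

For example, a $50 CPA target with a 2% predicted CVR supports at most a $1 CPC bid, so a biased pCVR estimate directly distorts advertiser ROI.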
Beyond CTR and CVR
- Engagement: Time spent, video views, interactions
- Quality: User satisfaction, ad relevance
- Long-term value: Lifetime value, retention
Feature Engineering: User, Context, Ad, and Cross Features
User Features
- Demographics (age, gender, location)
- Historical behavior (past clicks, purchases, interests)
- Device and browser information
- Time-based patterns (time of day, day of week)
Context Features
- Page content and category
- Time and date
- Geographic context
- Device type and capabilities
Ad Features
- Creative attributes (image, text, format)
- Advertiser information
- Historical performance (CTR, CVR for similar users)
- Targeting settings
Cross Features
- User-ad interactions (has user seen this ad before?)
- User-advertiser history (past interactions with this advertiser)
- User-category affinity (user's interest in ad category)
- Contextual matching (ad relevance to page content)
Feature engineering is often more important than model architecture: a well-designed cross feature frequently lifts metrics more than swapping in a larger model.
Model Architectures: From Logistic Regression to Deep Learning
Logistic Regression
Simple, interpretable baseline. Good for:
- Understanding feature importance
- Fast inference
- Sparse feature spaces
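As a concrete baseline, logistic regression trained by gradient descent fits in a few lines of NumPy. This is a toy sketch on dense features, not a production trainer (real CTR systems use sparse solvers and online updates):

```python
import numpy as np

def train_lr(X, y, lr=0.1, epochs=200):
    """Logistic regression via batch gradient descent on log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted click prob
        g = p - y                               # gradient of log loss wrt logits
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b
```

Each learned weight is directly interpretable as a log-odds contribution, which is the feature-importance benefit noted above.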
Gradient Boosting (XGBoost, LightGBM)
Strong performance on tabular data:
- Handles non-linear interactions
- Feature importance insights
- Fast training and inference
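To make the boosting idea concrete, here is a toy sketch using depth-1 regression stumps on squared loss, where each round fits the current residual; real systems would use XGBoost or LightGBM rather than this illustration:

```python
import numpy as np

def fit_stump(X, r):
    """Depth-1 regression tree fit to residuals r (best squared-error split)."""
    best_err, best = np.inf, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # thresholds leaving both sides nonempty
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            err = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if err < best_err:
                best_err, best = err, (j, t, lv, rv)
    return best

def stump_predict(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

def boost(X, y, n_rounds=20, lr=0.5):
    """Gradient boosting on squared loss: each round fits the residual."""
    base = y.mean()
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        s = fit_stump(X, y - pred)
        pred += lr * stump_predict(s, X)
        stumps.append(s)
    return base, lr, stumps

def boost_predict(model, X):
    base, lr, stumps = model
    out = np.full(len(X), base)
    for s in stumps:
        out += lr * stump_predict(s, X)
    return out
```

The non-linear interactions mentioned above arise because deeper trees can split on one feature conditional on another, something this depth-1 toy cannot do.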
Deep Learning Models
Wide & Deep
- Wide component: Memorizes feature interactions
- Deep component: Generalizes to unseen combinations
DeepFM
Factorization machines with deep learning:
- Captures low and high-order feature interactions
- Efficient for sparse features
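The factorization-machine trick underlying DeepFM computes all pairwise interactions in O(nk) rather than O(n^2), via the identity sum_{i<j} <v_i, v_j> x_i x_j = 1/2 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]. A NumPy sketch of the second-order term:

```python
import numpy as np

def fm_second_order(x, V):
    """FM pairwise interaction term, computed in O(n*k).

    x: feature vector of shape (n,); V: factor matrix of shape (n, k),
    row v_i embedding feature i.
    """
    xv = x @ V                      # (k,): sum_i v_if * x_i
    x2v2 = (x ** 2) @ (V ** 2)      # (k,): sum_i v_if^2 * x_i^2
    return 0.5 * (xv ** 2 - x2v2).sum()
```

Because each feature only contributes through its embedding row, interactions between feature pairs never seen together in training can still be estimated, which is the sparse-feature efficiency noted above.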
Transformer-based Models
For sequential and contextual understanding:
- User behavior sequences
- Ad creative understanding
- Cross-modal features
Calibration: Why It Matters More Than Accuracy
The Problem
Models can have good ranking (AUC) but poor calibration. For auctions, we need accurate probability estimates, not just relative rankings.
Why Calibration Matters
- Auction pricing: Incorrect probabilities lead to wrong prices
- Budget planning: Advertisers need accurate conversion estimates
- Revenue optimization: Platform needs accurate expected value
Calibration Techniques
- Platt scaling: Logistic regression on model outputs
- Isotonic regression: Non-parametric calibration
- Temperature scaling: Single parameter adjustment
- Calibrated training: Incorporate calibration into loss function
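Platt scaling, the first technique above, fits p = sigmoid(a*s + b) on held-out (score, label) pairs. The gradient-descent loop below is an illustrative sketch, not a production calibrator:

```python
import numpy as np

def platt_scale(scores, labels, lr=0.05, epochs=2000):
    """Fit a, b so that sigmoid(a*score + b) matches held-out labels."""
    a, b = 1.0, 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        g = p - labels                   # log-loss gradient wrt logits
        a -= lr * (g * scores).mean()
        b -= lr * g.mean()
    return a, b

def calibrated(scores, a, b):
    return 1.0 / (1.0 + np.exp(-(a * scores + b)))
```

A useful property: at the optimum the intercept gradient vanishes, so the mean calibrated probability matches the observed positive rate, which is exactly what auction pricing and budget planning need.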
Multi-Task Learning: Clicks, Conversions, and Engagement Together
Benefits
- Shared representations: Learn common patterns across tasks
- Data efficiency: Leverage signals from related tasks
- Consistency: Predictions align across tasks
Architecture
- Shared bottom: Common feature processing
- Task-specific towers: Separate heads for CTR, CVR, engagement
- Loss weighting: Balance importance of different tasks
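The shared-bottom layout can be sketched as one shared projection feeding per-task heads. The NumPy forward pass below shows shapes only (untrained random weights; layer sizes are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Shared bottom: one projection every task reads from.
W_shared = 0.1 * rng.normal(size=(16, 32))
# Task-specific towers: one output head per objective.
towers = {task: 0.1 * rng.normal(size=(32, 1))
          for task in ("ctr", "cvr", "engagement")}

def forward(x):
    h = relu(x @ W_shared)  # shared representation
    return {task: sigmoid(h @ W)[0] for task, W in towers.items()}
```

In training, each head gets its own loss (e.g. log loss on clicks and conversions) and the per-task losses are combined with weights, which is where the loss-weighting knob above lives.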
Challenges
- Task imbalance: Clicks are much more common than conversions
- Label quality: Different tasks have different label reliability
- Optimization: Balancing multiple objectives
Content to be expanded...