Filtering — What Gets Cut and When
Understanding the filtering hierarchy and why order matters for system efficiency.
The Filtering Hierarchy: Why Order Matters
Order filters by three rules of thumb:
- Cheapest filters first: Eliminate candidates early, before any expensive work is done on them
- Highest rejection rate first: Among filters of similar cost, apply the one that removes the most candidates
- Deterministic before ML: Apply rule-based checks before expensive model inference
With the wrong order, every earlier stage spends compute on ads that a later filter will discard anyway.
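The first two rules can pull in different directions, so one common way to combine them is to sort filters by cost per rejected candidate. The sketch below assumes filters reject independently of each other; the Filter class, the cost units, and the rejection rates are made-up illustration values, not measurements from any real system.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Filter:
    name: str
    cost: float         # assumed average cost per candidate (arbitrary units)
    reject_rate: float  # assumed fraction of candidates this filter removes
    keep: Callable[[dict], bool]


def order_filters(filters: Iterable[Filter]) -> List[Filter]:
    """Sort by cost per rejected candidate, lowest first."""
    return sorted(filters, key=lambda f: f.cost / max(f.reject_rate, 1e-9))


def expected_cost(filters: List[Filter], n_candidates: int) -> float:
    """Expected total cost, assuming filters reject independently."""
    remaining = float(n_candidates)
    total = 0.0
    for f in filters:
        total += remaining * f.cost
        remaining *= 1.0 - f.reject_rate
    return total


def apply_filters(filters: List[Filter], candidates: Iterable[dict]) -> List[dict]:
    """Run the filters in the given order and return the survivors."""
    survivors = list(candidates)
    for f in filters:
        survivors = [ad for ad in survivors if f.keep(ad)]
    return survivors


# Hypothetical filters, deliberately listed in a poor order.
NAIVE = [
    Filter("ml_quality", 50.0, 0.30, lambda ad: ad["score"] >= 0.5),
    Filter("geo", 1.0, 0.80, lambda ad: ad["geo"] == "US"),
    Filter("status", 1.0, 0.20, lambda ad: ad["active"]),
]
ORDERED = order_filters(NAIVE)  # geo, status, ml_quality
```

With these made-up numbers, the naive order costs about 50,840 units per 1,000 candidates and the sorted order about 9,200. Both orders return the same survivors; only the cost changes.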
Hard Constraints: Targeting, Eligibility, Policy Compliance
Targeting Constraints
- Geographic restrictions
- Demographic targeting
- Device type requirements
- Time-based restrictions
These are typically checked first, using inverted indexes that map each targeting value to the ads that target it.
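A minimal sketch of an inverted index for targeting follows. It makes a simplifying assumption: every ad lists explicit values for every attribute, so there is no "match anything" wildcard. The names build_index and match and the sample ads are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Index = Dict[Tuple[str, str], Set[str]]


def build_index(ads: Dict[str, Dict[str, Set[str]]]) -> Index:
    """Map each (attribute, value) pair to the ids of ads that target it."""
    index: Index = defaultdict(set)
    for ad_id, targeting in ads.items():
        for attribute, values in targeting.items():
            for value in values:
                index[(attribute, value)].add(ad_id)
    return index


def match(index: Index, request: Dict[str, str]) -> Set[str]:
    """Intersect the posting sets for the request, smallest set first."""
    postings = [index.get(pair, set()) for pair in request.items()]
    if not postings:
        return set()
    postings.sort(key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
        if not result:
            break  # nothing can match any more
    return result


# Hypothetical ads and their targeting.
ADS = {
    "ad1": {"geo": {"US", "CA"}, "device": {"mobile"}},
    "ad2": {"geo": {"US"}, "device": {"mobile", "desktop"}},
    "ad3": {"geo": {"DE"}, "device": {"desktop"}},
}
INDEX = build_index(ADS)
```

A request for US mobile traffic, match(INDEX, {"geo": "US", "device": "mobile"}), touches only the two relevant posting sets instead of scanning every ad.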
Eligibility Checks
- Advertiser account status (active, suspended, etc.)
- Campaign status (running, paused, exhausted)
- Ad creative approval status
- Budget availability
Policy Compliance
- Content policies (prohibited content, brand safety)
- Ad format requirements
- Legal restrictions (age-gated products, etc.)
These filters are deterministic and fast, making them ideal for early-stage filtering.
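The eligibility and legal checks above can be sketched as a chain of plain predicates that stops at the first failure. The Ad fields, the Status values, the min_charge parameter, and the rule that an unknown viewer age fails an age gate are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    SUSPENDED = "suspended"
    EXHAUSTED = "exhausted"


@dataclass(frozen=True)
class Ad:
    ad_id: str
    account_status: Status
    campaign_status: Status
    creative_approved: bool
    remaining_budget: float
    min_viewer_age: int = 0  # 0 means the ad is not age-gated


def first_rejection(ad: Ad, min_charge: float,
                    viewer_age: Optional[int]) -> Optional[str]:
    """Return the name of the first failed check, or None if the ad passes."""
    if ad.account_status is not Status.ACTIVE:
        return "account_status"
    if ad.campaign_status is not Status.ACTIVE:
        return "campaign_status"
    if not ad.creative_approved:
        return "creative_approval"
    if ad.remaining_budget < min_charge:
        return "budget"
    # Assumption: an unknown viewer age fails an age-gated ad.
    if ad.min_viewer_age > 0 and (viewer_age is None
                                  or viewer_age < ad.min_viewer_age):
        return "age_gate"
    return None
```

Returning the name of the failed check, rather than a bare boolean, makes it easy to count rejections per rule later.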
Brand Safety Filtering: Advertiser and Publisher Controls
Advertiser Controls
Advertisers can specify:
- Block lists: Categories or sites to avoid
- Allow lists: Only show on specific sites
- Content categories: Avoid certain content types
Publisher Controls
Publishers can specify:
- Ad quality standards: Minimum quality scores
- Content restrictions: What types of ads are acceptable
- Brand safety requirements: Protect their brand reputation
Implementation
- Pre-computed lists: Fast lookup tables
- Content classification: ML models for content categorization
- Real-time checks: Verify against current policies
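As a rough illustration of the pre-computed list approach, the sketch below checks advertiser and publisher controls with set lookups. It assumes page categories and the ad quality score were computed ahead of time (for example by an offline classifier); all class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class AdvertiserControls:
    blocked_sites: FrozenSet[str] = frozenset()
    blocked_categories: FrozenSet[str] = frozenset()
    allowed_sites: FrozenSet[str] = frozenset()  # empty means no allow list


@dataclass(frozen=True)
class PublisherControls:
    blocked_ad_categories: FrozenSet[str] = frozenset()
    min_quality_score: float = 0.0


@dataclass(frozen=True)
class Page:
    site: str
    categories: FrozenSet[str]  # assumed to be pre-computed


def is_brand_safe(page: Page, ad_category: str, ad_quality: float,
                  adv: AdvertiserControls, pub: PublisherControls) -> bool:
    """Apply advertiser controls, then publisher controls."""
    if adv.allowed_sites and page.site not in adv.allowed_sites:
        return False
    if page.site in adv.blocked_sites:
        return False
    if page.categories & adv.blocked_categories:
        return False
    if ad_category in pub.blocked_ad_categories:
        return False
    return ad_quality >= pub.min_quality_score
```

Every check here is a set membership test or a comparison, so the cost per candidate stays small and predictable.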
Why Most Filtering Belongs Early (and What Doesn't)
Early Filtering Benefits
- Saves compute: Don't run expensive ML on filtered ads
- Reduces latency: Fewer candidates to process downstream
- Lowers costs: Less infrastructure needed
What Shouldn't Be Filtered Early
- Quality-based filtering: Requires ML predictions
- Diversity requirements: Need to see full candidate set
- Exploration: New ads need evaluation before filtering
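Diversity is a good example of a filter that cannot run early, because whether an ad is dropped depends on which other ads survived and how they scored. A minimal sketch, assuming a simple per-advertiser cap applied greedily after scoring (ScoredAd and select_diverse are hypothetical names, and real systems may use other diversity rules):

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class ScoredAd(NamedTuple):
    ad_id: str
    advertiser_id: str
    score: float


def select_diverse(scored: Iterable[ScoredAd], slots: int,
                   max_per_advertiser: int) -> List[str]:
    """Pick the best-scoring ads, capping how many one advertiser can win."""
    chosen: List[str] = []
    per_advertiser: Counter = Counter()
    for ad in sorted(scored, key=lambda a: a.score, reverse=True):
        if len(chosen) == slots:
            break
        if per_advertiser[ad.advertiser_id] >= max_per_advertiser:
            continue  # dropped only because of what was already chosen
        chosen.append(ad.ad_id)
        per_advertiser[ad.advertiser_id] += 1
    return chosen


# Hypothetical scored candidates.
SCORED = [
    ScoredAd("a1", "A", 0.9),
    ScoredAd("a2", "A", 0.8),
    ScoredAd("b1", "B", 0.7),
    ScoredAd("a3", "A", 0.6),
    ScoredAd("c1", "C", 0.5),
]
```

The same ad "a2" is kept when the cap is 2 and dropped when the cap is 1, which is a decision no per-ad early filter could make.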
The Cost of Late-Stage Filtering: Wasted Compute and Lost Revenue
Wasted Compute
If filtering happens after ML inference:
- Models run on ads that will be filtered out anyway
- Feature computation for those ads is wasted
- Ranking work on those ads is unnecessary
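The waste can be made concrete by counting model calls. In this sketch the scoring function stands in for ML inference and the eligibility check for a hard filter; the rank helper and the ten sample ads are made up for illustration.

```python
from typing import Callable, List, Tuple


def rank(candidates: List[dict], is_eligible: Callable[[dict], bool],
         score: Callable[[dict], float],
         filter_first: bool) -> Tuple[List[str], int]:
    """Return the ranked ad ids and the number of scoring calls made."""
    calls = 0

    def counted_score(ad: dict) -> float:
        nonlocal calls
        calls += 1
        return score(ad)

    if filter_first:
        pool = [ad for ad in candidates if is_eligible(ad)]
        scored = [(counted_score(ad), ad) for ad in pool]
    else:
        scored = [(counted_score(ad), ad) for ad in candidates]
        scored = [(s, ad) for s, ad in scored if is_eligible(ad)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ad["id"] for _, ad in scored], calls


# Ten hypothetical ads, two of which are eligible.
CANDIDATES = [{"id": f"ad{i}", "active": i % 5 == 0, "quality": i / 10}
              for i in range(10)]

EARLY = rank(CANDIDATES, lambda ad: ad["active"],
             lambda ad: ad["quality"], filter_first=True)
LATE = rank(CANDIDATES, lambda ad: ad["active"],
            lambda ad: ad["quality"], filter_first=False)
```

Both runs return the same ranking, but filtering first scores 2 ads while filtering last scores all 10.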
Lost Revenue
Late filtering can also hurt revenue:
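- Budget exhaustion: If budget is reserved at check time, an ad that passes the budget check and is filtered out later ties up budget it never spends
- Frequency caps: Likewise, if the frequency counter is updated at check time, an ad filtered afterwards uses up an impression that was never shown
- Opportunity cost: Time spent on ads that end up filtered could have gone to evaluating better candidates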
Best Practices
- Filter as early as possible, except where a filter needs ML predictions or the full candidate set
- Use approximate checks when exact checks are expensive
- Cache filtering results when possible
- Monitor filtering rates at each stage
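Monitoring filtering rates per stage can be as simple as recording how many candidates each stage saw and rejected. The sketch below assumes stages are (name, predicate) pairs run in order; run_funnel and the sample ads are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Stage = Tuple[str, Callable[[dict], bool]]


def run_funnel(candidates: Iterable[dict],
               stages: List[Stage]) -> Tuple[List[dict], List[Dict]]:
    """Apply stages in order and record per-stage rejection statistics."""
    current = list(candidates)
    stats: List[Dict] = []
    for name, keep in stages:
        survivors = [c for c in current if keep(c)]
        seen = len(current)
        rejected = seen - len(survivors)
        stats.append({
            "stage": name,
            "seen": seen,
            "rejected": rejected,
            "reject_rate": rejected / seen if seen else 0.0,
        })
        current = survivors
    return current, stats


# Ten hypothetical ads: ids 0-4 target the US, even ids are active.
ADS = [{"id": i, "geo": "US" if i < 5 else "DE", "active": i % 2 == 0}
       for i in range(10)]
STAGES = [
    ("geo", lambda ad: ad["geo"] == "US"),
    ("status", lambda ad: ad["active"]),
]
SURVIVORS, STATS = run_funnel(ADS, STAGES)
```

A stage whose reject_rate is high but which sits late in the list is a candidate for moving earlier, provided it is cheap enough.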