Introduction
Feature engineering often determines model success, but manual feature discovery doesn't scale. Uber's optimal feature discovery system automates this process, enabling faster model development across the company.
The Problem
Manual Feature Engineering
Traditional approach:
- Domain experts brainstorm features
- Engineers implement features
- Data scientists evaluate importance
- Iterate slowly
Challenges:
- Time-consuming
- Limited by human creativity
- Doesn't scale across use cases
Uber's Solution
Automated Feature Discovery
Raw Data -> Feature Generators (automated) -> Candidate Features (thousands) -> Evaluator (model-based) -> Top Features
Feature Generators
Types of automated transformations:
- Aggregations: sum, mean, count, percentiles
- Time windows: 1h, 1d, 7d, 30d
- Categorical: encodings, combinations
- Interactions: products, ratios
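As a rough illustration of how generators like these might expand raw rows into candidate features, the sketch below applies every aggregation over every trailing time window and then derives one ratio interaction. It uses only the Python standard library. The record layout, the `windowed_aggregates` and `ratio` helpers, and the feature names are assumptions made for this example, not Uber's implementation.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical raw trip records for one driver (field names are assumptions).
TRIPS = [
    {"ts": datetime(2024, 1, 10, 9, 0), "fare": 12.0, "distance": 3.0},
    {"ts": datetime(2024, 1, 10, 18, 0), "fare": 20.0, "distance": 8.0},
    {"ts": datetime(2024, 1, 6, 12, 0), "fare": 8.0, "distance": 2.0},
    {"ts": datetime(2023, 12, 20, 12, 0), "fare": 30.0, "distance": 10.0},
]

WINDOWS = {"1d": timedelta(days=1), "7d": timedelta(days=7), "30d": timedelta(days=30)}
AGGREGATIONS = {"sum": sum, "mean": mean, "count": len}


def windowed_aggregates(rows, column, now):
    """Apply every aggregation to `column` over every trailing time window."""
    features = {}
    for window_name, span in WINDOWS.items():
        values = [r[column] for r in rows if now - span < r["ts"] <= now]
        for agg_name, agg in AGGREGATIONS.items():
            name = f"{column}_{agg_name}_{window_name}"
            # Empty windows produce None for sum/mean and 0 for count.
            if values or agg_name == "count":
                features[name] = agg(values)
            else:
                features[name] = None
    return features


def ratio(numerator, denominator):
    """Interaction feature: a ratio that tolerates missing or zero inputs."""
    if numerator is None or not denominator:
        return None
    return numerator / denominator


now = datetime(2024, 1, 11, 0, 0)
features = windowed_aggregates(TRIPS, "fare", now)
features.update(windowed_aggregates(TRIPS, "distance", now))
features["fare_per_distance_7d"] = ratio(
    features["fare_sum_7d"], features["distance_sum_7d"]
)
```

Two columns, three aggregations, and three windows already yield 18 candidates plus the ratio, which suggests how quickly the candidate set can grow to thousands.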
Technical Implementation
Feature Template System
# Example feature template
template = FeatureTemplate(
    entity="driver",
    source="trips",
    aggregations=["count", "mean", "sum"],
    columns=["fare", "distance", "rating"],
    windows=["1d", "7d", "30d"],
)
# Generates: driver_trips_fare_count_1d, driver_trips_fare_mean_7d, etc.
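The template above does not show how `FeatureTemplate` expands into names. One plausible reading, assuming the naming pattern `entity_source_column_aggregation_window` implied by the generated names, is a Cartesian product over columns, aggregations, and windows. The dataclass below is a hypothetical stand-in written for this example, not the actual class.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class FeatureTemplate:
    """Hypothetical stand-in for the template class used above."""
    entity: str
    source: str
    aggregations: tuple
    columns: tuple
    windows: tuple

    def feature_names(self):
        """Expand the template into one name per column/aggregation/window."""
        return [
            f"{self.entity}_{self.source}_{column}_{agg}_{window}"
            for column, agg, window in product(
                self.columns, self.aggregations, self.windows
            )
        ]


template = FeatureTemplate(
    entity="driver",
    source="trips",
    aggregations=("count", "mean", "sum"),
    columns=("fare", "distance", "rating"),
    windows=("1d", "7d", "30d"),
)
names = template.feature_names()
```

Under this assumption a single template with three columns, three aggregations, and three windows expands to 27 candidate features.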
Importance Ranking
Methods used:
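- SHAP values: Attribute each prediction to individual features, then aggregate the attributions into a global importance score
- Permutation importance: Measure how much model performance drops when a feature's values are shuffled
- Forward selection: Greedily add the feature that most improves validation performance at each step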
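Of the methods listed above, permutation importance is the simplest to sketch without a modeling library. The example below shuffles one column at a time and records the average increase in mean squared error. The toy data, the `predict` stand-in for a trained model, and the function names are all assumptions for illustration; a production evaluator would use a real trained model and held-out data.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in MSE when one column at a time is shuffled."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this column permuted.
            X_perm = [
                row[:col] + [value] + row[col + 1:]
                for row, value in zip(X, shuffled)
            ]
            permuted = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data: the target depends on column 0 only; column 1 is noise.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 10), data_rng.uniform(0, 10)] for _ in range(200)]
y = [3.0 * row[0] for row in X]


def predict(row):
    # Stand-in for a trained model that has learned the true relationship.
    return 3.0 * row[0]


importances = permutation_importance(predict, X, y)
# Feature indices ordered from most to least important.
ranking = sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)
```

Shuffling the informative column raises the error, while shuffling the noise column leaves it unchanged, so the ranking places column 0 first.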
Scalability
- Distributed computation on Spark
- Feature caching for reuse
- Incremental updates for new data
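The caching and incremental-update ideas above can be illustrated with mergeable aggregate state: if the cache stores a count and a sum per key, new rows can be folded in without rescanning history, and the mean is derived on read. This is a single-process sketch under that assumption. `RunningStats` and `FeatureCache` are hypothetical names, and a distributed Spark job would look quite different.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Mergeable state from which count, sum, and mean can be derived."""
    count: int = 0
    total: float = 0.0

    def update(self, values):
        for value in values:
            self.count += 1
            self.total += value

    @property
    def mean(self):
        return self.total / self.count if self.count else None


class FeatureCache:
    """Hypothetical in-memory cache keyed by (entity_id, column)."""

    def __init__(self):
        self._stats = defaultdict(RunningStats)

    def ingest(self, batch):
        """Fold a batch of (entity_id, column, value) rows into cached state."""
        for entity_id, column, value in batch:
            self._stats[(entity_id, column)].update([value])

    def get(self, entity_id, column):
        stats = self._stats[(entity_id, column)]
        return {"count": stats.count, "sum": stats.total, "mean": stats.mean}


cache = FeatureCache()
cache.ingest([("driver_1", "fare", 10.0), ("driver_1", "fare", 20.0)])
first = cache.get("driver_1", "fare")

# A later batch updates the cached aggregates without reprocessing old rows.
cache.ingest([("driver_1", "fare", 30.0)])
second = cache.get("driver_1", "fare")
```

Time-windowed features would also need to expire old rows, which this sketch omits.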
Use Cases at Uber
ETA Prediction
Discovered features:
- Route traffic patterns by time
- Driver behavior features
- Weather interactions
Fraud Detection
Discovered features:
- Transaction velocity features
- Device fingerprint aggregations
- Cross-entity connections
Results
- Reduced feature engineering time
- Improved model performance
- Hundreds of models using discovered features
Best Practices
- Start with a rich raw feature set
- Invest in feature computation infrastructure
- Human oversight remains important
- Document discovered features