Introduction
LLM agents are only as good as the context they receive. Context engineering, the practice of crafting and managing what information reaches the agent, is critical for building reliable agent systems.
The Context Problem
Limited Context Windows
Even with 100K+ token models:
- More context != better performance
- Irrelevant context hurts accuracy
- Cost scales with tokens
Agent-Specific Challenges
- Long trajectories fill context
- Tools output verbose results
- Error recovery adds tokens
Context Engineering Principles
Principle 1: Relevance First
def prepare_context(task, history, tools):
    relevant_history = select_relevant(history, task)
    relevant_tools = filter_tools(tools, task)
    return format_context(task, relevant_history, relevant_tools)
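As a minimal runnable sketch, here is one way `select_relevant` could work, scoring history entries by word overlap with the task. The scoring scheme and `top_k` cutoff are assumptions for illustration, not a prescribed implementation:

```python
def select_relevant(history, task, top_k=3):
    """Score each history entry by word overlap with the task and
    keep the top_k highest-scoring entries, in original order."""
    task_words = set(task.lower().split())
    scored = [
        (len(task_words & set(entry.lower().split())), i, entry)
        for i, entry in enumerate(history)
    ]
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:top_k]
    return [entry for _, _, entry in sorted(top, key=lambda t: t[1])]

history = [
    "User asked about billing",
    "Searched docs for refund policy",
    "Weather tool returned 72F",
]
print(select_relevant(history, "process a refund for billing issue", top_k=2))
```

In production you would replace word overlap with embedding similarity, but the shape of the function stays the same: score, rank, cut.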
Principle 2: Recency Matters
Recent information is usually more important:
- Recent actions inform next steps
- Recent observations reflect current state
- Recent errors need addressing
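One simple way to act on recency is to fill a token budget walking backwards from the newest message. The word-count tokenizer below is a stand-in assumption; swap in a real tokenizer in practice:

```python
def fit_recent(messages, budget, count_tokens=lambda m: len(m.split())):
    """Walk backwards from the most recent message, keeping as many
    as fit in the token budget; return them in chronological order."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

log = ["step one done", "error: timeout on fetch", "retried fetch ok", "parsed result"]
print(fit_recent(log, budget=7))
```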
Principle 3: Summarize Aggressively
def compress_trajectory(trajectory):
    if len(trajectory) < THRESHOLD:
        return trajectory
    recent = trajectory[-N:]            # keep recent steps in full
    older = summarize(trajectory[:-N])  # summarize older steps
    return older + recent
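A runnable instantiation of the sketch above, with placeholder values for `N` and `THRESHOLD` and a stub `summarize` (in practice that would be an LLM call; all three are assumptions here):

```python
N, THRESHOLD = 3, 5

def summarize(steps):
    # Placeholder summarizer: a real system would call an LLM here.
    return [f"[summary of {len(steps)} earlier steps]"]

def compress_trajectory(trajectory):
    # Short trajectories pass through untouched; long ones keep the
    # last N steps verbatim and collapse everything earlier.
    if len(trajectory) < THRESHOLD:
        return trajectory
    return summarize(trajectory[:-N]) + trajectory[-N:]

steps = [f"step {i}" for i in range(1, 7)]
print(compress_trajectory(steps))
```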
Practical Techniques
Hierarchical Context
Level 1: Current task and immediate context
Level 2: Session history (summarized)
Level 3: User preferences (compressed)
Level 4: World knowledge (RAG when needed)
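The four levels above can be assembled with a per-level token budget. This sketch uses word counts as a stand-in for tokens and invented section names; both are assumptions:

```python
def build_context(levels, budgets, count=lambda s: len(s.split())):
    """Assemble context level by level, truncating each level's text
    to its own budget (word count stands in for tokens)."""
    parts = []
    for (name, text), budget in zip(levels, budgets):
        words = text.split()[:budget]
        parts.append(f"## {name}\n" + " ".join(words))
    return "\n\n".join(parts)

levels = [
    ("Current task", "Refund the duplicate charge on invoice 4521"),
    ("Session summary", "User reported a duplicate charge and verified identity"),
    ("Preferences", "Prefers concise answers no marketing language"),
]
print(build_context(levels, budgets=[20, 10, 5]))
```

Giving each level its own budget keeps a verbose lower level (like world knowledge) from crowding out the current task.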
Dynamic Tool Documentation
Don't include every tool in every prompt:
- Route to relevant tool subsets
- Provide examples on-demand
- Cache common tool patterns
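A minimal sketch of routing to a tool subset by keyword overlap. The tool names and keyword sets are invented for illustration; a real router might use embeddings or a classifier:

```python
TOOL_KEYWORDS = {
    "search_docs": {"docs", "documentation", "find", "lookup"},
    "run_sql": {"query", "table", "sql", "database"},
    "send_email": {"email", "notify", "send"},
}

def route_tools(task, max_tools=2):
    """Return the tools whose keyword sets overlap the task most,
    dropping tools with no overlap at all."""
    words = set(task.lower().split())
    scored = [(len(kw & words), name) for name, kw in TOOL_KEYWORDS.items()]
    scored = [t for t in scored if t[0] > 0]
    return [name for _, name in sorted(scored, reverse=True)[:max_tools]]

print(route_tools("query the billing table in the database"))
```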
Observation Compression
Tool outputs can be verbose:
def compress_observation(raw_output, task):
    if is_large(raw_output):
        # Extract only the task-relevant information
        return extract_relevant(raw_output, task)
    return raw_output
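One concrete (and deliberately simple) compression strategy is head-and-tail truncation with an elision marker; the function name and line limits are assumptions:

```python
def truncate_observation(raw_output, max_lines=6):
    """Keep the head and tail of a long output and note what was
    elided; a simple stand-in for task-aware extraction."""
    lines = raw_output.splitlines()
    if len(lines) <= max_lines:
        return raw_output
    head, tail = lines[:3], lines[-2:]
    elided = len(lines) - len(head) - len(tail)
    return "\n".join(head + [f"... ({elided} lines elided) ..."] + tail)

big = "\n".join(f"row {i}" for i in range(20))
print(truncate_observation(big))
```

Head-and-tail truncation works surprisingly well for logs and tabular output, where errors cluster at the end and headers at the start.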
Architecture Patterns
Memory-Augmented Agents
Task -> Context Builder -> Agent -> Action
              |              |
        Memory Search   Memory Write
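The loop in the diagram can be sketched as follows. The `Memory` class, overlap-based search, and echo agent are all illustrative assumptions:

```python
class Memory:
    def __init__(self):
        self.entries = []

    def search(self, task, top_k=2):
        # Rank stored entries by word overlap with the task.
        words = set(task.lower().split())
        ranked = sorted(self.entries,
                        key=lambda e: -len(words & set(e.lower().split())))
        return ranked[:top_k]

    def write(self, entry):
        self.entries.append(entry)

def run_step(task, memory, agent):
    recalled = memory.search(task)              # Memory Search
    context = {"task": task, "memory": recalled}
    action = agent(context)                     # Agent acts on built context
    memory.write(f"{task} -> {action}")         # Memory Write
    return action

mem = Memory()
print(run_step("check refund status", mem, lambda ctx: f"did:{ctx['task']}"))
```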
Attention-Based Selection
Let the model help select context:
- First pass: Identify relevant context
- Second pass: Execute with relevant context
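The two passes can be wired as a pipeline with a pluggable selector. In practice the selector would itself be an LLM call; the word-overlap stand-in below is an assumption:

```python
def two_pass(task, candidates, select_fn, execute_fn):
    """Pass 1: a selector picks the relevant context items;
    Pass 2: execute with only those items."""
    relevant = select_fn(task, candidates)
    return execute_fn(task, relevant)

def word_overlap_selector(task, candidates):
    # Stand-in for an LLM selection pass: keep items sharing any word.
    words = set(task.lower().split())
    return [c for c in candidates if words & set(c.lower().split())]

result = two_pass(
    "summarize the billing dispute",
    ["billing dispute opened Jan 3", "weather is sunny", "dispute escalated"],
    word_overlap_selector,
    lambda task, ctx: f"{task} | ctx={len(ctx)} items",
)
print(result)
```

The selection pass costs extra tokens, so this pattern pays off when the candidate pool is large relative to what the execution pass actually needs.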
Evaluation
Metrics
- Task completion rate
- Context utilization (tokens used vs. available)
- Hallucination rate
- Cost per task
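Context utilization is straightforward to compute per run and average across runs; the run data below is made up for illustration:

```python
def context_utilization(tokens_used, window_size):
    """Fraction of the available context window actually consumed."""
    return tokens_used / window_size

# (tokens_used, window_size) per run -- hypothetical numbers
runs = [(42_000, 128_000), (9_500, 128_000), (120_000, 128_000)]
avg = sum(context_utilization(u, w) for u, w in runs) / len(runs)
print(f"avg utilization: {avg:.1%}")
```

Consistently low utilization suggests headroom for richer context; utilization near 1.0 signals that compression or summarization is overdue.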
A/B Testing
- Test context strategies
- Measure downstream metrics
- Iterate based on data
Best Practices
- Start with minimal context and add as needed
- Measure context utilization
- Implement progressive disclosure
- Cache and reuse common contexts
- Monitor for context-related failures
Build effective agents with our RAG Systems at Scale course.