design pattern · 2024-08-15 · 13 min read

Agent Context Engineering: Optimizing LLM Agent Performance

Learn how to engineer context effectively for LLM agents to improve task completion and reduce hallucinations.

Tags: agents, LLM, context, prompt engineering, optimization

Introduction

LLM agents are only as good as the context they receive. Context engineering, the practice of deciding what information reaches the agent and in what form, is critical for building reliable agent systems.

The Context Problem

Limited Context Windows

Even with 100K+ token models:

  • More context != better performance
  • Irrelevant context hurts accuracy
  • Cost scales with tokens

Agent-Specific Challenges

  • Long trajectories fill context
  • Tools output verbose results
  • Error recovery adds tokens

Context Engineering Principles

Principle 1: Relevance First

def prepare_context(task, history, tools):
    # Keep only the history entries that matter for this task
    # (e.g. via embedding similarity or keyword overlap)
    relevant_history = select_relevant(history, task)
    # Route to a task-appropriate subset of tools, not the full catalog
    relevant_tools = filter_tools(tools, task)
    return format_context(task, relevant_history, relevant_tools)

Principle 2: Recency Matters

Recent information is usually more important:

  • Recent actions inform next steps
  • Recent observations reflect current state
  • Recent errors need addressing
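A minimal sketch of recency-first selection under a token budget. The `(tokens, text)` history shape and the budget parameter are illustrative assumptions, not a fixed API:

```python
def select_recent(history, budget):
    """Keep the most recent history entries that fit a token budget.

    `history` is a list of (token_count, text) pairs, oldest first.
    Walking backwards guarantees the newest entries are preferred.
    """
    selected = []
    used = 0
    for tokens, text in reversed(history):
        if used + tokens > budget:
            break  # stop as soon as the next (older) entry would overflow
        selected.append(text)
        used += tokens
    return list(reversed(selected))  # restore chronological order
```

With a 55-token budget over entries costing 10, 20, and 30 tokens, only the two newest survive; the oldest is dropped first.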

Principle 3: Summarize Aggressively

THRESHOLD = 20  # max steps kept verbatim before compressing (tunable)
N = 5           # number of recent steps always kept in full (tunable)

def compress_trajectory(trajectory):
    if len(trajectory) < THRESHOLD:
        return trajectory  # short enough to keep as-is

    recent = trajectory[-N:]            # keep the last N steps in full
    older = summarize(trajectory[:-N])  # collapse the rest, e.g. via an LLM summary
    return older + recent

Practical Techniques

Hierarchical Context

Level 1: Current task and immediate context
Level 2: Session history (summarized)
Level 3: User preferences (compressed)
Level 4: World knowledge (RAG when needed)
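The four levels above can be assembled greedily, highest priority first, until the budget runs out. This is a sketch; the word-count token proxy and the `(name, text)` level shape are assumptions:

```python
def build_context(levels, budget):
    """Assemble context from priority-ordered levels within a budget.

    `levels` is a list of (name, text) pairs, most important first.
    A level that doesn't fit is skipped so cheaper levels can still
    be included.
    """
    parts, used = [], 0
    for name, text in levels:
        cost = len(text.split())  # crude word-count stand-in for tokens
        if used + cost > budget:
            continue  # skip oversized level, keep trying lower ones
        parts.append(f"## {name}\n{text}")
        used += cost
    return "\n\n".join(parts)
```

Skipping rather than truncating keeps each included level intact, at the price of occasionally dropping a mid-priority level entirely.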

Dynamic Tool Documentation

Don't include every tool in every prompt:

  • Route to relevant tool subsets
  • Provide examples on-demand
  • Cache common tool patterns
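Routing to a tool subset can be sketched with simple keyword overlap. The dict shape (`name`, `keywords`) and the scoring rule are illustrative stand-ins for whatever router you actually use (embeddings, a classifier, etc.):

```python
def filter_tools(tools, task, max_tools=5):
    """Return only the tools whose declared keywords overlap the task.

    `tools` is a list of dicts with "name" and "keywords" fields.
    Tools with no overlap are dropped entirely; the rest are ranked
    by overlap size, with name as a deterministic tie-break.
    """
    task_words = set(task.lower().split())
    scored = []
    for tool in tools:
        overlap = len(task_words & set(tool["keywords"]))
        if overlap:
            scored.append((overlap, tool["name"], tool))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best match first
    return [tool for _, _, tool in scored[:max_tools]]
```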

Observation Compression

Tool outputs can be verbose:

def compress_observation(raw_output, task):
    # Large outputs get trimmed to the task-relevant parts;
    # small ones pass through unchanged.
    if is_large(raw_output):
        # e.g. regex filters, section headings, or an LLM summary
        return extract_relevant(raw_output, task)
    return raw_output

Architecture Patterns

Memory-Augmented Agents

Task -> Context Builder -> Agent -> Action
             |                |
        Memory Search    Memory Write
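The loop in the diagram can be sketched as follows. The toy keyword-matching `Memory` class (standing in for a vector store) and the `agent_step` callable are assumptions for illustration:

```python
class Memory:
    """Toy keyword-overlap memory store, standing in for a vector DB."""

    def __init__(self):
        self.entries = []

    def write(self, text):
        self.entries.append(text)

    def search(self, query, k=3):
        words = set(query.lower().split())
        # Rank entries by word overlap with the query, best first
        ranked = sorted(
            self.entries,
            key=lambda e: -len(words & set(e.lower().split())),
        )
        return ranked[:k]

def run_step(task, memory, agent_step):
    """One pass of Task -> Context Builder -> Agent -> Action."""
    retrieved = memory.search(task)           # Memory Search
    context = task + "\n" + "\n".join(retrieved)
    action = agent_step(context)              # Agent decides
    memory.write(f"{task} -> {action}")       # Memory Write
    return action
```

Each step both reads from and writes to memory, so later tasks can retrieve earlier task/action pairs.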

Attention-Based Selection

Let the model help select context:

  • First pass: Identify relevant context
  • Second pass: Execute with relevant context
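The two passes can be sketched with any `llm` callable mapping a prompt string to a reply string. The prompt wording and the comma-separated-indices reply format are assumptions, not a prescribed protocol:

```python
def two_pass(task, candidates, llm):
    """Two-pass context selection.

    Pass 1 asks the model which candidate snippets are relevant;
    pass 2 runs the task with only the selected snippets.
    """
    listing = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    reply = llm(
        f"Task: {task}\nWhich snippets are relevant? "
        f"Reply with indices, comma-separated.\n{listing}"
    )
    # Parse whatever integers the model returned into a keep-set
    keep = {int(tok) for tok in reply.replace(",", " ").split() if tok.isdigit()}
    selected = [c for i, c in enumerate(candidates) if i in keep]
    return llm(f"Task: {task}\nContext:\n" + "\n".join(selected))
```

The extra pass costs one cheap call but can sharply cut the tokens sent to the expensive second call.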

Evaluation

Metrics

  • Task completion rate
  • Context utilization (tokens used vs. available)
  • Hallucination rate
  • Cost per task
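The four metrics above can be aggregated from per-task run records. The field names (`completed`, `tokens_used`, `tokens_available`, `hallucinated`, `cost`) are illustrative, not a standard schema:

```python
def context_metrics(runs):
    """Aggregate evaluation metrics from a list of per-task run dicts."""
    n = len(runs)
    return {
        "task_completion_rate": sum(r["completed"] for r in runs) / n,
        "context_utilization": (
            sum(r["tokens_used"] for r in runs)
            / sum(r["tokens_available"] for r in runs)
        ),
        "hallucination_rate": sum(r["hallucinated"] for r in runs) / n,
        "cost_per_task": sum(r["cost"] for r in runs) / n,
    }
```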

A/B Testing

  • Test context strategies
  • Measure downstream metrics
  • Iterate based on data

Best Practices

  1. Start with minimal context and add as needed
  2. Measure context utilization
  3. Implement progressive disclosure
  4. Cache and reuse common contexts
  5. Monitor for context-related failures

Build effective agents with our RAG Systems at Scale course.

Want to Go Deeper?

This article is part of our comprehensive curriculum on building ML systems at scale. Explore our full courses for hands-on learning.