design pattern · 2024-08-15 · 13 min read

Agent Context Engineering: Optimizing LLM Agent Performance

Learn how to engineer context effectively for LLM agents to improve task completion and reduce hallucinations.

Tags: agents, LLM, context, prompt engineering, optimization

Introduction

LLM agents are only as good as the context they receive. Context engineering, the practice of deciding what information reaches the agent and in what form, is critical for building reliable agent systems.

The Context Problem

Limited Context Windows

Even with 100K+ token models:

  • More context != better performance
  • Irrelevant context hurts accuracy
  • Cost scales with tokens

Agent-Specific Challenges

  • Long trajectories fill context
  • Tools output verbose results
  • Error recovery adds tokens

Context Engineering Principles

Principle 1: Relevance First

def prepare_context(task, history, tools):
    # Keep only the history entries that matter for this task
    # (e.g. via embedding similarity or keyword overlap)
    relevant_history = select_relevant(history, task)
    # Route to a task-appropriate subset of tools, not the full catalog
    relevant_tools = filter_tools(tools, task)
    return format_context(task, relevant_history, relevant_tools)

Principle 2: Recency Matters

Recent information is usually more important:

  • Recent actions inform next steps
  • Recent observations reflect current state
  • Recent errors need addressing
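A minimal sketch of recency-first selection under a token budget. The `(tokens, text)` history shape and the budget parameter are illustrative assumptions, not a fixed API:

```python
def select_recent(history, budget):
    """Keep the most recent history entries that fit a token budget.

    `history` is a list of (token_count, text) pairs, oldest first.
    Walking backwards guarantees the newest entries are preferred.
    """
    selected = []
    used = 0
    for tokens, text in reversed(history):
        if used + tokens > budget:
            break  # stop as soon as the next (older) entry would overflow
        selected.append(text)
        used += tokens
    return list(reversed(selected))  # restore chronological order
```

With a 55-token budget over entries costing 10, 20, and 30 tokens, only the two newest survive; the oldest is dropped first.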

Principle 3: Summarize Aggressively

THRESHOLD = 20  # max steps kept verbatim before compressing (tunable)
N = 5           # number of recent steps always kept in full (tunable)

def compress_trajectory(trajectory):
    if len(trajectory) < THRESHOLD:
        return trajectory  # short enough to keep as-is

    recent = trajectory[-N:]            # keep the last N steps in full
    older = summarize(trajectory[:-N])  # collapse the rest, e.g. via an LLM summary
    return older + recent

Practical Techniques

Hierarchical Context

Level 1: Current task and immediate context
Level 2: Session history (summarized)
Level 3: User preferences (compressed)
Level 4: World knowledge (RAG when needed)
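The four levels above can be assembled greedily, highest priority first, until the budget runs out. This is a sketch; the word-count token proxy and the `(name, text)` level shape are assumptions:

```python
def build_context(levels, budget):
    """Assemble context from priority-ordered levels within a budget.

    `levels` is a list of (name, text) pairs, most important first.
    A level that doesn't fit is skipped so cheaper levels can still
    be included.
    """
    parts, used = [], 0
    for name, text in levels:
        cost = len(text.split())  # crude word-count stand-in for tokens
        if used + cost > budget:
            continue  # skip oversized level, keep trying lower ones
        parts.append(f"## {name}\n{text}")
        used += cost
    return "\n\n".join(parts)
```

Skipping rather than truncating keeps each included level intact, at the price of occasionally dropping a mid-priority level entirely.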

Dynamic Tool Documentation

Don't include every tool in every prompt:

  • Route to relevant tool subsets
  • Provide examples on-demand
  • Cache common tool patterns
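Routing to a tool subset can be sketched with simple keyword overlap. The dict shape (`name`, `keywords`) and the scoring rule are illustrative stand-ins for whatever router you actually use (embeddings, a classifier, etc.):

```python
def filter_tools(tools, task, max_tools=5):
    """Return only the tools whose declared keywords overlap the task.

    `tools` is a list of dicts with "name" and "keywords" fields.
    Tools with no overlap are dropped entirely; the rest are ranked
    by overlap size, with name as a deterministic tie-break.
    """
    task_words = set(task.lower().split())
    scored = []
    for tool in tools:
        overlap = len(task_words & set(tool["keywords"]))
        if overlap:
            scored.append((overlap, tool["name"], tool))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best match first
    return [tool for _, _, tool in scored[:max_tools]]
```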

Observation Compression

Tool outputs can be verbose:

def compress_observation(raw_output, task):
    # Large outputs get trimmed to the task-relevant parts;
    # small ones pass through unchanged.
    if is_large(raw_output):
        # e.g. regex filters, section headings, or an LLM summary
        return extract_relevant(raw_output, task)
    return raw_output

Architecture Patterns

Memory-Augmented Agents

Task -> Context Builder -> Agent -> Action
             |                |
        Memory Search    Memory Write
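The loop in the diagram can be sketched as follows. The toy keyword-matching `Memory` class (standing in for a vector store) and the `agent_step` callable are assumptions for illustration:

```python
class Memory:
    """Toy keyword-overlap memory store, standing in for a vector DB."""

    def __init__(self):
        self.entries = []

    def write(self, text):
        self.entries.append(text)

    def search(self, query, k=3):
        words = set(query.lower().split())
        # Rank entries by word overlap with the query, best first
        ranked = sorted(
            self.entries,
            key=lambda e: -len(words & set(e.lower().split())),
        )
        return ranked[:k]

def run_step(task, memory, agent_step):
    """One pass of Task -> Context Builder -> Agent -> Action."""
    retrieved = memory.search(task)           # Memory Search
    context = task + "\n" + "\n".join(retrieved)
    action = agent_step(context)              # Agent decides
    memory.write(f"{task} -> {action}")       # Memory Write
    return action
```

Each step both reads from and writes to memory, so later tasks can retrieve earlier task/action pairs.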

Attention-Based Selection

Let the model help select context:

  • First pass: Identify relevant context
  • Second pass: Execute with relevant context
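The two passes can be sketched with any `llm` callable mapping a prompt string to a reply string. The prompt wording and the comma-separated-indices reply format are assumptions, not a prescribed protocol:

```python
def two_pass(task, candidates, llm):
    """Two-pass context selection.

    Pass 1 asks the model which candidate snippets are relevant;
    pass 2 runs the task with only the selected snippets.
    """
    listing = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    reply = llm(
        f"Task: {task}\nWhich snippets are relevant? "
        f"Reply with indices, comma-separated.\n{listing}"
    )
    # Parse whatever integers the model returned into a keep-set
    keep = {int(tok) for tok in reply.replace(",", " ").split() if tok.isdigit()}
    selected = [c for i, c in enumerate(candidates) if i in keep]
    return llm(f"Task: {task}\nContext:\n" + "\n".join(selected))
```

The extra pass costs one cheap call but can sharply cut the tokens sent to the expensive second call.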

Evaluation

Metrics

  • Task completion rate
  • Context utilization (tokens used vs. available)
  • Hallucination rate
  • Cost per task
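The four metrics above can be aggregated from per-task run records. The field names (`completed`, `tokens_used`, `tokens_available`, `hallucinated`, `cost`) are illustrative, not a standard schema:

```python
def context_metrics(runs):
    """Aggregate evaluation metrics from a list of per-task run dicts."""
    n = len(runs)
    return {
        "task_completion_rate": sum(r["completed"] for r in runs) / n,
        "context_utilization": (
            sum(r["tokens_used"] for r in runs)
            / sum(r["tokens_available"] for r in runs)
        ),
        "hallucination_rate": sum(r["hallucinated"] for r in runs) / n,
        "cost_per_task": sum(r["cost"] for r in runs) / n,
    }
```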

A/B Testing

  • Test context strategies
  • Measure downstream metrics
  • Iterate based on data

Best Practices

  1. Start with minimal context and add as needed
  2. Measure context utilization
  3. Implement progressive disclosure
  4. Cache and reuse common contexts
  5. Monitor for context-related failures

Build effective agents with our RAG Systems at Scale course.

Want to Go Deeper?

This article is part of our comprehensive curriculum on building ML systems at scale. Explore our full courses for hands-on learning.