Effective Context Engineering for AI Agents

A comprehensive guide to context engineering - the evolution from prompt engineering to managing the holistic state available to LLMs for building steerable, effective agents. Learn strategies for optimizing context windows, managing attention budgets, and designing efficient agent architectures.

After a few years of prompt engineering being the focus of attention in applied AI, a new term has come to prominence: context engineering. Building with language models is becoming less about finding the right words and phrases for your prompts, and more about answering the broader question of "what configuration of context is most likely to generate our model's desired behavior?"

Context refers to the set of tokens included when sampling from a large language model (LLM). The engineering problem at hand is optimizing the utility of those tokens against the inherent constraints of LLMs in order to consistently achieve a desired outcome. Effectively wrangling LLMs often requires thinking in context — in other words: considering the holistic state available to the LLM at any given time and what potential behaviors that state might yield.

In this post, we'll explore the emerging art of context engineering and offer a refined mental model for building steerable, effective agents.

Context Engineering vs. Prompt Engineering

At Anthropic, we view context engineering as the natural progression of prompt engineering. Prompt engineering refers to methods for writing and organizing LLM instructions for optimal outcomes. Context engineering refers to the set of strategies for curating and maintaining the optimal set of tokens (information) during LLM inference, including all the other information that may land there outside of the prompts.

In the early days of engineering with LLMs, prompting was the biggest component of AI engineering work, as the majority of use cases outside of everyday chat interactions required prompts optimized for one-shot classification or text generation tasks. As the term implies, the primary focus of prompt engineering is how to write effective prompts, particularly system prompts. However, as we move towards engineering more capable agents that operate over multiple turns of inference and longer time horizons, we need strategies for managing the entire context state (system instructions, tools, Model Context Protocol (MCP), external data, message history, etc.).

An agent running in a loop generates more and more data that could be relevant for the next turn of inference, and this information must be cyclically refined. Context engineering is the art and science of curating what will go into the limited context window from that constantly evolving universe of possible information.

Context Engineering in Claude Code

For developers using Claude Code, context engineering becomes even more critical as it directly impacts the effectiveness of AI-assisted coding. Claude Code is designed as a low-level, unopinionated tool that provides close to raw model access without forcing specific workflows. This flexibility requires developers to actively manage context to achieve optimal results.

Unlike traditional prompt engineering where you might write a single prompt and get a response, Claude Code operates in a continuous session where context accumulates over time. Each interaction, file read, tool usage, and code modification contributes to the growing context that Claude uses to understand and respond to your requests.

Figure: Prompt engineering vs. context engineering. In contrast to the discrete task of writing a prompt, context engineering is iterative: the curation phase happens each time we decide what to pass to the model.

Why Context Engineering is Important to Building Capable Agents

Despite their speed and ability to manage larger and larger volumes of data, we've observed that LLMs, like humans, lose focus or experience confusion at a certain point. Studies on needle-in-a-haystack style benchmarking have uncovered the concept of context rot: as the number of tokens in the context window increases, the model's ability to accurately recall information from that context decreases.

While some models exhibit more gentle degradation than others, this characteristic emerges across all models. Context, therefore, must be treated as a finite resource with diminishing marginal returns. Like humans, who have limited working memory capacity, LLMs have an "attention budget" that they draw on when parsing large volumes of context. Every new token introduced depletes this budget by some amount, increasing the need to carefully curate the tokens available to the LLM.

This attention scarcity stems from the architectural constraints of LLMs, which are built on the transformer architecture. The transformer enables every token to attend to every other token across the entire context, resulting in n² pairwise relationships for n tokens.

As its context length increases, a model's ability to capture these pairwise relationships gets stretched thin, creating a natural tension between context size and attention focus. Additionally, models develop their attention patterns from training data distributions where shorter sequences are typically more common than longer ones. This means models have less experience with, and fewer specialized parameters for, context-wide dependencies.
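The quadratic growth described above is easy to see with a toy calculation (illustrative only; the numbers count attention relationships, not FLOPs or memory):

```python
# Illustrative only: the number of pairwise attention relationships a
# transformer must model grows quadratically with context length.
def attention_pairs(n_tokens: int) -> int:
    """Every token can attend to every other token: n * n ordered pairs."""
    return n_tokens * n_tokens

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {attention_pairs(n):>18,} pairwise relationships")
```

A 10x increase in context length means a 100x increase in the relationships the model's fixed attention capacity must be spread across.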

Techniques like position-encoding interpolation allow models to handle longer sequences by mapping them onto the positional range the model was originally trained on, though with some loss of precision in how token positions are understood. These factors create a performance gradient rather than a hard cliff: models remain highly capable at longer contexts but may show reduced precision for information retrieval and long-range reasoning compared to their performance on shorter contexts.

These realities mean that thoughtful context engineering is essential for building capable agents.

The Anatomy of Effective Context

Given that LLMs are constrained by a finite attention budget, good context engineering means finding the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome. Implementing this practice is much easier said than done, but in the following section, we outline what this guiding principle means in practice across the different components of context.

Context Engineering for Claude Code Users

For developers using Claude Code for AI-assisted programming, context engineering takes on specific importance and practical applications. Claude Code users can leverage context engineering principles to optimize their AI coding workflows, improve code quality, and reduce token consumption.

Leveraging CLAUDE.md Files for Context Management

One of the most powerful context engineering tools in Claude Code is the CLAUDE.md file system. These special files are automatically pulled into context when starting a conversation, making them ideal for documenting project-specific information that Claude needs to know.

Effective CLAUDE.md files should include:

  1. Common Commands: Document frequently used bash commands to save Claude from having to ask or guess
  2. Code Style Guidelines: Specify coding conventions, import preferences, and formatting standards
  3. Project Structure Information: Explain the repository layout, key directories, and important files
  4. Development Environment Setup: Detail required tools, version requirements, and setup instructions
  5. Workflow Conventions: Document team practices for branching, testing, and deployment

Unlike generic system prompts, CLAUDE.md files should be project-specific and team-shared. They become part of the persistent context that Claude uses across all interactions with your codebase.
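A minimal sketch of what such a file might look like — the project, commands, and paths below are entirely hypothetical and should be replaced with your project's actual conventions:

```markdown
# CLAUDE.md (hypothetical example)

## Common commands
- `npm run test:unit` — run unit tests
- `npm run lint -- --fix` — lint and auto-fix

## Code style
- Use ES modules (`import`/`export`), not CommonJS
- Prefer named exports over default exports

## Project structure
- `src/api/` — HTTP route handlers
- `src/services/` — business logic
- `tests/` — mirrors the `src/` layout

## Workflow
- Branch from `main`; open PRs with a linked issue
- Run the full test suite before committing
```

Note how every line earns its place: each bullet prevents a question Claude would otherwise have to ask or a guess it would otherwise make.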

CLAUDE.md File Locations and Hierarchy:

Claude Code supports multiple CLAUDE.md files that are automatically loaded based on your current directory:

  • Project Root (CLAUDE.md): Team-shared project-level configuration, committed to Git for all members
  • Project Root (CLAUDE.local.md): Personal local override configuration, usually added to .gitignore to avoid affecting others
  • Parent Directory (CLAUDE.md): Upper-level configuration automatically inherited in a monorepo structure (recursive upward search)
  • Subdirectory (CLAUDE.md): Independent configuration for specific submodules/features (loaded with priority over parent configuration)
  • User Global (~/.claude/CLAUDE.md): User global default configuration, applicable to baseline settings for all Claude sessions

Best Practices for CLAUDE.md Files:

  • Keep files concise and human-readable (typically under 50 lines)
  • Use bullet points and clear headings for easy scanning
  • Regularly review and update files as the project evolves
  • Include version-specific information for projects with multiple active branches
  • Document deprecated practices to prevent Claude from suggesting outdated approaches

Strategic File Mentions and Context Loading

Claude Code allows you to explicitly tell Claude to read specific files using natural language instructions like "read logging.py" or "look at the authentication module." This gives you fine-grained control over what context Claude loads at any given time.

Effective strategies include:

  • Pre-loading Key Files: Before starting complex tasks, have Claude read the most relevant files to establish context
  • Progressive Context Loading: Load files incrementally as needed rather than overwhelming Claude with too much information upfront
  • Context Refresh: Periodically ask Claude to re-read files if they've been modified during the session
  • Selective File Loading: Use tab-completion to quickly reference files or folders anywhere in your repository, helping Claude find or update the right resources

Advanced File Loading Techniques:

  1. Directory Analysis: Ask Claude to "analyze the structure of the src/services/user/ directory" before implementing changes
  2. Cross-File References: When working on related components, load multiple files to maintain consistency
  3. Historical Context: For bug fixes, load both current implementation and related test files
  4. Dependency Mapping: Load dependency files to understand how changes might affect other parts of the system

Managing Message History in Long Sessions

Claude Code sessions can accumulate extensive message history over time, leading to context bloat and potential performance degradation. Effective context engineering requires actively managing this history.

Strategies for Claude Code users:

  1. Use the /compact Command: Periodically compress conversation history to preserve key information while reducing token count. This built-in feature compresses conversation history, keeping only context summaries to reduce token usage while preserving essential information.
  2. Clear Irrelevant History: Use /clear to remove completed tasks from context when they're no longer relevant. During long sessions, Claude's context window can fill with irrelevant conversation, file contents, and commands. This can reduce performance and sometimes distract Claude.
  3. Break Large Tasks: Divide complex projects into smaller, focused sessions to maintain context clarity. For large tasks with multiple steps or requiring exhaustive solutions—like code migrations, fixing numerous lint errors, or running complex build scripts—improve performance by having Claude use a Markdown file (or even a GitHub issue!) as a checklist and working scratchpad.
  4. Session Segmentation: For multi-day projects, consider starting fresh sessions rather than carrying forward old context that may no longer be relevant

Tool Selection and Context Efficiency

Claude Code's tool ecosystem (MCP servers, custom commands, bash tools) directly impacts context efficiency. Each tool interaction adds to the context window, so thoughtful tool selection is crucial.

Best practices:

  • Minimize Tool Chatter: Configure tool allowlists to reduce permission prompts for trusted operations. You can customize the allowlist to permit additional tools that you know are safe, or to allow potentially unsafe tools that are easy to undo (e.g., file editing, git commit).
  • Use Custom Slash Commands: Create reusable command templates that provide Claude with structured context for common tasks. Custom commands come in two types: User-level commands (placed in ~/.claude/commands/ directory) and Project-level commands (placed in .claude/commands/ directory under project root).
  • Leverage MCP Servers: Use Model Context Protocol servers to provide Claude with structured access to external systems without flooding the context window
  • Tool Allowlist Management: Use the /permissions command after starting Claude Code to add or remove tools from the allowlist. For example, you can add Edit to always allow file edits, Bash(git commit:*) to allow git commits, or mcp__puppeteer__puppeteer_navigate to allow navigating with the Puppeteer MCP server.
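Custom slash commands are plain Markdown files whose contents become the prompt when invoked. The command below is a hypothetical example (the filename, steps, and use of the `gh` CLI are assumptions about your workflow, not a prescribed format beyond the `$ARGUMENTS` placeholder):

```markdown
<!-- .claude/commands/fix-issue.md (hypothetical example) -->
Please analyze and fix GitHub issue $ARGUMENTS.

1. Use `gh issue view $ARGUMENTS` to read the issue details
2. Locate the relevant code in the repository
3. Implement a fix and add a regression test
4. Commit with a message that references the issue number
```

Invoked with the issue number as an argument, this gives Claude the same structured context every time, without re-typing the workflow in each session.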

Context-Driven Task Planning

Effective context engineering in Claude Code involves strategic task planning that considers context limitations and optimal loading patterns:

  1. Pre-activation Approach: Before asking Claude to implement a solution, first let it read and understand the relevant context. For example, if refactoring a backend module, first ask Claude to read the entire module, analyze directory structure, and summarize existing functionality before entering the coding phase.
  2. Document-First Workflow: Write a PLAN.md for any task before having the model execute it. This approach consolidates the "7-Layer Prompt Stack" (Tool/Language/Project/Persona/Component/Task/Query) into a single document that Claude can reference for stable context.
  3. Scope Convergence: Implement file whitelists + directory blacklists + maximum change line limits to contain model uncertainty within defined boundaries rather than scattered across chat histories.
  4. Diff-Driven Development: Accept only unified diffs; prohibit full-file rewrites and broad "repository scanning." Because every change arrives as a diff, rollbacks are painless.

Context Window Optimization Techniques

Claude Code provides several built-in mechanisms for optimizing context window usage:

  1. The /compact Command: Compresses conversation history, keeping only context summaries to reduce token usage while preserving essential information
  2. The /clear Command: Completely clear conversation history when starting new, unrelated tasks
  3. Session Segmentation: Breaking large projects into multiple focused sessions rather than one long-running session
  4. Selective File Loading: Loading only the most relevant files for immediate tasks rather than the entire codebase
  5. Checklists and Scratchpads: For large tasks, use Markdown files as checklists and working scratchpads to externalize context that doesn't need to remain in the conversation history

Figure: Calibrating the system prompt in the process of context engineering. At one end of the spectrum are brittle, hardcoded if-else-style prompts; at the other, prompts that are overly general or falsely assume shared context.

System Prompts

We recommend organizing prompts into distinct sections (like <background_information>, <instructions>, ## Tool guidance, ## Output description, etc.) and using techniques like XML tagging or Markdown headers to delineate these sections, although the exact formatting of prompts is likely becoming less important as models become more capable.

Regardless of how you decide to structure your system prompt, you should be striving for the minimal set of information that fully outlines your expected behavior. (Note that minimal does not necessarily mean short; you still need to give the agent sufficient information up front to ensure it adheres to the desired behavior.) It's best to start by testing a minimal prompt with the best model available to see how it performs on your task, and then add clear instructions and examples to improve performance based on failure modes found during initial testing.

Tools

Tools allow agents to interact with their environment and pull in additional context as they work. Because tools define the contract between agents and their information/action space, it's extremely important that tools promote efficiency, both by returning information that is token efficient and by encouraging efficient agent behaviors.

In "Writing tools for AI agents – with AI agents", we discussed building tools that are well understood by LLMs and have minimal overlap in functionality. Similar to the functions of a well-designed codebase, tools should be self-contained, robust to error, and extremely clear with respect to their intended use. Input parameters should similarly be descriptive, unambiguous, and play to the inherent strengths of the model.

One of the most common failure modes we see is bloated tool sets that cover too much functionality or lead to ambiguous decision points about which tool to use. If a human engineer can't definitively say which tool should be used in a given situation, an AI agent can't be expected to do better. As we'll discuss later, curating a minimal viable set of tools for the agent can also lead to more reliable maintenance and pruning of context over long interactions.
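The principle above can be made concrete with a sketch of a single, clearly scoped tool definition in the JSON-schema style used by most LLM tool-use APIs. The tool name, fields, and the neighboring `get_order_details` tool it mentions are all illustrative assumptions, not a real API:

```python
# A sketch of a well-scoped tool definition (all names illustrative).
# The description states exactly when to use this tool and when NOT to,
# so the choice between overlapping tools is never ambiguous.
search_orders_tool = {
    "name": "search_orders",
    "description": (
        "Search customer orders by customer ID and optional date range. "
        "Returns at most `limit` results, newest first. For the full record "
        "of a single known order, use get_order_details instead."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {
                "type": "string",
                "description": "Exact customer ID, e.g. 'C-1042'.",
            },
            "start_date": {
                "type": "string",
                "description": "ISO 8601 date; omit for no lower bound.",
            },
            "limit": {
                "type": "integer",
                "description": "Maximum results to return (default 10).",
            },
        },
        "required": ["customer_id"],
    },
}
```

A human engineer reading the two descriptions should be able to say definitively which tool applies; if they can't, neither can the agent.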

Examples

Providing examples, otherwise known as few-shot prompting, is a well-known best practice that we continue to strongly advise. However, teams will often stuff a laundry list of edge cases into a prompt in an attempt to articulate every possible rule the LLM should follow for a particular task. We do not recommend this. Instead, we recommend working to curate a set of diverse, canonical examples that effectively portray the expected behavior of the agent. For an LLM, examples are the "pictures" worth a thousand words.

Message History

Message history is a critical component of context for agents operating over multiple turns. However, simply appending all previous messages to the context window is rarely optimal. Effective context engineering requires thoughtful curation of message history, considering factors such as:

  • Relevance: Which previous interactions are still relevant to the current task?
  • Recency: How recent does information need to be to remain useful?
  • Compression: Can previous interactions be summarized to preserve key information while reducing token count?
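The curation factors above can be sketched as a simple history-trimming pass: keep the system prompt and the most recent turns verbatim, and collapse older turns into a summary. This is a minimal sketch under assumed conventions (role/content dicts); the `summarize` parameter is a stand-in for an LLM summarization call:

```python
# A minimal sketch of message-history curation: recent turns stay verbatim,
# older turns collapse into one summary message. `summarize` stands in for
# an LLM call that condenses the dropped messages.
def curate_history(messages, keep_recent=6, summarize=lambda msgs: "..."):
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_recent:
        return system + rest
    older, recent = rest[:-keep_recent], rest[-keep_recent:]
    summary = {
        "role": "user",
        "content": f"[Summary of {len(older)} earlier messages: {summarize(older)}]",
    }
    return system + [summary] + recent
```

Real implementations would trigger on token counts rather than message counts and would preserve messages pinned as still-relevant, but the shape is the same: a fixed recency window plus a compressed tail.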

Our overall guidance across the different components of context (system prompts, tools, examples, message history, etc.) is to be thoughtful and keep your context informative, yet tight. Now let's dive into dynamically retrieving context at runtime.

In "Building effective AI agents", we highlighted the differences between LLM-based workflows and agents. Since we wrote that post, we've gravitated towards a simple definition for agents: LLMs autonomously using tools in a loop.

Working alongside our customers, we've seen the field converging on this simple paradigm. As the underlying models become more capable, the level of autonomy of agents can scale: smarter models allow agents to independently navigate nuanced problem spaces and recover from errors.

We're now seeing a shift in how engineers think about designing context for agents. Today, many AI-native applications employ some form of embedding-based pre-inference time retrieval to surface important context for the agent to reason over. As the field transitions to more agentic approaches, we increasingly see teams augmenting these retrieval systems with "just in time" context strategies: rather than pre-loading all relevant data, agents maintain lightweight references (file paths, stored queries, links) and load the underlying content into context at runtime as needed.

Beyond storage efficiency, the metadata of these references provides a mechanism to efficiently track and manage the provenance of information that influences agent behavior. This can be particularly valuable for debugging, auditing, and ensuring compliance in regulated environments.

Dynamic Context Retrieval in Claude Code

Claude Code users can implement dynamic context retrieval strategies through thoughtful interaction patterns. Rather than loading all possible project files upfront, effective Claude Code workflows involve progressive context loading based on task requirements.

Key strategies for Claude Code users:

  1. Strategic File Reading: Instead of asking Claude to "read the entire codebase," selectively load files based on immediate task needs
  2. Search-Based Discovery: Use natural language queries like "find all files that handle user authentication" to let Claude discover relevant files
  3. Context Refresh: Periodically ask Claude to re-read files that may have changed during development
  4. URL-Based Context Loading: Paste specific URLs alongside your prompts for Claude to fetch and read. To avoid permission prompts for the same domains (e.g., docs.foo.com), use /permissions to add domains to your allowlist.
  5. Data Integration: Pass data into Claude through multiple methods: copy and paste directly into your prompt (most common approach), pipe into Claude Code (e.g., cat foo.txt | claude), tell Claude to pull data via bash commands, MCP tools, or custom slash commands, or ask Claude to read files or fetch URLs (works for images too).

Claude Code's ability to agentically explore the file system makes it particularly well-suited for dynamic context retrieval. Users can ask Claude to "search for files related to payment processing" and Claude will actively explore the codebase to identify and load relevant files.

Advanced Context Retrieval Patterns:

  1. Multi-Source Context Loading: Combine file reading, URL fetching, and data piping in a single workflow. For example, pipe in a log file, then tell Claude to use a tool to pull in additional context to debug the logs.
  2. Progressive Discovery: Start with high-level queries and progressively drill down into specifics. For example, first ask "what authentication methods does this project support?" then follow up with "show me the implementation of JWT authentication."
  3. Cross-Reference Loading: When working on interconnected components, load related files to maintain consistency. For example, when modifying an API endpoint, also load the corresponding service layer and data access components.
  4. Historical Context Retrieval: For bug fixes, load both the current implementation and historical context such as related commits or issue descriptions.

Context Retrieval Best Practices:

  • Be Specific: Claude can infer intent, but it can't read minds. Specificity leads to better alignment with expectations. Instead of "add tests for foo.py," specify "write a new test case for foo.py, covering the edge case where the user is logged out. Avoid mocks."
  • Provide Reference Points: When working with design mocks as reference points for UI development, or visual charts for analysis and debugging, provide images to Claude. This is particularly useful for visual tasks.
  • Use Checklists: For complex tasks, have Claude create and maintain a checklist of items to address, which serves as both a context management tool and progress tracker.
  • Externalize Context: For large tasks, use external files (Markdown documents, GitHub issues) as working scratchpads to keep the main conversation focused on high-level direction.

Dynamic Context Retrieval

Rather than loading all possible context upfront, effective agents employ dynamic retrieval strategies that fetch relevant information as needed. This approach offers several advantages:

  1. Token Efficiency: Only relevant information consumes context window space
  2. Freshness: Information can be retrieved in real-time, ensuring up-to-date context
  3. Scalability: Agents can access vast knowledge bases without being constrained by context window limits
  4. Relevance: Retrieved information can be tailored to the specific task at hand

Key strategies for dynamic context retrieval include:

  • Semantic Search: Using embeddings to find contextually relevant information
  • Metadata Filtering: Narrowing retrieval scope based on document properties (date, source, type, etc.)
  • Hybrid Retrieval: Combining multiple retrieval methods for improved results
  • Recursive Retrieval: Using the agent's own analysis to guide subsequent retrieval operations
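These strategies can be sketched in a few lines. This toy example combines metadata filtering with a relevance score; word overlap stands in for the embedding similarity a real semantic-search system would use, and all document fields are illustrative:

```python
# A toy sketch of retrieval with metadata filtering. Word overlap stands in
# for embedding similarity; real systems would score with dense vectors.
def retrieve(query, docs, doc_type=None, top_k=3):
    q_words = set(query.lower().split())
    # Metadata filtering: narrow the candidate pool before scoring.
    candidates = [d for d in docs if doc_type is None or d["type"] == doc_type]
    # Score by relevance and keep only the top_k results.
    scored = sorted(
        candidates,
        key=lambda d: len(q_words & set(d["text"].lower().split())),
        reverse=True,
    )
    return scored[:top_k]
```

Recursive retrieval, the last strategy above, would simply feed the agent's analysis of these results back in as the next `query`.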

Context Window Management

As agents operate over longer time horizons, managing the context window becomes increasingly critical. Effective strategies include:

  1. Summarization: Condensing previous interactions while preserving key information
  2. Forgetting Mechanisms: Systematically removing outdated or irrelevant information
  3. Hierarchical Context: Organizing context into layers of importance
  4. Attention Guidance: Using explicit instructions to direct the model's focus

Context Window Management in Claude Code

Claude Code provides specific tools and commands for managing context window limitations:

  1. The /compact Command: This built-in feature compresses conversation history, keeping only context summaries to reduce token usage while preserving essential information. Use it at natural breakpoints in long sessions, before the context window fills with stale conversation, file contents, and command output that can reduce performance and distract Claude.
  2. The /clear Command: Completely clear conversation history when starting new, unrelated tasks. This eliminates accumulated context that may no longer be relevant and gives Claude a fresh start for new tasks.
  3. Session Segmentation: Breaking large projects into multiple focused sessions rather than one long-running session. For multi-day projects, consider starting fresh sessions rather than carrying forward old context that may no longer be relevant.
  4. Selective File Loading: Loading only the most relevant files for immediate tasks rather than the entire codebase. Instead of asking Claude to "read the entire codebase," selectively load files based on immediate task needs.
  5. Checklists and Scratchpads: For large tasks with multiple steps or requiring exhaustive solutions—like code migrations, fixing numerous lint errors, or running complex build scripts—improve performance by having Claude use a Markdown file (or even a GitHub issue!) as a checklist and working scratchpad.

Advanced Context Window Management Techniques:

  1. Task-Based Context Boundaries: Define clear boundaries for each task and use /clear when transitioning between tasks to prevent context bleed.
  2. Context Archiving: For complex multi-phase projects, archive context at natural breakpoints by documenting key decisions and starting fresh sessions for new phases.
  3. Hierarchical Context Loading: Load context in layers, starting with high-level architectural information, then drilling down to specific implementation details as needed.
  4. External Context Storage: Use external documents (PLAN.md, GitHub issues, Markdown files) to store detailed specifications and reference information, keeping the main conversation focused on high-level direction and immediate implementation concerns.
  5. Periodic Context Audits: Regularly review what context is being maintained and remove information that is no longer relevant to the current task.

For long-running development sessions, we recommend using /compact periodically to maintain performance while preserving the most important context. When switching to a completely different task, /clear helps Claude focus on the new requirements without distraction from previous work.

Best Practices for Context Engineering

Based on our experience working with customers and building agents ourselves, we've identified several key best practices for effective context engineering:

1. Start with Minimal Context

Begin with the absolute minimum context needed for your agent to understand its task. This approach has several benefits:

  • Easier Debugging: Less context means fewer variables to consider when troubleshooting
  • Better Understanding: Forces you to truly understand what information is essential
  • Improved Performance: Reduces the risk of context rot and attention dilution
  • Faster Iteration: Smaller contexts enable quicker experimentation cycles

For Claude Code users, this means starting with a clear task description rather than loading all project files upfront. Let Claude ask for specific files as needed.

Claude Code Specific Strategies:

  • Pre-activation Approach: Before asking Claude to implement a solution, first let it read and understand the relevant context. For example, if refactoring a backend module, first ask Claude to read the entire module, analyze directory structure, and summarize existing functionality before entering the coding phase.
  • Task Decomposition: Break complex tasks into smaller, manageable pieces. For complex tasks, recommend manual step breakdown (Step 1: Create API interface, Step 2: Add field validation, Step 3: Write test cases, Step 4: Write documentation or PR description). Decomposition helps Claude focus on context, avoiding token limits or logic confusion.
  • Clear Instructions: Claude Code's success rate improves significantly with more specific instructions, especially on first attempts. Giving clear directions upfront reduces the need for course corrections later.

2. Design for Observability

Make it easy to understand what context your agent is operating with at any given time:

  • Context Logging: Track what information is included in each inference call
  • Attention Visualization: Use tools to understand where the model is focusing its attention
  • Provenance Tracking: Maintain clear records of where context information originates

Claude Code users can leverage conversation history and file loading logs to understand what context Claude is using.

Claude Code Specific Strategies:

  • Session Documentation: Maintain records of what files were loaded, what commands were executed, and what decisions were made during each session
  • Cost Monitoring: Use /cost to view consumption, including total spending, total usage time, model usage information, etc. For more detailed monitoring, use ccusage for daily reports, monthly summaries, and session statistics.
  • Change Tracking: Keep records of what changes were made in each session to enable effective rollbacks when needed

3. Implement Context Compression

Develop strategies for compressing context while preserving essential information:

  • Progressive Summarization: Create increasingly concise summaries of interaction history
  • Key Point Extraction: Identify and preserve only the most critical information
  • Schema-Based Compression: Use structured formats to efficiently represent information
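Schema-based compression, the last strategy above, can be sketched as follows. The event kinds and field names are illustrative assumptions, not a standard format; the point is that a small structured record survives while raw transcript text is dropped:

```python
# A sketch of schema-based compression: preserve a structured record of what
# matters from a session, drop everything else. Field names are illustrative.
def compress_session(raw_events):
    summary = {"decisions": [], "files_touched": set(), "open_items": []}
    for event in raw_events:
        if event["kind"] == "decision":
            summary["decisions"].append(event["note"])
        elif event["kind"] == "edit":
            summary["files_touched"].add(event["path"])
        elif event["kind"] == "todo":
            summary["open_items"].append(event["note"])
        # Everything else (chatter, raw diffs, command output) is dropped.
    summary["files_touched"] = sorted(summary["files_touched"])
    return summary
```

Because the output has a fixed shape, it can be re-injected into a fresh session as a compact, predictable block rather than a free-form summary of unknown length.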

Claude Code's /compact command provides built-in context compression capabilities.

Claude Code Specific Strategies:

  • Regular Compaction: Use /compact periodically during long sessions to maintain performance while preserving essential context
  • Selective Retention: When compacting, focus on retaining key decisions, critical file references, and important implementation details
  • External Documentation: Move detailed specifications and reference materials to external documents to reduce context window pressure

4. Plan for Context Evolution

Design your context management strategy with change in mind:

  • Version Control: Track how context strategies evolve over time
  • A/B Testing: Compare different context configurations systematically
  • Rollback Mechanisms: Enable quick reversion to previous context configurations

In Claude Code, this means using version-controlled CLAUDE.md files and being intentional about session management.

Claude Code Specific Strategies:

  • Document-First Workflow: Write a PLAN.md for any task before having the model execute it. This approach consolidates context into a single reference document.
  • Repository Structure: Maintain a CLAUDE_CODE_LOG/ directory with timestamped folders for each task attempt, containing PLAN.md, PROMPT_USED.md, patches, logs, and evaluation reports
  • Cold Starts: After failures, archive the actual prompt, patch, logs, and commit fingerprint, document the failure reason, and start a new round (new timestamp directory) rather than continuing the same conversation
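
The timestamped-directory convention above can be scaffolded with a small helper. The function name and task label here are hypothetical; the layout follows the CLAUDE_CODE_LOG/ structure described in the bullets.

```python
# Sketch: scaffold a timestamped CLAUDE_CODE_LOG/ directory for a new task
# attempt, seeded with the planning documents each attempt should contain.
from datetime import datetime
from pathlib import Path

def new_attempt_dir(task: str, root: str = "CLAUDE_CODE_LOG") -> Path:
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    attempt = Path(root) / f"{stamp}-{task}"
    attempt.mkdir(parents=True)
    for name in ("PLAN.md", "PROMPT_USED.md"):
        (attempt / name).touch()  # filled in as the attempt proceeds
    return attempt

attempt = new_attempt_dir("fix-auth-bug")
```

Starting each retry in a fresh timestamped directory makes the cold-start discipline mechanical: the failed attempt's prompt, patch, and logs stay archived where they were, and nothing leaks into the new round.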

5. Monitor Context Effectiveness

Continuously evaluate how well your context engineering is working:

  • Task Success Rates: Track how context changes impact agent performance
  • Token Utilization: Monitor how efficiently context window space is being used
  • Error Analysis: Examine failures to identify context-related issues

Claude Code users can monitor token usage through built-in metrics and observe performance changes when using context management commands like /compact.

Claude Code Specific Strategies:

  • Quality Gates: Enforce comprehensive quality gates (static linters, formatters, sanitizers, and extensive testing), and require all code to pass a thorough CI/CD pipeline
  • Performance Metrics: Track productivity metrics such as lines of code added/removed, commits to main, PRs merged, and GitHub issues resolved
  • Failure Analysis: Document and analyze failures to identify patterns and improve context management strategies
  • Course Correction: Interrupt Claude during any phase (press Escape), jump back in history (double-tap Escape), or ask Claude to undo changes when results go off track
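
Two of the metrics above, task success rate and token utilization, are straightforward to compute from per-session records. The record fields below are assumptions for illustration, not a Claude Code API; the point is that a handful of logged numbers per session is enough to compare context strategies.

```python
# Illustrative context-effectiveness metrics over logged sessions.
from dataclasses import dataclass

@dataclass
class SessionRecord:
    tokens_used: int      # total tokens consumed in the session
    context_limit: int    # context window size for the model used
    succeeded: bool       # did the task complete acceptably?

def success_rate(records: list[SessionRecord]) -> float:
    return sum(r.succeeded for r in records) / len(records)

def avg_utilization(records: list[SessionRecord]) -> float:
    return sum(r.tokens_used / r.context_limit for r in records) / len(records)

logs = [
    SessionRecord(tokens_used=120_000, context_limit=200_000, succeeded=True),
    SessionRecord(tokens_used=190_000, context_limit=200_000, succeeded=False),
]
```

Tracked over time, a rising utilization alongside a falling success rate is a signal to compact earlier or split tasks across sessions.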

Advanced Context Engineering Patterns

As you become more proficient with context engineering in Claude Code, consider implementing these advanced patterns:

  1. Multi-Session Context Management: Use git worktrees to run multiple Claude Code sessions simultaneously on different parts of your project, each focused on its own independent task. This approach allows you to work on multiple features or fixes in parallel without interference. For example, you can create one worktree for a new feature, another for bug fixes, and a third for refactoring.

  2. External Context Systems: Integrate external knowledge management systems with Claude Code through MCP servers and custom slash commands to provide structured access to organizational knowledge. This includes connecting to database query systems, API documentation servers, project management tools, and more, allowing Claude to access external information without consuming large context windows.

  3. Automated Context Optimization: Develop scripts and workflows that automatically manage context loading, compaction, and clearing based on task patterns and historical performance data. This can include intelligent preloading based on file access patterns, automatic selection of appropriate context strategies based on task types, and optimization of context configurations based on historical success rates.

  4. Context Versioning: Implement version control for your context management strategies, tracking how different approaches to context engineering impact task success rates and efficiency. This includes version control for CLAUDE.md files, version management for custom commands and hooks, and change tracking for entire context configurations.

Context Engineering in Practice: Real-World Case Studies

The following real-world cases illustrate context engineering in practice:

Case 1: Large Codebase Refactoring

When dealing with a large React component containing 18,000 lines of code, traditional AI tools often struggle. With carefully designed context engineering strategies, however, Claude Code was able to complete the task successfully:

  1. Pre-load Key Files: First have Claude read the component's dependency files, related tests, and documentation
  2. Layered Processing: Break down the refactoring into small, manageable steps, processing only one part of the component at a time
  3. Context Refresh: Reload modified files after each step to ensure Claude has the latest information
  4. Progressive Validation: Run tests after each small step to ensure functional integrity

This approach not only successfully completed the refactoring but also avoided the errors and rollbacks common with traditional methods.
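
The four-step cycle above can be expressed as control flow. The helpers below are stand-ins (no real editing or test execution happens here; in practice the edit is a focused Claude Code instruction and the validation is your test suite); the structure is what matters: one small step, a context refresh, then validation before moving on.

```python
# Sketch of the layered refactor-and-validate loop described above.

def apply_step(step: str) -> None:
    print(f"refactor: {step}")      # stand-in for a focused Claude Code edit

def refresh_context() -> None:
    print("reload modified files")  # stand-in for re-reading changed files

def tests_pass() -> bool:
    return True                     # stand-in for running the test suite

steps = ["extract hooks", "split render logic", "move styles out"]
completed = []
for step in steps:
    apply_step(step)
    refresh_context()
    if not tests_pass():
        break  # stop and course-correct rather than compounding errors
    completed.append(step)
```

Breaking on the first failing step is deliberate: continuing on top of a broken intermediate state is how refactors accumulate errors that later require full rollbacks.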

Case 2: API Contract Management

When handling API contract changes, context engineering is crucial for preventing breaking changes:

# AIDEV-NOTE: API Contract Boundary - v2.3.1
# ANY changes require version bump and migration plan
# See: docs/api-versioning.md
# (import added for completeness; `router` and `FeedResponse` are
# defined elsewhere in the application)
from uuid import UUID

@router.get("/users/{user_id}/feed")
async def get_user_feed(user_id: UUID) -> FeedResponse:
    # Claude: the response shape here is sacred
    # Changes break real apps in production
    ...

By using anchor comments and explicit boundary definitions, teams can ensure Claude doesn't accidentally break existing API contracts.

Case 3: Test-Driven Development

In the vibe coding era, tests remain sacred. An effective strategy is:

  1. Human-Written Tests: Tests encode human intent and business logic, which AI cannot replace
  2. AI Implementation: Let Claude implement functionality based on human-written tests
  3. Iterative Improvement: When tests fail, let Claude analyze the failure reasons and make fixes
  4. Human Verification: Humans perform a final review of test coverage and edge-case handling

This approach combines AI efficiency with human expertise, producing higher quality results.
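
A minimal illustration of the split (the function name and business rule are hypothetical): the test pins human intent, and the implementation is the part you would ask Claude to write and iterate on until the test passes.

```python
# Human-written test encodes the business rules; the implementation below is
# what the AI produces and revises until the assertions hold.

def apply_discount(price: float, pct: float) -> float:
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    return round(price * (1 - pct / 100), 2)

def test_apply_discount():
    # human-written: pins behavior the implementation must satisfy
    assert apply_discount(100.0, 10) == 90.0
    assert apply_discount(19.99, 0) == 19.99

test_apply_discount()
```

Keeping the tests human-authored preserves the feedback loop: a failing assertion gives Claude a concrete, unambiguous target for the next fix.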

Context Engineering Tools and Techniques

1. Custom Slash Commands

Creating reusable command templates can significantly improve efficiency:

# .claude/commands/test.md
Please create comprehensive tests for: $ARGUMENTS

Test requirements:
- Use Jest and React Testing Library
- Place tests in __tests__ directory
- Mock Firebase/Firestore dependencies
- Test all major functionality
- Include edge cases and error scenarios

Running /test MyButton then generates tests for that specific component.

2. Hooks System

Claude Code's hooks system allows executing custom commands at various lifecycle events:

{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "prettier --write \"$CLAUDE_FILE_PATHS\""
          }
        ]
      }
    ]
  }
}

This hook runs Prettier on every file Claude edits or writes, keeping formatting consistent throughout the session.

3. Subagent Pattern

Using subagents can effectively manage context for complex tasks:

> Create a deep research task to figure out how to migrate a website from tailwind 3 to 4 and then present me with a short summary of the results

⏺ I'll create a research task to investigate migrating from Tailwind CSS v3 to v4, focusing on the specific context of your Tailwind UI codebase.

⏺ Task(Tailwind v3 to v4 migration research)
  ⎿  Done (4 tool uses · 29.9k tokens · 5m 47.6s)

⏺ Summary

  Migrating from Tailwind CSS v3 to v4 involves significant changes but comes with major performance improvements (3.5x faster builds) and a better developer experience.

Context Engineering Best Practices Summary

Based on real-world usage experience, here are the key best practices for context engineering:

  1. Document First: Always start with clear documentation rather than jumping directly into coding
  2. Progressive Loading: Load context incrementally as needed rather than all at once
  3. Regular Compaction: Use the /compact command regularly to compress context and maintain performance
  4. Clear Boundaries: Define clear operational boundaries for AI to prevent accidental changes
  5. Human Oversight: Maintain human oversight at critical decision points, especially in areas involving business logic and security
  6. Version Control: Version control context configurations and important decisions
  7. Performance Monitoring: Continuously monitor token usage and task success rates to optimize strategies

Future Outlook

As AI systems become more capable and ubiquitous, context engineering will continue to evolve:

Larger Context Windows

While context engineering will remain important even as context windows grow larger, the strategies will shift from strict limitation management to more sophisticated organization and prioritization.

Automated Context Management

We expect to see increasing automation in context curation, with AI systems becoming better at managing their own context without explicit human intervention.

Multi-Modal Context

As AI systems incorporate more modalities (images, audio, video), context engineering will need to account for heterogeneous information types and their interactions.

Personalized Context

Context engineering will increasingly involve tailoring information presentation to individual models' preferences and capabilities, moving beyond one-size-fits-all approaches.

Conclusion

Context engineering represents a fundamental shift in how we think about building with large language models. Rather than focusing solely on crafting the perfect prompt, we must now consider the holistic information environment in which our models operate.

Effective context engineering requires balancing multiple competing concerns: providing sufficient information for task completion while avoiding information overload, maintaining context freshness while preserving important historical information, and ensuring efficient token usage while maximizing model effectiveness.

For Claude Code users, mastering context engineering means learning to balance the power of having an AI assistant with direct access to your codebase against the constraints of context window limitations and attention budgets. By strategically managing what information Claude loads, when it loads it, and how long it stays in context, developers can achieve more reliable, efficient, and effective AI-assisted coding experiences.

Getting Started with Context Engineering in Claude Code

  1. Start Small: Begin with focused tasks and minimal context, gradually expanding as needed. Start with clear task descriptions rather than loading all project files upfront.
  2. Use CLAUDE.md Files: Document project-specific information that Claude needs to know. Create a hierarchy of CLAUDE.md files at different levels (project root, subdirectories, user global) to provide appropriate context at each level.
  3. Leverage Built-in Tools: Use /compact and /clear to manage context window usage. Use /compact periodically during long sessions and /clear when switching between unrelated tasks.
  4. Implement Strategic Loading: Load files progressively as needed rather than all at once. Use tab-completion to quickly reference files and provide specific instructions about what Claude should read and understand.
  5. Monitor Performance: Pay attention to token usage and task completion rates. Use /cost and ccusage to monitor consumption and identify optimization opportunities.
  6. Plan for Evolution: Use document-first workflows with PLAN.md files and maintain structured logs of your development sessions. Archive context when starting new approaches rather than continuing failed attempts.
  7. Iterate and Improve: Continuously refine your context management strategies based on results. Document failures and analyze them to improve future context engineering approaches.

As the field continues to evolve, we encourage practitioners to approach context engineering with the same rigor and creativity they bring to other aspects of AI system design. The agents we build today will be more capable, more reliable, and more aligned with human intentions when we give careful consideration to the context in which they operate.

By treating context as a first-class concern in our engineering practice, we can unlock the full potential of AI agents while avoiding the pitfalls that come with naive approaches to context management.

For more information on building with Claude Code, see our Claude Code Best Practices, Claude Code Complete Guide, and Claude Code Production Workflow Guide.


This article is based on insights from Anthropic's engineering team and their experience building and deploying AI agents in real-world applications. For more information on building with Claude, visit our developer documentation.
