The Four Moves of Context Engineering in Modern LLM Agents
Understanding how Claude, ChatGPT, and Cursor manage context windows—and why these patterns break in parallel agent swarms.
Context engineering in today's mainstream LLM agents can be boiled down to four fundamental moves: write, select, compress, and isolate. Most "smart" products (Claude, ChatGPT, Cursor, Windsurf, and OpenAI/Anthropic-style research agents) are just different orchestrations of these four operations inside a single model's context window.
Understanding these patterns is crucial for anyone building with AI agents. Let's break down each one.

Figure: The four strategies of context engineering (source: LangChain Blog).
1. Write: Externalize the Brain
The first move is about persistence. Agents need to remember things beyond what fits in their immediate context window.
Scratchpads and State Objects
Agents persist state outside the context window via scratchpads: files, state objects, or databases. This allows them to keep:
- Long-term plans
- Intermediate reasoning steps
- Tool results and outputs
- Decision histories
All without blowing up their token budgets.
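To make this concrete, here is a minimal sketch of a scratchpad: a small state object the agent writes to between steps so that plans, tool results, and decisions live on disk rather than in the prompt. The class and field names are illustrative, not any particular framework's API.

```python
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path

@dataclass
class Scratchpad:
    """Illustrative agent scratchpad persisted outside the context window."""
    plan: list[str] = field(default_factory=list)
    tool_results: dict[str, str] = field(default_factory=dict)
    decisions: list[str] = field(default_factory=list)

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def load(cls, path: Path) -> "Scratchpad":
        return cls(**json.loads(path.read_text())) if path.exists() else cls()

# The agent writes to the scratchpad instead of carrying everything in-prompt.
pad = Scratchpad.load(Path("scratchpad.json"))
pad.plan.append("Step 3: refactor the parser")
pad.tool_results["test_run"] = "12 passed, 1 failed (test_edge_case)"
pad.save(Path("scratchpad.json"))
```

The key design choice is that nothing in this object is automatically in context; the agent (or the orchestrator) decides what to read back in, which is exactly what the "select" move covers next.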
Long-Term Memory Systems
Modern systems like ChatGPT's memory feature, Cursor/Windsurf rules files, and Reflexion-style self-notes store:
- User-specific facts and preferences
- Project conventions and patterns
- Reflections on past mistakes
- Cross-session learning
This externalized memory can be reused across sessions, making agents feel genuinely persistent and personalized.
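A hedged sketch of the idea: a small per-user store that the agent appends facts and reflections to, and that gets rendered into the system prompt of the next session. The file format and function names here are assumptions for illustration, not how any of the products above actually persist memory.

```python
import json
from pathlib import Path

MEMORY_PATH = Path("memory.json")  # illustrative storage location

def remember(user_id: str, category: str, note: str) -> None:
    """Append a durable note (preference, convention, reflection) for a user."""
    memory = json.loads(MEMORY_PATH.read_text()) if MEMORY_PATH.exists() else {}
    memory.setdefault(user_id, {}).setdefault(category, []).append(note)
    MEMORY_PATH.write_text(json.dumps(memory, indent=2))

def recall(user_id: str) -> str:
    """Render a user's stored memories as a block for the next session's prompt."""
    memory = json.loads(MEMORY_PATH.read_text()) if MEMORY_PATH.exists() else {}
    lines = []
    for category, notes in memory.get(user_id, {}).items():
        lines += [f"{category}: {note}" for note in notes]
    return "\n".join(lines)

remember("user-42", "preferences", "Prefers TypeScript examples over JavaScript.")
remember("user-42", "reflections", "Forgot to run the linter before committing last time.")
print(recall("user-42"))  # injected into the system prompt of the next session
```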
2. Select: Just-in-Time Retrieval
Writing everything down is useless if you dump it all back into context. The second move is selective retrieval.
Controlled State Exposure
At each step, the agent pulls in only the relevant slice of its scratchpad or state. Instead of loading all history, it:
- Reads specific files on demand
- Exposes only relevant state variables
- Filters tool results before surfacing them
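In code, the point is that the prompt-assembly step selects a slice rather than concatenating everything. A minimal sketch, assuming a scratchpad-style state dictionary like the one above (the keys are hypothetical):

```python
def build_step_context(state: dict, current_task: str, keys_needed: list[str]) -> str:
    """Expose only the state variables relevant to the current step."""
    selected = {k: state[k] for k in keys_needed if k in state}
    lines = [f"Current task: {current_task}"]
    lines += [f"{k}: {v}" for k, v in selected.items()]
    return "\n".join(lines)

state = {
    "plan": ["parse config", "refactor parser", "add tests"],
    "test_run": "12 passed, 1 failed",
    "full_dialogue_history": "...thousands of tokens we deliberately leave out...",
}
# Only the plan and the latest test result are surfaced; the raw history stays external.
print(build_step_context(state, "fix the failing test", ["plan", "test_run"]))
```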
RAG and Semantic Search
For larger knowledge stores, modern systems use:
- Embeddings to find semantically similar content
- Graph structures to traverse relationships
- Tool-description RAG to pick the right capabilities
The goal is to keep the active context narrow but useful: bringing in exactly what's needed, exactly when it's needed.
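For illustration, here is a toy semantic-search loop over a small knowledge store. The embedding function is a deliberate stand-in (a hashed n-gram vector) so the example runs without any model or vector database; a real system would call an embedding API and an index instead.

```python
import hashlib
import math

def toy_embed(text: str, dims: int = 64) -> list[float]:
    """Stand-in embedding: hash character trigrams into a fixed-size vector.
    Replace with a real embedding model in practice."""
    vec = [0.0] * dims
    for i in range(len(text) - 2):
        bucket = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16) % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def top_k(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Select the k documents most similar to the query."""
    q = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(q, toy_embed(d))), d) for d in documents]
    return [d for _, d in sorted(scored, reverse=True)[:k]]

docs = [
    "How to configure the CI pipeline for the monorepo.",
    "Team convention: all dates are stored in UTC.",
    "Postmortem for the March cache outage.",
]
# Only the most relevant snippets enter the context window.
print(top_k("What timezone do we store timestamps in?", docs))
```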
3. Compress: Shrink Without Forgetting
As conversations grow (hundreds of turns, heavy tool usage), the context window fills up fast. The third move is compression.
Automatic Summarization
Systems like Claude Code's auto-compaction periodically:
- Summarize prior dialogue into distilled representations
- Create recursive or hierarchical summaries
- Collapse tool traces into outcome-focused notes
This keeps the agent under the context window limit while preserving decision-relevant information.
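A minimal sketch of the compaction loop: when the running transcript exceeds a token budget, older turns are collapsed into a summary while the most recent turns stay verbatim. The `summarize` function here is a placeholder standing in for an LLM summarization call, and the token count is a rough word-count estimate.

```python
def count_tokens(text: str) -> int:
    """Rough token estimate; a real system would use the model's tokenizer."""
    return len(text.split())

def summarize(turns: list[str]) -> str:
    """Placeholder for an LLM call that distills older dialogue into key outcomes."""
    return f"[Summary of {len(turns)} earlier turns: key decisions and outcomes only]"

def compact(history: list[str], budget: int = 2000, keep_recent: int = 5) -> list[str]:
    """If the transcript exceeds the budget, collapse older turns into a summary."""
    if sum(count_tokens(t) for t in history) <= budget:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older)] + recent

history = [f"turn {i}: ...tool output and reasoning..." for i in range(200)]
history = compact(history, budget=500)
print(history[0])    # the distilled summary
print(len(history))  # 1 summary + 5 recent turns
```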
Heuristic Pruning
Smart pruning strategies remove:
- Older or low-value tokens
- Redundant information
- Completed sub-tasks
The system preserves only the minimal set of turns and events needed for the next decision.
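Pruning can be even simpler than summarization: drop events the next decision cannot possibly need. A sketch with made-up event tags:

```python
def prune(events: list[dict], keep_last: int = 10) -> list[dict]:
    """Keep recent events plus anything still marked relevant; drop the rest."""
    recent = events[-keep_last:]
    older_but_relevant = [
        e for e in events[:-keep_last]
        if e.get("status") != "completed" and not e.get("redundant", False)
    ]
    return older_but_relevant + recent

events = (
    [{"step": i, "status": "completed"} for i in range(50)]           # finished sub-tasks
    + [{"step": 50, "status": "open", "note": "blocked on API key"}]  # still relevant
    + [{"step": 51 + i, "status": "open"} for i in range(10)]         # recent turns
)
print(len(prune(events)))  # 1 older-but-open event + 10 recent ones
```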
4. Isolate: Prevent Context Interference
The final move is isolation: keeping different concerns separate to avoid confusion.
Multi-Agent Architectures
Designs like Anthropic's multi-agent researcher, OpenAI Swarm-style teams, and LangGraph supervisor patterns isolate:
- Instructions per sub-agent
- Tools and capabilities
- State and memory
Each agent gets a smaller, more focused context window optimized for its specific role.
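In spirit, each sub-agent owns its own instructions, tools, and memory rather than sharing one window. A hedged sketch of that separation (not the actual API of any of the frameworks named above):

```python
from dataclasses import dataclass, field

@dataclass
class SubAgent:
    """Illustrative sub-agent with an isolated prompt, tool set, and memory."""
    role: str
    instructions: str
    tools: list[str]
    memory: list[str] = field(default_factory=list)

    def context(self, task: str) -> str:
        # Only this agent's own instructions, tools, and notes enter its window.
        return "\n".join([self.instructions,
                          f"Tools: {', '.join(self.tools)}",
                          *self.memory,
                          f"Task: {task}"])

researcher = SubAgent("researcher", "Find and cite sources.", ["web_search"])
coder = SubAgent("coder", "Write and test code.", ["run_tests", "edit_file"])

# A supervisor routes work; neither sub-agent ever sees the other's context.
for agent, task in [(researcher, "survey context-compression papers"),
                    (coder, "fix the failing parser test")]:
    print(f"--- {agent.role} ---")
    print(agent.context(task))
```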
Sandboxed Artifacts
Structured systems keep bulky or sensitive artifacts out of the visible context:
- File contents
- Images and media
- Intermediate data structures
These are stored externally and surfaced only when explicitly needed, reducing token load and preventing context pollution.
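One common pattern is to pass lightweight handles instead of the artifacts themselves: the model sees a short reference string, and the runtime dereferences it only when a tool actually needs the bytes. A sketch with an in-memory artifact store (the handle scheme and tool name are hypothetical):

```python
import uuid

ARTIFACT_STORE: dict[str, bytes] = {}  # stands in for disk or object storage

def stash(data: bytes, label: str) -> str:
    """Store a bulky artifact externally and return a small handle for the prompt."""
    handle = f"artifact://{label}/{uuid.uuid4().hex[:8]}"
    ARTIFACT_STORE[handle] = data
    return handle

def fetch(handle: str) -> bytes:
    """Dereference a handle only when a tool genuinely needs the contents."""
    return ARTIFACT_STORE[handle]

image_handle = stash(b"\x89PNG..." * 1000, "screenshot")
# The context window carries ~40 characters instead of the raw image bytes.
print(f"Attached: {image_handle} (open it with the view_image tool if needed)")
```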
The Parallel Agent Problem
Here's where it gets interesting and challenging.
Key Insight: All four strategies assume a single "authoritative" context window whose content can be consistently written, selected, compressed, and isolated over time. Within one LLM, the state and trade-offs are globally coordinated.
Why This Breaks for Swarms
In large parallel agent swarms, each agent maintains its own local write/select/compress/isolate loop. This creates fundamental coordination problems:
Synchronization challenges:
- Which facts are canonical?
- How should shared knowledge be compressed?
- When should information be shared vs. isolated?
The systems problem: these questions are systems problems, not prompt-engineering problems, and they remain unsolved. Current context-engineering patterns don't yet provide a robust, general solution for cross-agent coordination.
This is why single-agent systems (ChatGPT, Claude, Cursor) work so well, while true multi-agent swarms remain an active research frontier.
Building with These Patterns
Understanding these four moves helps you:
- Design better prompts that work with the model's memory systems
- Structure your tools for optimal retrieval and compression
- Choose the right architecture for your use case
- Anticipate limitations when scaling to multi-agent scenarios
The future of AI agents will likely involve new patterns for coordinating these operations across multiple contexts. But for now, mastering write, select, compress, and isolate is the foundation of effective context engineering.
Interested in building AI products that remember? At Membase, we're working on unified memory systems that work across all your AI agents. Join our waitlist to learn more.