The Four Moves of Context Engineering in Modern LLM Agents

Understanding how Claude, ChatGPT, and Cursor manage context windows—and why these patterns break in parallel agent swarms.

Joshua Park

Context engineering in today's mainstream LLM agents can be boiled down to four fundamental moves: write, select, compress, and isolate. Most "smart" products (Claude, ChatGPT, Cursor, Windsurf, and OpenAI/Anthropic-style research agents) are just different orchestrations of these four operations inside a single model's context window.

Understanding these patterns is crucial for anyone building with AI agents. Let's break down each one.

Figure: The four strategies of context engineering. Source: LangChain Blog

1. Write: Externalize the Brain

The first move is about persistence. Agents need to remember things beyond what fits in their immediate context window.

Scratchpads and State Objects

Agents persist state outside the context window via scratchpads: files, state objects, or databases. This allows them to keep:

  • Long-term plans
  • Intermediate reasoning steps
  • Tool results and outputs
  • Decision histories

All without blowing up their token budgets.
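A minimal sketch of the idea, assuming a file-backed store. The `Scratchpad` class, file name, and entry keys here are illustrative, not any product's actual API:

```python
import json
from pathlib import Path

class Scratchpad:
    """Persists agent state outside the context window as a JSON file."""

    def __init__(self, path: str = "agent_scratchpad.json"):
        self.path = Path(path)
        self.state = {"plan": [], "steps": [], "tool_results": []}
        if self.path.exists():
            self.state = json.loads(self.path.read_text())

    def write(self, key: str, entry: dict) -> None:
        # Append an entry (plan item, reasoning step, tool output) and flush to disk.
        self.state.setdefault(key, []).append(entry)
        self.path.write_text(json.dumps(self.state, indent=2))

# Record a bulky tool result without ever putting it back into the prompt.
pad = Scratchpad()
pad.write("tool_results", {"tool": "web_search", "summary": "3 relevant papers found"})
```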

Long-Term Memory Systems

Modern systems like ChatGPT memories, Cursor/Windsurf rules, and Reflexion-style self-notes store:

  • User-specific facts and preferences
  • Project conventions and patterns
  • Reflections on past mistakes
  • Cross-session learning

This externalized memory can be reused across sessions, making agents feel genuinely persistent and personalized.
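As a rough sketch, long-term memory can be as simple as a facts file that survives across sessions and gets folded into the system prompt at startup. The file path and helper names below are assumptions for illustration, not how any of the products above actually implement it:

```python
import json
from pathlib import Path

MEMORY_FILE = Path("user_memory.json")  # hypothetical cross-session store

def remember(fact: str) -> None:
    """Append a user-specific fact so future sessions can reuse it."""
    facts = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    if fact not in facts:
        facts.append(fact)
        MEMORY_FILE.write_text(json.dumps(facts, indent=2))

def session_preamble() -> str:
    """Build a short system-prompt block from stored facts at session start."""
    facts = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    return "Known about this user:\n" + "\n".join(f"- {f}" for f in facts)

remember("Prefers TypeScript examples")
remember("Project uses pnpm, not npm")
print(session_preamble())
```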

2. Select: Just-in-Time Retrieval

Writing everything down is useless if you dump it all back into context. The second move is selective retrieval.

Controlled State Exposure

At each step, the agent pulls in only the relevant slice of its scratchpad or state. Instead of loading all history, it:

  • Reads specific files on demand
  • Exposes only relevant state variables
  • Filters tool results before surfacing them

For larger knowledge stores, modern systems use:

  • Embeddings to find semantically similar content
  • Graph structures to traverse relationships
  • Tool-description RAG to pick the right capabilities

The goal is to keep the active context narrow but useful: bringing in exactly what's needed, exactly when it's needed.
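A toy version of that selection step, using bag-of-words cosine similarity as a stand-in for a real embedding model; a production system would swap `embed` for an actual embedding call:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy "embedding"; a real system would call an embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_relevant(query: str, notes: list[str], k: int = 2) -> list[str]:
    """Return only the k scratchpad notes most similar to the current step."""
    q = embed(query)
    return sorted(notes, key=lambda n: cosine(q, embed(n)), reverse=True)[:k]

notes = [
    "Decision: use Postgres for persistence",
    "User prefers dark mode in the UI",
    "Bug: retry logic loops forever on 429 errors",
]
print(select_relevant("how should we persist data?", notes))
```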

3. Compress: Shrink Without Forgetting

As conversations grow (hundreds of turns, heavy tool usage), the context window fills up fast. The third move is compression.

Automatic Summarization

Systems like Claude Code's auto-compaction periodically:

  • Summarize prior dialogue into distilled representations
  • Create recursive or hierarchical summaries
  • Collapse tool traces into outcome-focused notes

This keeps the agent under the context window limit while preserving decision-relevant information.
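A sketch of such a compaction loop, with `llm_summarize` standing in for a real summarization call and a crude word count standing in for a tokenizer; neither reflects Claude Code's actual internals:

```python
def llm_summarize(text: str) -> str:
    # Stand-in for a real model call (e.g., a cheap "summarize this dialogue" prompt).
    return f"[summary of {len(text.split())} words of earlier dialogue]"

def compact(messages: list[str], budget_tokens: int = 2000, keep_recent: int = 4) -> list[str]:
    """Once the rough token count exceeds the budget, collapse older turns
    into a single summary and keep only the most recent messages verbatim."""
    rough_tokens = sum(len(m.split()) for m in messages)  # crude proxy for a tokenizer
    if rough_tokens <= budget_tokens or len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [llm_summarize("\n".join(older))] + recent

# Run after every turn: history = compact(history)
```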

Heuristic Pruning

Smart pruning strategies remove:

  • Older or low-value tokens
  • Redundant information
  • Completed sub-tasks

The system preserves only the minimal set of turns and events needed for the next decision.
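A rough illustration of such heuristics; the event fields and budget here are assumptions, not any specific product's rules:

```python
def prune(events: list[dict], max_events: int = 20) -> list[dict]:
    """Keep only the events still needed for the next decision.

    Illustrative heuristics: drop completed sub-tasks, drop exact duplicates,
    then keep only the most recent events up to a fixed budget.
    """
    seen = set()
    kept = []
    for e in events:
        if e.get("status") == "done":          # completed sub-task: safe to drop
            continue
        key = (e.get("type"), e.get("content"))
        if key in seen:                        # redundant repeat of an earlier event
            continue
        seen.add(key)
        kept.append(e)
    return kept[-max_events:]                  # older, low-value entries fall off the end
```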

4. Isolate: Prevent Context Interference

The final move is isolation: keeping different concerns separate to avoid confusion.

Multi-Agent Architectures

Designs like Anthropic's multi-agent researcher, OpenAI Swarm-style teams, and LangGraph supervisor patterns isolate:

  • Instructions per sub-agent
  • Tools and capabilities
  • State and memory

Each agent gets a smaller, more focused context window optimized for its specific role.
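A sketch of that isolation: each sub-agent carries its own instructions, tool subset, and private message history, and only the final result crosses back to the supervisor. The `SubAgent` structure and roles are illustrative, not taken from any of the frameworks named above:

```python
from dataclasses import dataclass, field

@dataclass
class SubAgent:
    """A worker with its own instructions, tool subset, and private context."""
    role: str
    instructions: str
    tools: list[str]
    messages: list[str] = field(default_factory=list)  # never shared with siblings

def run_subtask(agent: SubAgent, task: str) -> str:
    # Placeholder for a real model call; only the final answer leaves this scope.
    agent.messages.append(task)
    return f"[{agent.role}] result for: {task}"

researcher = SubAgent("researcher", "Find and cite sources.", tools=["web_search"])
coder = SubAgent("coder", "Write and test code.", tools=["python", "shell"])

# The supervisor's context only ever contains these two result strings,
# not the sub-agents' full tool traces.
results = [run_subtask(researcher, "survey RAG papers"),
           run_subtask(coder, "prototype the retriever")]
```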

Sandboxed Artifacts

Structured systems keep bulky or sensitive artifacts out of the visible context:

  • File contents
  • Images and media
  • Intermediate data structures

These are stored externally and surfaced only when explicitly needed, reducing token load and preventing context pollution.
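One way to sketch this pattern: write the artifact to disk and let only a short handle enter the prompt, loading the full content back just-in-time when a tool asks for it. The handle format and `ARTIFACT_DIR` are illustrative assumptions:

```python
import hashlib
from pathlib import Path

ARTIFACT_DIR = Path("artifacts")  # hypothetical out-of-context store

def store_artifact(data: bytes, label: str) -> str:
    """Save bulky content outside the context; return a short in-context handle."""
    ARTIFACT_DIR.mkdir(exist_ok=True)
    digest = hashlib.sha256(data).hexdigest()[:12]
    (ARTIFACT_DIR / digest).write_bytes(data)
    return f"<artifact:{label}:{digest}>"     # only this token enters the prompt

def load_artifact(handle: str) -> bytes:
    """Surface the full content only when explicitly requested."""
    digest = handle.rstrip(">").split(":")[-1]
    return (ARTIFACT_DIR / digest).read_bytes()

ref = store_artifact(b"...many megabytes of CSV...", "sales_data")
print(ref)  # the model sees this handle, not the raw bytes
```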

The Parallel Agent Problem

Here's where it gets interesting and challenging.

Key Insight: All four strategies assume a single "authoritative" context window whose content can be consistently written, selected, compressed, and isolated over time. Within one LLM, the state and trade-offs are globally coordinated.

Why This Breaks for Swarms

In large parallel agent swarms, each agent maintains its own local write/select/compress/isolate loop. This creates fundamental coordination problems:

Synchronization challenges:

  • Which facts are canonical?
  • How should shared knowledge be compressed?
  • When should information be shared vs. isolated?

The systems problem: these are open systems problems, not prompt-engineering problems. Current context-engineering patterns don't yet provide a robust, general solution for cross-agent coordination.

This is why single-agent systems (ChatGPT, Claude, Cursor) work so well, while true multi-agent swarms remain an active research frontier.

Building with These Patterns

Understanding these four moves helps you:

  1. Design better prompts that work with the model's memory systems
  2. Structure your tools for optimal retrieval and compression
  3. Choose the right architecture for your use case
  4. Anticipate limitations when scaling to multi-agent scenarios

The future of AI agents will likely involve new patterns for coordinating these operations across multiple contexts. But for now, mastering write, select, compress, and isolate is the foundation of effective context engineering.


Interested in building AI products that remember? At Membase, we're working on unified memory systems that work across all your AI agents. Join our waitlist to learn more.