The Four Moves of Context Engineering in Modern LLM Agents

Understanding how Claude, ChatGPT, and Cursor manage context windows—and why these patterns break in parallel agent swarms.

Joshua Park

Context engineering in today's mainstream LLM agents can be boiled down to four fundamental moves: write, select, compress, and isolate. Most "smart" products (Claude, ChatGPT, Cursor, Windsurf, and OpenAI/Anthropic-style research agents) are just different orchestrations of these four operations inside a single model's context window.

Understanding these patterns is crucial for anyone building with AI agents. Let's break down each one.

Figure: The four strategies of context engineering. Source: LangChain Blog

1. Write: Externalize the Brain

The first move is about persistence. Agents need to remember things beyond what fits in their immediate context window.

Scratchpads and State Objects

Agents persist state outside the context window via scratchpads: files, state objects, or databases. This allows them to keep:

  • Long-term plans
  • Intermediate reasoning steps
  • Tool results and outputs
  • Decision histories

All without blowing up their token budgets.
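A minimal sketch of the idea, assuming a file-backed store. The `Scratchpad` class, file name, and entry keys here are illustrative, not any product's actual API:

```python
import json
from pathlib import Path

class Scratchpad:
    """Persists agent state outside the context window as a JSON file."""

    def __init__(self, path: str = "agent_scratchpad.json"):
        self.path = Path(path)
        self.state = {"plan": [], "steps": [], "tool_results": []}
        if self.path.exists():
            self.state = json.loads(self.path.read_text())

    def write(self, key: str, entry: dict) -> None:
        # Append an entry (plan item, reasoning step, tool output) and flush to disk.
        self.state.setdefault(key, []).append(entry)
        self.path.write_text(json.dumps(self.state, indent=2))

# Record a bulky tool result without ever putting it back into the prompt.
pad = Scratchpad()
pad.write("tool_results", {"tool": "web_search", "summary": "3 relevant papers found"})
```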

Long-Term Memory Systems

Modern systems like ChatGPT memories, Cursor/Windsurf rules, and Reflexion-style self-notes store:

  • User-specific facts and preferences
  • Project conventions and patterns
  • Reflections on past mistakes
  • Cross-session learning

This externalized memory can be reused across sessions, making agents feel genuinely persistent and personalized.
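As a rough sketch, long-term memory can be as simple as a facts file that survives across sessions and gets folded into the system prompt at startup. The file path and helper names below are assumptions for illustration, not how any of the products above actually implement it:

```python
import json
from pathlib import Path

MEMORY_FILE = Path("user_memory.json")  # hypothetical cross-session store

def remember(fact: str) -> None:
    """Append a user-specific fact so future sessions can reuse it."""
    facts = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    if fact not in facts:
        facts.append(fact)
        MEMORY_FILE.write_text(json.dumps(facts, indent=2))

def session_preamble() -> str:
    """Build a short system-prompt block from stored facts at session start."""
    facts = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    return "Known about this user:\n" + "\n".join(f"- {f}" for f in facts)

remember("Prefers TypeScript examples")
remember("Project uses pnpm, not npm")
print(session_preamble())
```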

2. Select: Just-in-Time Retrieval

Writing everything down is useless if you dump it all back into context. The second move is selective retrieval.

Controlled State Exposure

At each step, the agent pulls in only the relevant slice of its scratchpad or state. Instead of loading all history, it:

  • Reads specific files on demand
  • Exposes only relevant state variables
  • Filters tool results before surfacing them

For larger knowledge stores, modern systems use:

  • Embeddings to find semantically similar content
  • Graph structures to traverse relationships
  • Tool-description RAG to pick the right capabilities

The goal is to keep the active context narrow but useful: bringing in exactly what's needed, exactly when it's needed.
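A toy version of that selection step, using bag-of-words cosine similarity as a stand-in for a real embedding model; a production system would swap `embed` for an actual embedding call:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy "embedding"; a real system would call an embedding model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_relevant(query: str, notes: list[str], k: int = 2) -> list[str]:
    """Return only the k scratchpad notes most similar to the current step."""
    q = embed(query)
    return sorted(notes, key=lambda n: cosine(q, embed(n)), reverse=True)[:k]

notes = [
    "Decision: use Postgres for persistence",
    "User prefers dark mode in the UI",
    "Bug: retry logic loops forever on 429 errors",
]
print(select_relevant("how should we persist data?", notes))
```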

3. Compress: Shrink Without Forgetting

As conversations grow (hundreds of turns, heavy tool usage), the context window fills up fast. The third move is compression.

Automatic Summarization

Systems like Claude Code's auto-compaction periodically:

  • Summarize prior dialogue into distilled representations
  • Create recursive or hierarchical summaries
  • Collapse tool traces into outcome-focused notes

This keeps the agent under the context window limit while preserving decision-relevant information.
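A sketch of such a compaction loop, with `llm_summarize` standing in for a real summarization call and a crude word count standing in for a tokenizer; neither reflects Claude Code's actual internals:

```python
def llm_summarize(text: str) -> str:
    # Stand-in for a real model call (e.g., a cheap "summarize this dialogue" prompt).
    return f"[summary of {len(text.split())} words of earlier dialogue]"

def compact(messages: list[str], budget_tokens: int = 2000, keep_recent: int = 4) -> list[str]:
    """Once the rough token count exceeds the budget, collapse older turns
    into a single summary and keep only the most recent messages verbatim."""
    rough_tokens = sum(len(m.split()) for m in messages)  # crude proxy for a tokenizer
    if rough_tokens <= budget_tokens or len(messages) <= keep_recent:
        return messages
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    return [llm_summarize("\n".join(older))] + recent

# Run after every turn: history = compact(history)
```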

Heuristic Pruning

Smart pruning strategies remove:

  • Older or low-value tokens
  • Redundant information
  • Completed sub-tasks

The system preserves only the minimal set of turns and events needed for the next decision.
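A rough illustration of such heuristics; the event fields and budget here are assumptions, not any specific product's rules:

```python
def prune(events: list[dict], max_events: int = 20) -> list[dict]:
    """Keep only the events still needed for the next decision.

    Illustrative heuristics: drop completed sub-tasks, drop exact duplicates,
    then keep only the most recent events up to a fixed budget.
    """
    seen = set()
    kept = []
    for e in events:
        if e.get("status") == "done":          # completed sub-task: safe to drop
            continue
        key = (e.get("type"), e.get("content"))
        if key in seen:                        # redundant repeat of an earlier event
            continue
        seen.add(key)
        kept.append(e)
    return kept[-max_events:]                  # older, low-value entries fall off the end
```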

4. Isolate: Prevent Context Interference

The final move is isolation: keeping different concerns separate to avoid confusion.

Multi-Agent Architectures

Designs like Anthropic's multi-agent researcher, OpenAI Swarm-style teams, and LangGraph supervisor patterns isolate:

  • Instructions per sub-agent
  • Tools and capabilities
  • State and memory

Each agent gets a smaller, more focused context window optimized for its specific role.
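A sketch of that isolation: each sub-agent carries its own instructions, tool subset, and private message history, and only the final result crosses back to the supervisor. The `SubAgent` structure and roles are illustrative, not taken from any of the frameworks named above:

```python
from dataclasses import dataclass, field

@dataclass
class SubAgent:
    """A worker with its own instructions, tool subset, and private context."""
    role: str
    instructions: str
    tools: list[str]
    messages: list[str] = field(default_factory=list)  # never shared with siblings

def run_subtask(agent: SubAgent, task: str) -> str:
    # Placeholder for a real model call; only the final answer leaves this scope.
    agent.messages.append(task)
    return f"[{agent.role}] result for: {task}"

researcher = SubAgent("researcher", "Find and cite sources.", tools=["web_search"])
coder = SubAgent("coder", "Write and test code.", tools=["python", "shell"])

# The supervisor's context only ever contains these two result strings,
# not the sub-agents' full tool traces.
results = [run_subtask(researcher, "survey RAG papers"),
           run_subtask(coder, "prototype the retriever")]
```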

Sandboxed Artifacts

Structured systems keep bulky or sensitive artifacts out of the visible context:

  • File contents
  • Images and media
  • Intermediate data structures

These are stored externally and surfaced only when explicitly needed, reducing token load and preventing context pollution.
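One way to sketch this pattern: write the artifact to disk and let only a short handle enter the prompt, loading the full content back just-in-time when a tool asks for it. The handle format and `ARTIFACT_DIR` are illustrative assumptions:

```python
import hashlib
from pathlib import Path

ARTIFACT_DIR = Path("artifacts")  # hypothetical out-of-context store

def store_artifact(data: bytes, label: str) -> str:
    """Save bulky content outside the context; return a short in-context handle."""
    ARTIFACT_DIR.mkdir(exist_ok=True)
    digest = hashlib.sha256(data).hexdigest()[:12]
    (ARTIFACT_DIR / digest).write_bytes(data)
    return f"<artifact:{label}:{digest}>"     # only this token enters the prompt

def load_artifact(handle: str) -> bytes:
    """Surface the full content only when explicitly requested."""
    digest = handle.rstrip(">").split(":")[-1]
    return (ARTIFACT_DIR / digest).read_bytes()

ref = store_artifact(b"...many megabytes of CSV...", "sales_data")
print(ref)  # the model sees this handle, not the raw bytes
```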

The Parallel Agent Problem

Here's where it gets interesting and challenging.

Key Insight: All four strategies assume a single "authoritative" context window whose content can be consistently written, selected, compressed, and isolated over time. Within one LLM, the state and trade-offs are globally coordinated.

Why This Breaks for Swarms

In large parallel agent swarms, each agent maintains its own local write/select/compress/isolate loop. This creates fundamental coordination problems:

Synchronization challenges:

  • Which facts are canonical?
  • How should shared knowledge be compressed?
  • When should information be shared vs. isolated?

The systems problem: these are open systems problems, not prompt-engineering problems. Current context-engineering patterns don't yet provide a robust, general solution for cross-agent coordination.

This is why single-agent systems (ChatGPT, Claude, Cursor) work so well, while true multi-agent swarms remain an active research frontier.

Building with These Patterns

Understanding these four moves helps you:

  1. Design better prompts that work with the model's memory systems
  2. Structure your tools for optimal retrieval and compression
  3. Choose the right architecture for your use case
  4. Anticipate limitations when scaling to multi-agent scenarios

The future of AI agents will likely involve new patterns for coordinating these operations across multiple contexts. But for now, mastering write, select, compress, and isolate is the foundation of effective context engineering.


Interested in building AI products that remember? At Membase, we're working on unified memory systems that work across all your AI agents. Join our waitlist to learn more.