June 26, 2026

Deep Agents Pattern: Planner, Files, Subagents 2026

The deep agents pattern behind Claude Code and Deep Research: a planner, filesystem, isolated subagents, and memory. Plus the 15x token cost to plan for.

Sebastian Mondragon

12 min read

Deep Agents Pattern: Planner, Files, Subagents 2026

TL;DR

A deep agent is a four-pillar pattern: a planning tool (write_todos) to keep goals in attention, a virtual filesystem for context offload, isolated subagents to prevent context pollution, and long-term memory. Anthropic's orchestrator-worker setup beat single-agent Opus 4 by 90.2% on its internal research eval, but multi-agent systems burn roughly 15x the tokens of a chat. Token usage alone explained 80% of performance variance on BrowseComp, and KV-cache hit rate (a 10x cost gap: $0.30 vs $3.00 per MTok on Sonnet) becomes your most important production metric. Use a deep agent for open-ended, multi-hour research and engineering tasks, not for narrow, latency-sensitive workflows.

If you have used Claude Code on a large refactor, watched Deep Research assemble a 20-source report, or handed Manus an open-ended task and walked away, you have already used a deep agent. The deep agents pattern is the architecture that lets an agent stay coherent across dozens or hundreds of steps instead of falling apart after ten. In January 2026, LangChain shipped its deepagents library as a research preview and gave the pattern a name, after Claude Code, Deep Research, and Manus had independently converged on the same four moves.

The pattern is worth understanding precisely because it is not magic. It is four concrete pillars layered on top of an ordinary tool-calling loop: a planning tool to keep goals in attention, a virtual filesystem to offload context, isolated subagents to prevent context pollution, and long-term memory across runs. Each pillar solves a specific failure mode of the naive single-loop agent. Together they buy you depth, and depth costs tokens. Anthropic measured its multi-agent research system using roughly 15x the tokens of a chat interaction.

This post breaks down each pillar, the token-economics you need to budget for, and the honest decision of when a deep agent beats a simpler single-loop agent. The goal is that you finish able to look at a task and say "this needs depth" or "this is a single loop with better tools," because reaching for orchestration on a narrow problem is one of the most expensive mistakes in agent engineering.

What a deep agent is: the four-pillar pattern behind Claude Code, Deep Research, and Manus

Start with the thing a deep agent is built on top of. A standard tool-calling agent is a while loop. The model receives the conversation, decides on a tool call, the runtime executes it, the result is appended to the message history, and the loop repeats until the model emits a final answer. This is the ReAct pattern, and it is genuinely good. For tasks that resolve in 5 to 15 steps, you do not need anything more.

The loop breaks down on long-horizon tasks for three reasons. Context windows fill with intermediate junk and the model starts losing the thread. The agent has no durable place to put work-in-progress, so everything competes for the same prompt budget. And there is no division of labor, so a single context window has to hold the high-level plan and the gritty details of every sub-task simultaneously.

A deep agent patches all three. LangChain's framing names four pillars, and you can map each one to a failure it fixes:

The key insight is that none of these are model capabilities. They are scaffolding. The same Claude or GPT model that drifts on a 60-step single loop stays on track when you wrap it in this structure. That is why the pattern showed up in three different products before anyone formalized it: it is the path of least resistance once you push an agent past the easy regime. If you are still deciding whether your problem even needs this, our breakdown of multi-agent vs single-agent systems is the right place to start, and the broader AI agents pillar guide maps where deep agents sit in the wider landscape.

Pillar	Mechanism	Failure it fixes
Planning tool	`write_todos` rewrites a structured plan each step	Goal drift / attention decay
Virtual filesystem	Read/write files as external context	Context window overflow
Subagents	Delegate sub-tasks to isolated contexts	Context pollution
Long-term memory	Persist state across runs	Amnesia between sessions

Pillar 1: a planning tool (write_todos) to keep goals in attention

The first pillar is the cheapest and, step for step, the highest leverage. You give the agent a tool, conventionally write_todos, whose only job is to author and rewrite a structured plan. In Manus that plan lives in a todo.md file the agent overwrites continuously as it makes progress.

Why does writing a to-do list help a model that already has its instructions in the system prompt? Attention decay. In a long context window, the influence of a token on the next prediction weakens as more tokens pile up after it. Instructions you gave 40,000 tokens ago compete with everything that has happened since. The Manus team described this as the agent forgetting its objective on long chains, and their fix was deliberate: rewrite the goals into the most recent slice of context on every iteration. By "reciting" the plan back to the end of the window, you push the objective into the model's freshest attention right before it decides what to do next.

The mechanics look roughly like this:

# Each loop iteration, before the model picks its next action,
# the agent updates its own plan and the result lands in recent context.
write_todos([
    {"task": "Gather Q1 competitor pricing", "status": "done"},
    {"task": "Cross-check against public filings", "status": "in_progress"},
    {"task": "Draft comparison table", "status": "pending"},
    {"task": "Write executive summary", "status": "pending"},
])

Two benefits fall out of this beyond keeping the model on task. First, the plan is a human-readable trace. When a deep agent goes sideways, the to-do file tells you what it thought it was doing, which beats reverse-engineering intent from a wall of tool calls. Second, the explicit decomposition front-loads reasoning that the model would otherwise redo implicitly on every step. Write the plan once, reference it many times.

A practical note: do not over-structure the planning tool. Free-form markdown that the model rewrites in full works better in practice than a rigid task graph with strict state transitions, because the model can reshape the plan as it learns. The point is recitation, not project management.

Pillar 2: a virtual filesystem for context offload

The second pillar treats the filesystem as external, unlimited context. Instead of carrying the full text of every document, search result, and intermediate artifact in the prompt window, the agent writes them to files and keeps only references and summaries in context. When it needs the detail again, it reads the file.

This directly attacks context-window overflow. A research task might touch 50 web pages. Loading all of them into the prompt is impossible and, even where it fits, expensive and degrading. The Manus team made this explicit: they treat the filesystem as the model's externalized memory, a place where context can be offloaded losslessly and recalled on demand, so the active window stays lean. The model learns to use ls, read_file, and write_file the way a human researcher uses a notes folder.

The compression discipline matters. A good deep agent does not just dump raw output to disk. It writes a distilled note ("Source 7: competitor lists enterprise tier at $X/seat, 2026 pricing page") and keeps the URL so it can re-fetch if needed. This is restorable compression: you can always recover the detail, but you do not pay for it on every turn. We go deep on this exact discipline in our guide to agent memory and context management, and the related techniques for keeping long-running agents from drowning in token bloat are essentially the operational playbook for this pillar.

The filesystem also becomes the handoff medium between subagents, which sets up the third pillar. A subagent can write its findings to research/competitor-a.md, and the orchestrator reads the summary without ever pulling the subagent's noisy exploration into its own window.

Pillar 3: isolated subagents and avoiding context pollution

The third pillar is where deep agents earn the "multi-agent" label, and it is the one with the strongest published evidence behind it. The idea: when the agent hits a sub-task that requires heavy, exploratory work, it spawns a subagent with its own fresh, isolated context. The subagent does the messy part (reading 30 files, running a dozen searches, hitting dead ends) and returns only a clean result to the orchestrator. The orchestrator never sees the mess.

The term for the problem this solves is context pollution. When failed tool calls, abandoned searches, and verbose dumps accumulate in one context window, they degrade the model's reasoning on everything downstream. The model has to step over its own past failures. Isolation means the orchestrator's context stays clean and high-signal, holding the plan and the distilled results, while each subagent absorbs the entropy of its own sub-task and then disappears.

Anthropic's multi-agent research system is the canonical proof point. They ran an orchestrator-worker setup: an Opus 4 lead agent decomposed the query and delegated to multiple Sonnet 4 subagents that explored in parallel. That architecture outperformed a single-agent Opus 4 baseline by 90.2% on their internal research evaluation. The gain came from two things working together. Parallel subagents covered more ground at once, and each subagent's isolated context meant the lead never drowned in raw exploration.

Orchestrator (Opus 4)
  ├─ writes plan, decomposes the query
  ├─ spawns Subagent A (Sonnet) → searches vendor docs → returns summary
  ├─ spawns Subagent B (Sonnet) → searches filings    → returns summary
  ├─ spawns Subagent C (Sonnet) → searches news        → returns summary
  └─ synthesizes clean summaries into final report

The cost, which we will quantify in a moment, is that all that parallel exploration burns tokens. But the architecture is the right call when the sub-tasks are genuinely separable and exploratory. For the orchestration mechanics (how to route, how to define the subagent contract, how to aggregate results without losing fidelity), our field notes on multi-agent orchestration that actually works cover the patterns that survive contact with production.

One sharp caveat from Anthropic's own writeup: subagents work for read-heavy, parallelizable work like research. They are far harder to coordinate when sub-tasks must write to shared state or depend on each other's intermediate output, because the isolation that makes them clean also makes them blind to each other. Default to parallel-read, not parallel-write.

Pillar 4: long-term memory across runs

The first three pillars keep an agent coherent within a single run. The fourth gives it continuity across runs. Long-term memory is durable state (facts, preferences, prior results, learned procedures) that survives after the agent finishes and is available when it starts again.

Without it, every session starts from zero. The agent that spent 200 steps mapping your codebase yesterday knows nothing about it today. Long-term memory closes that loop by persisting selected state, usually to the same filesystem abstraction from Pillar 2 or to an external store, and loading the relevant slice back at the start of a new run.

The engineering challenge is selection, not storage. You cannot reload everything; that just reintroduces the context overflow you worked to avoid. A good memory layer is opinionated about what persists: stable facts ("the auth service is in Go, owned by the platform team"), user preferences ("always cite sources inline"), and reusable artifacts (a previously built data dictionary). Transient junk stays out. This is the same restorable-compression discipline as the filesystem pillar, applied across the time dimension instead of within a single run.

In practice, long-term memory is the least standardized of the four pillars and the one most teams implement last. It is also where the most product differentiation hides: an agent that remembers your conventions, your past decisions, and your data layout after a week of use feels categorically more capable than one that reintroduces itself every session, even when the underlying model is identical.

The token-cost tradeoff: 15x usage and the KV-cache hit-rate metric

Now the bill. Depth is not free, and the numbers are large enough that they should drive your architecture decision, not be discovered after launch.

Anthropic reported that their multi-agent research architecture consumed roughly 15x more tokens than a standard chat interaction. That is not a rounding error; it is the central economic fact of the pattern. The same writeup found that on the BrowseComp benchmark, token usage alone explained 80% of the performance variance. Read that carefully: most of the measured improvement was attributable to spending more tokens, distributed across parallel subagents that each maintained their own context. Depth buys quality, and you pay for it in tokens.

This is exactly why the Manus team singled out KV-cache hit rate as the single most important production metric for a deep agent. The KV cache lets the model skip recomputation for a prompt prefix it has already processed, and providers price cached input dramatically lower. On Claude Sonnet, the gap is roughly $0.30 per million cached tokens versus $3.00 per million uncached, a 10x difference. At 15x token volume, whether those tokens hit the cache or miss it is the difference between a viable product and an unaffordable one.

The cache-friendliness rules that follow from this are concrete:

Keep your prompt prefix stable. The cache matches on exact prefix. A single changed token early in the prompt (a timestamp, a reordered tool definition) invalidates everything after it. Pin your system prompt and tool schema.

Append, never insert. Add new context to the end. Editing or reordering earlier content forces recomputation of the entire suffix.

Avoid non-determinism near the front. Things like a "current time" line at the top of a system prompt quietly destroy your hit rate. Move volatile content late or out of the cached span.

Make file reads idempotent. If the same file read produces byte-identical text each time, repeated reads stay cacheable.

The takeaway is that a deep agent's unit economics are dominated by two numbers under your control: how many tokens the orchestration spends and what fraction of them hit cache. Design for both from day one. This is the part of an agent build where a little upfront discipline saves an order of magnitude in cost.

Metric	Single-loop agent	Deep agent (multi-agent)
Token usage vs chat	~1x	~15x
Steps before drift	5 to 20	dozens to hundreds
Cached vs uncached cost (Sonnet)	$0.30 / $3.00 per MTok	$0.30 / $3.00 per MTok
Primary cost lever	total tokens	KV-cache hit rate
Best for	narrow, latency-sensitive	open-ended, exploratory

When a deep agent beats a simpler single-loop agent

Here is the opinionated part. Most tasks do not need a deep agent. The pattern is a high-power, high-cost tool, and the failure mode we see most often across the agent systems we have audited is teams reaching for orchestration on a problem that a well-instrumented single loop would have solved at a tenth of the cost and a fraction of the debugging pain.

Use a deep agent when the task is open-ended, runs long, and rewards parallel exploration:

Deep research and competitive analysis. Many sources, parallelizable reads, a synthesized output. This is the home-field case, where the 90.2% lift Anthropic measured actually shows up.

Large multi-file code changes. A plan, a filesystem of the actual code, and subagents that investigate modules independently. This is what Claude Code does.

Long data-gathering and enrichment jobs. Hundreds of records, each requiring lookups, where intermediate state has to live somewhere other than the prompt window.

Use a single-loop agent (or no agent at all) when the task is narrow and bounded:

Classification, extraction, structured Q&A. One model call or a tight tool loop. Adding a planner and subagents here just adds latency and cost.

Latency-sensitive interactions. A deep agent's many internal steps make sub-second response impossible. If a user is waiting, depth is the wrong trade.

Anything where a single agent with the right tools already works. If better tool design fixes it, fix the tools. Do not escalate to orchestration.

The decision rule we use is blunt: estimate the task's natural step count and whether sub-tasks are separable. Under about 15 to 20 steps with tightly coupled work, stay single-loop. Beyond that, with separable exploratory sub-tasks, the deep-agent pattern starts to pay for its 15x token premium. When you do go deep, adopt the pillars incrementally. Add the planner first (cheapest, highest leverage), then the filesystem, then subagents, then memory. Most teams find that the planner and filesystem alone get them most of the way, and they only reach for subagent isolation when context pollution becomes a measured problem rather than a hypothetical one. For a fuller walkthrough of assembling these pieces into a working system, see our guide on how to build complex AI agents.

This is the exact judgment call we pressure-test in Particula Tech's agent architecture audits: not "can we build a deep agent" but "does this task earn one," before you commit to the token bill and the orchestration complexity. Get that decision right and the four pillars give you an agent that stays coherent across hundreds of steps. Get it wrong and you have built a 15x-cost machine to do a job a single loop would have nailed.

Frequently Asked Questions

Quick answers to common questions about this topic

A deep agent is an agent architecture built on four pillars: a planning tool (typically a write_todos function) that keeps goals in recent attention, a virtual filesystem that offloads context out of the prompt window, subagent delegation with per-subagent context isolation, and long-term memory that persists across runs. LangChain formalized the pattern in its deepagents library, released as a research preview in January 2026 after Claude Code, Deep Research, and Manus proved it in production. The point is depth: a deep agent can run for dozens of steps on an open-ended task without losing the plot, where a single-loop agent drifts.