What is durable execution for AI agents?

Durable execution is an engine that journals every step of an agent run to persistent storage so that when the process crashes, the agent resumes from the exact step it died on instead of starting over. For agents this matters because each step is expensive and non-repeatable: an LLM call you already paid for, a tool that already charged a card, a human approval that already happened. The engine records the result of each step once and replays it from the journal on recovery, never re-executing the underlying side effect. Temporal, Inngest, and Restate all implement this with slightly different ergonomics, but the core guarantee is identical: the agent loop survives crashes, restarts, deploys, and multi-day waits without losing or duplicating work.

Why can't I just add retries to my agent loop?

Naive retries re-run everything from the top, which is exactly wrong for agents. If your agent crashes after step 7 of a 10-step plan, a retry re-runs steps 1 through 7: seven LLM calls you already paid for and seven tool invocations that may charge cards, send emails, or write rows twice. Retries also can't survive a human-in-the-loop gate that takes hours or days, because the process holding the loop in memory will be long gone. Durable execution solves both: it replays completed steps from a journal (no re-execution of side effects) and it can suspend a workflow for days waiting on an approval, then wake it on the exact line where it paused. Retries handle transient blips; durable execution handles the whole lifecycle.

Temporal vs Inngest vs Restate: which should I pick?

Pick Temporal when workflows run for days or weeks, you need a 99.99% SLA, and you have the operational appetite for a server plus workers (or Temporal Cloud). Pick Inngest when your stack is TypeScript, you want durable steps inside your existing app code with no separate worker fleet, and your runs are bursty event-driven jobs rather than month-long sagas. Pick Restate when you want Temporal's journal-and-replay guarantees with a lighter single-binary footprint and exactly-once writes without hand-rolling idempotency keys. The wrong way to choose is by GitHub stars. Choose by workflow duration, payload size, your team's language, and whether the pricing model survives your retry volume.

Why do large LLM payloads break Temporal workflows?

Temporal records every activity input and output into the workflow's event history, and that history has size limits (a soft warning around 10K events or 10MB and a hard cap historically near 50MB). Agents move large payloads: full prompts, multi-thousand-token completions, retrieved documents, tool results. Pass those directly through activities and the history saturates fast, which slows replay, inflates storage, and can fail the workflow outright. The fix is a payload codec or the claim-check pattern: store the large blob in S3 or a database and pass only a reference through history. This is the single most common Temporal-for-agents mistake we see, and it is entirely avoidable if you design payload handling before you ship.

How does durable execution handle human-in-the-loop approval gates?

Durable execution turns a human approval gate into a workflow that suspends with zero compute cost until a signal arrives. The agent reaches the gate, the engine persists its full state to the journal, and the workflow sleeps (Temporal calls these signals, Inngest uses waitForEvent, Restate uses awakeables). Hours or days later, when a human clicks approve in your UI, you send the signal and the workflow wakes on the exact line where it paused, with all prior state intact. No polling loop, no process held in memory, no lost context. This is the cleanest reason to adopt durable execution for agents that take consequential actions: the wait is free, the resume is exact, and the audit trail is automatic.

Do I still need idempotency keys with durable execution?

Yes for Temporal and Inngest, and less so for Restate. Durable execution guarantees an activity result is recorded once and replayed on recovery, but it cannot guarantee the activity ran exactly once on the external system if a crash lands mid-call (after the side effect, before the result is journaled). That at-least-once edge means any write tool (charge a card, send an email, create a record) still needs an idempotency key so a re-execution is a no-op. Restate narrows this gap with built-in exactly-once semantics for its handlers, reducing how much app-level idempotency you hand-roll. Either way, treat every side-effecting tool as if it can run more than once and make it safe to do so.

Does durable execution slow down my agent?

It adds journaling overhead per step, not per token, so the cost is small relative to LLM latency. Each completed step is persisted to durable storage before the next runs, which adds milliseconds to tens of milliseconds depending on the engine and backend. Against an LLM call that takes 1 to 5 seconds, that overhead is noise. The real performance trap is not latency, it is history bloat: oversized payloads in Temporal slow replay and inflate storage, which is why you offload large blobs to external storage and pass references. Design payloads correctly and durable execution is effectively free on the latency budget while buying you crash resilience, exact resumption, and free multi-day waits.


Durable Execution for Agents: Temporal vs Inngest vs Restate

Durable execution keeps agents alive through crashes, retries, and approval gates. Temporal vs Inngest vs Restate on payload limits, pricing, and determinism.

Sebastian Mondragon · MAY 25, 2026 · 12 MIN READ


A coding agent that has been running for forty minutes, eleven tool calls deep into a refactor, hits an unhandled exception in the worker process and dies. Without durable execution for AI agents, everything is gone: the plan it built, the files it already edited, the eight LLM calls you already paid for, the test run it kicked off. The retry, if you have one, starts from token zero and re-does work that may have already mutated your repo. This is the failure mode that makes agents feel unreliable in production even when the model is fine.

The fix is not a better prompt or a bigger model. It is durable execution: an engine that journals every step of the agent run to persistent storage, so a crashed agent resumes from the exact step it died on instead of replaying the whole loop. The agent's LLM outputs are recorded once and replayed from history on recovery, never re-called. A tool that already ran is not run again. A human approval that already happened is not requested twice. Three platforms own this space for application developers in 2026: Temporal, Inngest, and Restate. They share the same core model and diverge sharply on payload handling, pricing, language fit, and operational weight.

This post is the decision framework we use to scope durable execution for agent systems. We will cover why agents specifically need it (probabilistic steps, expensive tool calls, and approval gates that retries cannot survive), the architecture-specific traps in each platform, how checkpointing and resumption actually work, and a decision matrix that picks the right engine by workflow maturity, payload size, cost model, and language. The mistake we see most often is bolting an agent loop onto a generic job queue and discovering, in week one of production, that a queue is not a workflow engine.

01 · Why Agents Need Durable Execution (and Queues Don't Cut It)

A traditional background job is short, idempotent, and cheap to retry from scratch. An agent run is none of those things. It is long (seconds to days), stateful (the plan and accumulated context are the work), and built from steps that are individually expensive and often irreversible. Retrying an agent from the top is not a safety net, it is a way to double-charge a customer and re-corrupt a file.

Three properties of agents break the naive queue-plus-retry pattern:

Steps are probabilistic and non-repeatable. Every LLM call costs money and can produce a different result each time. If your agent crashes after step 7, you do not want a retry to re-run steps 1 through 7: that is seven more paid completions and tool invocations, and the rerun may take a different path entirely because the model is non-deterministic. Durable execution records each step's result once and replays it from the journal, so recovery is exact, not approximate.

Tool calls have side effects. Agents send emails, charge cards, open pull requests, and write database rows. A retry that re-runs a completed tool call sends the email twice. Durable execution combined with idempotent tools means a replayed step is a no-op on the external system.

Approval gates outlive the process. The moment your agent takes a consequential action, you want a human-in-the-loop approval gate in front of it. That gate might take five minutes or five days. No in-memory loop survives a deploy, a scale-down, or a crash across that window. Durable execution suspends the workflow at the gate with zero running compute and wakes it on the exact line when the approval signal arrives.

A job queue gives you at-least-once delivery and retries. It does not give you a journal, deterministic replay, suspendable multi-day waits, or exactly-once recording of step results. Those are the durable execution primitives, and they are why this is its own category rather than a feature you add to BullMQ or SQS.

The Core Model: Record Once, Replay on Recovery

All three engines work the same way under the hood. Your workflow code runs, and every side-effecting operation (an LLM call, a tool invocation, a sleep, a wait-for-signal) is wrapped in a durable step. When that step completes, its result is written to a journal. If the process crashes and restarts, the engine re-runs your workflow code from the top, but every step it has already seen in the journal returns the recorded result instantly instead of executing again. The code "fast-forwards" through completed work and resumes live execution at the first unrecorded step.

This is why two rules are non-negotiable across all three platforms.

First, workflow code must be deterministic: no Math.random(), no Date.now(), no direct network calls outside a durable step, because on replay the code must follow the identical path. Anything non-deterministic goes inside a step so its result is journaled.

Second, the LLM output is recorded to history once and replayed on recovery, never re-called. The model's answer becomes a fixed fact in the journal the moment the step completes.
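The record-once, replay-on-recovery loop can be sketched in a few lines. This is a minimal in-memory illustration, not any engine's real API: `DurableRun`, `agentWorkflow`, and `crashAndRecover` are hypothetical names, the `Map` stands in for durable storage, and steps are synchronous for brevity where real engines are asynchronous.

```typescript
// In-memory sketch of journal-and-replay. All names are hypothetical;
// real engines persist the journal to durable storage and run steps asynchronously.
type Journal = Map<string, unknown>;

class DurableRun {
  private journal: Journal;

  constructor(journal: Journal) {
    this.journal = journal;
  }

  // Execute fn only if this step has no recorded result; otherwise replay it.
  step<T>(name: string, fn: () => T): T {
    if (this.journal.has(name)) {
      return this.journal.get(name) as T; // replay: fn is never called
    }
    const result = fn();
    this.journal.set(name, result); // record once
    return result;
  }
}

// Counts real executions of each side effect.
const calls = { llm: 0, tool: 0 };

function agentWorkflow(run: DurableRun, crashAfterPlan: boolean): string {
  const plan = run.step("plan", () => {
    calls.llm++;
    return "refactor module A";
  });
  if (crashAfterPlan) throw new Error("worker died"); // simulated crash
  return run.step("tool", () => {
    calls.tool++;
    return `done: ${plan}`;
  });
}

function crashAndRecover(): string {
  const journal: Journal = new Map(); // stands in for storage that survives the crash
  try {
    agentWorkflow(new DurableRun(journal), true);
  } catch {
    // the worker restarts here; the journal is intact
  }
  // The second attempt re-runs the code from the top; "plan" is fast-forwarded.
  return agentWorkflow(new DurableRun(journal), false);
}
```

Calling `crashAndRecover()` runs the workflow body twice, but `calls.llm` and `calls.tool` each end at 1: the second pass answers "plan" from the journal and only executes the step that never completed.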

02 · Temporal: The Mature Default With a Payload Trap

Temporal is the most battle-tested durable execution platform, descended from Uber's Cadence and run at scale across large engineering orgs. The model is workflow-as-code: you write a workflow function that orchestrates activities (your side-effecting steps), and Temporal's history-and-replay engine guarantees that each activity's result is recorded once and replayed on recovery. The activity itself is at-least-once: a crash between the side effect and the history write leads to a re-execution, which is why write tools still need idempotency keys. In early 2026 Temporal shipped Nexus to GA (cross-namespace, cross-team service calls) and Multi-Region Replication to GA with a 99.99% SLA, which closes the last serious gaps for regulated, multi-region agent deployments.

For agents, Temporal's strengths are real. Multi-day and multi-week workflows are first-class. Signals and queries give you clean human-in-the-loop gates and live state inspection. SDKs exist for Go, Java, TypeScript, Python, and .NET, among others, so you are not locked into one language. The operational tradeoff is weight: you run a Temporal server (or pay for Temporal Cloud) plus a worker fleet, and there is a genuine learning curve around determinism constraints and the activity-versus-workflow boundary.

The LLM Payload Saturation Trap

The single most common way teams break Temporal-for-agents is workflow history saturation. Temporal records every activity input and output into the workflow's event history. That history has limits: a soft warning around 10,000 events or roughly 10MB, and a hard ceiling historically near 50MB. Agents move large payloads through every step: full prompts, multi-thousand-token completions, retrieved document chunks, fat tool results. Pass those directly through activity boundaries and history bloats fast, which slows replay, inflates storage cost, and can fail the workflow when it blows the cap.

The fix is the payload codec or claim-check pattern: store the large blob in external storage (S3, GCS, a database) and pass only a reference through history. Temporal's payload codec API lets you intercept payloads before they hit history, compress or offload them, and rehydrate on read. Design this before you ship, not after the first workflow dies at 50MB. We treat large-payload offloading as a default for any Temporal agent that touches documents or long completions. The same discipline behind agent memory and context management applies to the durable layer underneath them.
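The claim-check pattern can be sketched without any SDK. The example below is an illustration under stated assumptions: the `Map` stands in for S3, GCS, or a database, the 16K-character threshold is arbitrary, and `BlobRef`, `toPayload`, and `fromPayload` are hypothetical names rather than part of Temporal's payload codec API.

```typescript
// Claim-check sketch. The Map stands in for external storage; all names
// here are hypothetical and not part of any Temporal SDK.
interface BlobRef {
  kind: "blob-ref";
  key: string;
  chars: number;
}

// What travels through workflow history: small values inline, large ones by reference.
type Payload = string | BlobRef;

const blobStore = new Map<string, string>();
const INLINE_LIMIT_CHARS = 16 * 1024; // illustrative threshold, tune for your history budget
let nextBlobId = 0;

function toPayload(value: string): Payload {
  if (value.length <= INLINE_LIMIT_CHARS) return value;
  const key = `blob-${nextBlobId++}`;
  blobStore.set(key, value); // the large blob goes to external storage
  return { kind: "blob-ref", key, chars: value.length };
}

function fromPayload(payload: Payload): string {
  if (typeof payload === "string") return payload;
  const value = blobStore.get(payload.key);
  if (value === undefined) throw new Error(`missing blob ${payload.key}`);
  return value;
}

const completion = "x".repeat(200000); // stands in for a long LLM completion
const inHistory = toPayload(completion); // what the activity returns
const historyChars = JSON.stringify(inHistory).length; // what history has to store
const rehydrated = fromPayload(inHistory); // what the next activity reads
```

The 200,000-character completion costs history only a few dozen characters of reference, and the next step rehydrates the full value on demand. In a real deployment the store needs its own retention policy so references in old histories do not dangle.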

03 · Inngest: Fastest Path From App Code, Watch the Step Bill

Inngest takes the opposite stance on operational weight. There is no worker fleet to run and no separate server to operate in the self-managed sense. You write durable functions in your existing TypeScript (or Python/Go) application, and Inngest's platform invokes them over HTTP in response to events. The ergonomics are the cleanest in the category for app developers: you wrap each durable boundary in step.run() with normal async/await, and Inngest memoizes the result so a re-run skips completed steps.

TYPESCRIPT

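import { Inngest } from "inngest";
// llm and runTool are this app's own modules; the paths are illustrative.
import { llm } from "./llm";
import { runTool } from "./tools";

const inngest = new Inngest({ id: "agent-app" });

export const researchAgent = inngest.createFunction(
  { id: "research-agent" },
  { event: "agent/research.requested" },
  async ({ event, step }) => {
    // each step.run result is journaled; a crash replays it, never re-calls
    const plan = await step.run("plan", () => llm.plan(event.data.task));

    const results = [];
    for (const subtask of plan.subtasks) {
      // step IDs must be stable across re-runs, so derive them from the subtask ID
      const r = await step.run(`tool-${subtask.id}`, () => runTool(subtask));
      results.push(r);
    }

    // suspend with zero compute until a human approves
    const approval = await step.waitForEvent("approval", {
      event: "agent/approved",
      timeout: "3d",
      match: "data.runId", // both events must carry the same data.runId
    });
    // waitForEvent resolves to null if the timeout elapses first
    if (!approval) {
      return { status: "approval-timed-out" };
    }

    return step.run("finalize", () => llm.synthesize(results));
  }
);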

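That waitForEvent is the human-in-the-loop gate: the function suspends for up to three days at zero running cost and wakes when the approval event lands. If the timeout elapses first, waitForEvent resolves to null, so check the result before taking the consequential action. This is genuinely the fastest way to make an existing TypeScript agent durable.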

The Step-Based Pricing Trap

Inngest prices on steps executed (plus runs), and that model interacts badly with agents in one specific way: retries and multi-model fan-out multiply your billed step count. An agent that calls a planner, then five tools, then a synthesizer is seven steps minimum. Add rate-limit retries against a busy provider, a reranker pass, a guardrail check, and a couple of model fallbacks, and a single logical run can bill fifteen to thirty steps. Multiply across thousands of runs and the bill grows on a curve you did not model from the demo, where a clean three-step function looked cheap. Before committing, instrument your real step count per run including retries, not the happy path, and price against that. The convenience is real, but so is the way the meter runs when an agent's step graph is wide and retry-heavy.
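A back-of-the-envelope model makes the gap between the demo and production visible. The sketch below assumes, as the paragraph above does, that every retry attempt counts as a billed step. The auxiliary step count, the retry rate, and the run volume are illustrative numbers, and `RunShape` is a hypothetical type, so substitute your own measurements.

```typescript
// Rough step-count model. Assumption: each retry attempt is billed as a step.
interface RunShape {
  plannerSteps: number;
  toolSteps: number;
  synthesizerSteps: number;
  auxiliarySteps: number; // reranker pass, guardrail check, model fallbacks
  avgRetriesPerStep: number; // average extra attempts per step
}

// The step count you see in the demo.
function happyPathSteps(s: RunShape): number {
  return s.plannerSteps + s.toolSteps + s.synthesizerSteps;
}

// The step count the meter sees once auxiliary steps and retries are included.
function expectedBilledSteps(s: RunShape): number {
  const distinctSteps = happyPathSteps(s) + s.auxiliarySteps;
  return distinctSteps * (1 + s.avgRetriesPerStep);
}

const shape: RunShape = {
  plannerSteps: 1,
  toolSteps: 5,
  synthesizerSteps: 1,
  auxiliarySteps: 4,
  avgRetriesPerStep: 0.5,
};
const runsPerMonth = 10000;

console.log(happyPathSteps(shape) * runsPerMonth); // 70000
console.log(expectedBilledSteps(shape) * runsPerMonth); // 165000
```

With these inputs a run that looks like 7 steps bills as 16.5 on average, more than double the happy path. Multiply the result by your plan's per-step price to get the number to budget against.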

04 · Restate: Same Guarantees, Lighter Footprint, Exactly-Once Writes

Restate is the newest of the three and uses the same journal-and-replay model as Temporal, but ships as a single self-contained binary (the Restate server) with a notably lighter operational footprint. You write durable handlers in TypeScript, Java/Kotlin, Go, Python, or Rust, and Restate journals each step exactly like Temporal does. The pitch is "durable execution at the service boundary": Restate sits in front of your services and makes their handlers durable, resumable, and exactly-once without the heavier Temporal server plus worker topology.

The differentiator that matters most for agents is exactly-once semantics for writes without hand-rolling app-level idempotency keys. In Temporal and Inngest, the durable engine guarantees a step's result is recorded once and replayed, but it does not guarantee the underlying external write happened exactly once if a crash lands mid-call (after the side effect, before the journal write). That at-least-once edge is why you still add idempotency keys to write tools. Restate narrows this gap: its execution model gives handlers exactly-once invocation guarantees, so the amount of idempotency plumbing you write by hand drops. You should still design write tools to be safe under replay, but Restate does more of the work for you.

Restate's awakeables are its human-in-the-loop primitive (suspend, hand out a token, resume when the token is completed), and its lighter footprint makes it attractive when you do not want to operate a full Temporal cluster but you do want Temporal-grade guarantees.

05 · Side-by-Side: The Decision Axes That Matter

The vendor pages push feature checklists. The actual decision is four axes: workflow maturity and duration, how you handle large LLM payloads, the cost model under realistic retry volume, and language fit.

A few patterns hold across the axes. Temporal earns its operational weight when workflows are long and regulated: the multi-region and SLA story is the reason large enterprises pick it, and the payload codec is mandatory homework rather than an edge case. Inngest wins on time-to-durability for TypeScript teams, but the step bill is the thing to model before you scale, not after. Restate is the sharpest pick when you want the journal-and-replay guarantee with the least operational surface and you value the exactly-once write semantics enough to adopt a younger platform.

Axis	Temporal	Inngest	Restate
Maturity	Most mature; Cadence lineage, large-scale battle-tested	Mature for event-driven serverless workloads	Newest; production-ready, smaller install base
Operational weight	Heavy: server + worker fleet (or Temporal Cloud)	Lightest: functions in your app, platform-invoked	Light: single self-contained binary
Multi-day workflows	First-class, 99.99% SLA (Multi-Region GA)	Supported via waitForEvent / sleep	First-class via awakeables
Large LLM payloads	History saturates; needs payload codec / claim-check	HTTP-invoked; mind payload size, offload large blobs	Journaled; offload large blobs as with Temporal
Exactly-once writes	App-level idempotency keys required	App-level idempotency keys required	Built-in exactly-once; less hand-rolled idempotency
Cost model	Self-host infra or Temporal Cloud usage	Per-step + per-run (balloons with retries)	Self-host infra (single binary)
Language fit	Go, Java, TS, Python, .NET	TS-first (Python, Go)	TS, Java/Kotlin, Go, Python, Rust
Best for	Long, regulated, multi-region sagas	Fast durability for existing TS apps	Temporal guarantees, lighter footprint, exactly-once

06 · Checkpointing, Resumption, and Idempotent Write Tools

The mechanics of "resume from a failed tool call without re-running side effects" come down to three disciplines that apply regardless of engine.

Wrap every side effect in a durable step. The LLM call, the tool invocation, the database write, the sleep, the wait-for-signal: each is a journaled boundary. Inside the step you do the work; the engine records the result. On replay, the recorded result returns and the body never runs again. The corollary is that anything outside a step must be deterministic, because it re-executes on every replay.

Make write tools idempotent. Because the engine's guarantee is at-least-once on the external system (the crash-mid-call window), a write tool can run twice. Pass a stable idempotency key derived from the workflow run ID and step ID, so the second call is a no-op: the payment processor sees a duplicate key and returns the original charge, the email sender deduplicates, the row insert is an upsert. Restate reduces how much of this you write by hand, but the principle is universal: design the tool so a replay is safe. This is the durable-layer twin of the tool-use correctness work you do at the agent layer.

Checkpoint state, not just steps. For agents that accumulate large working state (a plan, a scratchpad, retrieved context), keep the durable history lean by checkpointing big state to external storage and journaling the reference. This is the same claim-check pattern that keeps Temporal history under its cap, and it doubles as a clean recovery point: on resume, rehydrate state from the checkpoint rather than from a 40MB history blob.
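The idempotency-key discipline is the easiest of the three to get subtly wrong, so here is a minimal sketch. `FakePaymentProcessor` is a hypothetical stand-in for a provider that deduplicates on a caller-supplied key, and the `runId:stepId` key format is one reasonable choice, not a convention any of the three engines mandates.

```typescript
// Hypothetical payment processor that deduplicates on an idempotency key.
interface Charge {
  id: string;
  amountCents: number;
}

class FakePaymentProcessor {
  private byKey = new Map<string, Charge>();
  chargesCreated = 0;

  charge(idempotencyKey: string, amountCents: number): Charge {
    const existing = this.byKey.get(idempotencyKey);
    if (existing) return existing; // duplicate key: return the original charge
    this.chargesCreated++;
    const charge: Charge = { id: `ch_${this.chargesCreated}`, amountCents };
    this.byKey.set(idempotencyKey, charge);
    return charge;
  }
}

// Stable across replays: built only from identifiers the engine already persists.
// Never mix in timestamps or random values, or each attempt gets a fresh key.
function idempotencyKey(workflowRunId: string, stepId: string): string {
  return `${workflowRunId}:${stepId}`;
}

const processor = new FakePaymentProcessor();
const key = idempotencyKey("run-42", "charge-card");

// First attempt: the charge succeeds, then the worker dies before the result is journaled.
const first = processor.charge(key, 1999);
// After recovery the engine re-executes the unrecorded step with the same key.
const second = processor.charge(key, 1999);
```

Both calls return the same charge and `processor.chargesCreated` stays at 1, so the crash-mid-call window costs a redundant request rather than a double charge.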

Approval Gates as Suspended Workflows

The cleanest payoff of durable execution for agents is the approval gate. Instead of a polling loop or a fragile in-memory wait, the workflow suspends at the gate with zero running compute and persists its full state. When the human acts in your UI, you send a signal (Temporal), an event (Inngest waitForEvent), or complete an awakeable (Restate), and the workflow wakes on the exact line where it paused. The wait is free, the resume is exact, and you get an automatic audit trail of who approved what and when. The durable suspend also makes the front end honest: instead of a spinner that lies about progress, you can surface the agent's real paused state with the UI patterns for long-running AI tasks that keep users oriented across a multi-day wait. This is the durable backbone under the fallback and escalation patterns that route low-confidence agent actions to a human, and it composes naturally with multi-agent orchestration where a supervisor agent waits on the durable completion of its sub-agents.
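The suspend-and-wake mechanics can be sketched as a journal-backed state machine. This is a simplified illustration, not any engine's API: `deployWorkflow` and `signalApproval` are hypothetical names, and the `Map` stands in for durable storage. The point it shows is that nothing is held in memory while the workflow waits. The wake-up is just another replay that now finds the approval in the journal.

```typescript
// Approval gate as a suspended workflow. All names are hypothetical.
type Journal = Map<string, unknown>;

type Outcome =
  | { status: "suspended"; waitingOn: string }
  | { status: "done"; result: string };

interface Approval {
  approvedBy: string;
}

// Workflow code: re-run from the top on every wake-up, answered from the journal.
function deployWorkflow(
  journal: Journal,
  effects: { planned: number; deployed: number }
): Outcome {
  if (!journal.has("plan")) {
    effects.planned++; // the paid LLM call happens only on the first pass
    journal.set("plan", "deploy v2 to production");
  }
  const plan = journal.get("plan") as string;

  const approval = journal.get("approval") as Approval | undefined;
  if (approval === undefined) {
    // Gate not yet passed: return control, hold no process or timer.
    return { status: "suspended", waitingOn: "approval" };
  }

  if (!journal.has("deploy")) {
    effects.deployed++; // the consequential action runs once, after approval
    journal.set("deploy", `${plan} (approved by ${approval.approvedBy})`);
  }
  return { status: "done", result: journal.get("deploy") as string };
}

// Called by the UI handler when a human clicks approve.
function signalApproval(journal: Journal, approval: Approval): void {
  journal.set("approval", approval); // journaling the signal is the audit trail
}

const journal: Journal = new Map(); // stand-in for durable storage
const effects = { planned: 0, deployed: 0 };

const beforeApproval = deployWorkflow(journal, effects); // suspends at the gate
signalApproval(journal, { approvedBy: "alice" }); // hours or days later
const afterApproval = deployWorkflow(journal, effects); // wakes and finishes
```

The first call returns a suspended outcome, the second completes the deploy, and each side effect runs exactly once across both passes. A production gate would also record when the approval arrived and handle a timeout path.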

07 · Recommendation by Scenario

We close every durable execution scoping conversation with a concrete recommendation. These recommendations are imperfect, but they hold up more often than not:

Long, regulated, multi-region agent sagas (days to weeks, compliance, SLA). Temporal. The maturity, the 99.99% multi-region SLA, and the multi-language SDK earn the operational weight. Budget the payload codec work up front; it is not optional for document-heavy agents.

Existing TypeScript app, bursty event-driven agent jobs, want durability fast. Inngest. The step.run() ergonomics are the quickest path from a working agent to a durable one. Model your real per-run step count, including retries, before you scale: the step bill is the surprise.

Want Temporal-grade guarantees with the least operational surface and exactly-once writes. Restate. The single-binary footprint and built-in exactly-once semantics are the differentiators; accept a younger ecosystem in exchange.

Agents that take consequential, hard-to-reverse actions. Any of the three, but lead with the approval gate. Durable suspension at a human-in-the-loop boundary is the highest-leverage reason to adopt durable execution at all.

Still on a job queue with top-level retries. Stop. A queue is not a workflow engine. The first crash mid-run that double-charges a customer or re-corrupts a repo will cost more than the migration.

Durability is what separates an agent demo from an agent in production. The model can be perfect and the agent will still feel unreliable if a worker restart wipes forty minutes of state and a retry re-runs every paid step. The broader question of when this reliability work pays off, and how to measure it, sits in our agent reliability versus accuracy guide and the wider AI Agents pillar. At Particula Tech we treat durable execution as a default for any agent that takes real-world actions, because the alternative is an agent that works in the demo and loses its mind the first time the process dies.

Pick the engine that matches your workflow duration, payload size, cost model, and language. Wrap every side effect in a durable step, make every write idempotent, and offload large payloads before history saturates. Then your agent survives the crash that was always coming.

08 · FAQ

Quick answers to the questions this post tends to raise.
