Is Claude Fable 5 worth it over Opus 4.8?

Claude Fable 5 is worth the 2x cost only on a minority of tasks. It costs $10 input / $50 output per 1M tokens against Opus 4.8's $5/$25, and the capability gap is real: 80.3% vs 69.2% on SWE-Bench Pro, 95.0% vs 87.6% on SWE-Bench Verified. But for well-scoped feature work, debugging, PR review, and test generation, Opus 4.8 clears the bar at half the price. The economic case for Fable 5 lives in two places: tasks where Opus measurably plateaus (you can document the failures), and long-horizon agentic runs where an 11-point per-step accuracy gap compounds across dozens of steps. Default to Opus 4.8 and promote on evidence, not on the assumption that the newer model is always better value.

When should I use Claude Fable 5?

Use Claude Fable 5 when Opus 4.8 has demonstrably plateaued on a specific task, or for long-horizon agentic workloads where its accuracy advantage compounds. Concretely: multi-day autonomous runs, large-codebase migrations, and multi-file autonomous refactors where each step's failure probability multiplies across the chain. On SWE-Bench Pro, which is built to look like real multi-file engineering, Fable 5 hits 80.3% to Opus 4.8's 69.2%, an 11-point spread that matters far more when a run takes 40 steps than when it takes 3. For a single well-scoped function or a one-shot bug fix, the gap rarely justifies 2x the token cost. Promote to Fable 5 after you have measured Opus failing, not before.

How much does Claude Fable 5 cost compared to Opus 4.8?

Claude Fable 5 costs $10 per 1M input tokens and $50 per 1M output tokens, exactly double Opus 4.8's $5 input and $25 output. On the Batch API the rates halve to $5/$25 for Fable 5 and the same proportional discount applies to Opus. Fable 5 was free on Pro, Max, Team, and Enterprise plans through June 22, 2026, then switched to metered pricing. The per-token premium is the headline number, but it understates and overstates the real cost depending on the workload: on tasks where Opus retries two or three times before succeeding, Fable 5 finishing in one pass can close most of the gap. Measure retry rates before assuming the 2x is the true delta.

What is the benchmark difference between Fable 5 and Opus 4.8?

Claude Fable 5 leads Opus 4.8 by roughly 11 points on the hardest coding benchmark and 7 points on the standard one. On SWE-Bench Pro, designed to resemble real multi-file engineering work, Fable 5 scores 80.3% to Opus 4.8's 69.2%. On SWE-Bench Verified, Fable 5 hits 95.0% to Opus 4.8's 87.6%. For context, on SWE-Bench Pro both models clear the field: GPT-5.5 lands at 58.6% and Gemini 3.1 Pro at 54.2%. The 11-point Fable-to-Opus spread on SWE-Bench Pro is wider than the Opus-to-Gemini gap, which is the data point that justifies treating Fable 5 as a genuine step up rather than a marketing refresh, but only on the class of tasks SWE-Bench Pro represents.

Does Claude Fable 5 support zero data retention?

No. Claude Fable 5 requires 30-day data retention and is not available under zero data retention (ZDR). An organization configured for ZDR, or any retention below the 30-day requirement, gets a 400 invalid_request_error on every Fable 5 request, regardless of how well-formed the payload is. Opus 4.8 supports ZDR. This is a hard architectural constraint, not a tunable setting, and it is the single fastest way to disqualify Fable 5 for a regulated workload. If you operate under a data-handling regime that mandates zero retention, Opus 4.8 is your ceiling on the Claude tier and the benchmark conversation is moot. Check your org's retention configuration before you benchmark anything.

Should I upgrade my whole pipeline to Claude Fable 5?

No. Upgrading an entire pipeline to Claude Fable 5 is almost always the wrong move because most pipeline steps are well-scoped tasks where Opus 4.8 already clears the bar at half the cost. The right pattern is per-task routing: keep Opus 4.8 as the default and route only the steps that need Fable 5 to it. Across coding agents we have audited, the share of steps that genuinely benefit from the top tier is small, usually the long-horizon planning and multi-file refactor stages, not the read-grep-edit grind that dominates step count. Blanket-upgrading doubles your token bill while moving the success rate on most steps by zero. Promote individual steps on measured failure, not the whole pipeline on principle.

How does Fable 5's always-on thinking change how I use it?

Claude Fable 5 has thinking always on, which means longer turns and a different prompting style than Opus 4.8. Single requests on hard tasks can run many minutes, so you need to plan for timeouts, streaming, and progress UX rather than blocking on a synchronous call. The raw chain of thought is never returned; you get summarized thinking blocks if you opt in. Control depth with the effort parameter (low through max) rather than a thinking budget, since budget_tokens is removed. Prompts written for prior models are often too prescriptive and reduce Fable 5's output quality, so state the goal and constraints and let it plan. Treat it as a long-horizon worker you check on, not a chat endpoint you poll.

BLOG/LLMS & MODELS

Claude Fable 5 vs Opus 4.8: When to Use Which Model

Fable 5 hits 80.3% on SWE-Bench Pro vs Opus 4.8's 69.2% but costs 2x and lacks zero data retention. The task-routing rule for when the premium pays off.

Sebastian MondragonJUNE 09, 2026 · 9 MIN READ

Claude Fable 5 vs Opus 4.8: When to Use Which Model

Claude Fable 5 went generally available on June 9, 2026, the first publicly available model in Anthropic's Mythos class, shipped to every Claude plan and free through June 22. The headline benchmark is genuinely large: 80.3% on SWE-Bench Pro against Claude Opus 4.8's 69.2%. That is an 11-point spread, and on the same benchmark it is wider than the gap between Opus 4.8 and Gemini 3.1 Pro. So the upgrade question writes itself, and most teams will get the answer wrong.

The wrong answer is "Fable 5 is better, so use Fable 5." It costs $10 per 1M input tokens and $50 per 1M output, exactly double Opus 4.8's $5/$25, and it carries a 30-day data retention requirement that disqualifies it outright for some regulated workloads. The right answer is a routing decision, not a model decision: keep Opus 4.8 as the default and promote individual tasks to Fable 5 only when the evidence justifies the premium. This is a within-Anthropic version of the same cheap-first discipline we apply across providers, and the math is sharper here because the two models share an API surface, a tokenizer, and a 1M context window. The only things that differ are price, capability, and one hard retention constraint.

This post lays out the decision framework: where the 11-point gap actually compounds, what the 2x cost really means once you account for retries, why zero data retention is a gate and not a knob, and the task-routing promotion rule we use to decide which work goes to which tier.

What Shipped on June 9

Fable 5 is the first Mythos-class model Anthropic has released to the public, GA on Pro, Max, Team, and Enterprise. The API model ID is claude-fable-5. It was free across all plans through June 22, 2026, then switched to metered pricing. It ships with a 1M-token context window (the maximum is also the default) and 128K max output, the same envelope as Opus 4.8.

What changed under the hood matters for how you use it. Thinking is always on, so you cannot disable it; an explicit thinking: {type: "disabled"} returns a 400. You control reasoning depth with the effort parameter (low through max) instead of a token budget. The raw chain of thought is never returned, only summarized thinking blocks if you opt in. And single requests on hard tasks can run many minutes, which means Fable 5 is a long-horizon worker you check on asynchronously, not a chat endpoint you poll synchronously. None of this is true of Opus 4.8, which keeps the standard request surface.

The Benchmark Gap That Actually Matters

Two benchmarks tell the story, and they tell slightly different versions of it.

SWE-Bench Verified is the cleaner, more curated benchmark, and there the gap is 7 points. SWE-Bench Pro is the one designed to look like real multi-file engineering work, with messier context and longer task chains, and there the gap opens to 11 points. That widening is the signal. When a benchmark gets harder and more realistic, the Fable-to-Opus gap grows, which tells you the advantage lives specifically in the kind of complex, multi-step work that production agents actually do.

The cross-vendor context sharpens it further. On SWE-Bench Pro, Opus 4.8 at 69.2% already beats GPT-5.5 (58.6%) and Gemini 3.1 Pro (54.2%) by a comfortable margin. Fable 5's lead over Opus is larger than Opus's lead over Gemini. If you have read our breakdown of Opus versus GPT-5 Codex versus Gemini for production coding, this is the same hierarchy with a new top entry, not a reshuffle. The competitive picture among the also-rans is unchanged; Fable 5 just extended the ceiling.

But a benchmark number is a per-task average, and that is exactly why "Fable 5 is better" leads teams astray. An 11-point accuracy edge on a single function or a one-shot bug fix is worth very little when Opus already passes most of those. The edge becomes decisive only when failures compound, which is a property of the workload, not the model.

Benchmark	Fable 5	Opus 4.8	GPT-5.5	Gemini 3.1 Pro
SWE-Bench Pro	80.3%	69.2%	58.6%	54.2%
SWE-Bench Verified	95.0%	87.6%	n/a	n/a

The Cost Math: 2x Is the Sticker Price, Not the Bill

Fable 5 is 2x Opus 4.8 per token: $10/$50 versus $5/$25. On the Batch API both halve, so Fable 5 is $5/$25 and Opus is the proportional equivalent. The sticker price is unambiguous. The actual bill is not, because per-token price is not per-outcome cost.

Consider a complex task where Opus 4.8 succeeds on the third attempt. You paid for three full runs (three sets of input tokens, three sets of output tokens, plus whatever orchestration overhead each retry carries) to get one usable result. If Fable 5 lands the same task in a single pass, the per-outcome comparison is one Fable 5 run versus three Opus runs. At 2x per token but one-third the runs, Fable 5 can come out roughly even or cheaper on that specific task, before you even count the engineering time spent babysitting the retries.

This is the inverse of the usual cheap-first argument, and it does not contradict it. The discipline is the same one we lay out in routing cheap-first to cut API costs: measure the loaded cost per successful outcome, not the headline per-token rate. For the 80% of tasks where Opus succeeds first try, cheap-first means Opus, and the 2x premium on Fable 5 is pure waste. For the minority where Opus thrashes, the retry-adjusted math can flip. You cannot know which bucket a task lands in without instrumenting retry rates, which is the whole point: the routing decision is empirical, not architectural.

Model	Input $/1M	Output $/1M	Batch input	Batch output
Claude Fable 5	$10.00	$50.00	$5.00	$25.00
Claude Opus 4.8	$5.00	$25.00	$2.50	$12.50

Zero Data Retention: A Gate, Not a Knob

Before any of the benchmark or cost analysis applies, one constraint can settle the decision for you. Opus 4.8 supports zero data retention. Fable 5 does not. Fable 5 requires 30-day data retention, and an organization configured for ZDR (or any retention below the 30-day floor) gets a 400 invalid_request_error on every Fable 5 request, no matter how well-formed the payload.

This is not a setting you tune or a tier you upgrade into. It is a hard architectural property of the model. If your workload operates under a data-handling regime that mandates zero retention, healthcare under certain interpretations, some financial and government contexts, anything where customer contracts forbid retention, then Fable 5 is simply off the table and Opus 4.8 is your ceiling on the Claude tier. The benchmark gap is irrelevant because you cannot legally use the model that has it.

The practical takeaway: check your org's retention configuration first, before you benchmark anything. We have seen teams burn a sprint evaluating Fable 5 only to discover at integration time that their compliance posture forbids it. Run the retention check as gate zero in any Fable 5 evaluation. If it fails, the rest of this framework collapses to "use Opus 4.8," and that is a perfectly good answer.

The Task-Routing Decision Tree

Assuming retention is not a blocker, here is how to route work between the two tiers. The default is always Opus 4.8. Escalation to Fable 5 is the exception, justified by evidence.

The split tracks a single principle: short, well-scoped, verifiable tasks default to Opus because the 11-point gap rarely changes the outcome and never justifies 2x. Long-horizon, multi-step, autonomous tasks default to Fable because that is exactly where the gap compounds. The grind that dominates step count in most coding agents, read a file, grep for a symbol, make a scoped edit, run a test, is Opus territory. The planning and multi-file coordination stages are where Fable earns its premium.

This same "match the model to the job, not the job to the model" logic underpins our guide on when to use smaller models versus flagships. Fable 5 versus Opus 4.8 is the high-end mirror of that decision: the same routing discipline, applied one tier up. And it is why blanket-upgrading a pipeline to Fable 5 is almost always wrong. Most steps move zero on success rate and double on cost.

Task type	Default tier	Escalate to Fable 5 when
Well-scoped feature work	Opus 4.8	Opus produces wrong implementations you can document
Debugging a known issue	Opus 4.8	Opus repeatedly misdiagnoses across attempts
PR review	Opus 4.8	Review misses real bugs a second pass catches
Test generation	Opus 4.8	Generated tests are shallow or miss edge cases
Multi-day agentic runs	Fable 5	Default here; gap compounds across steps
Large-codebase migration	Fable 5	Default here; multi-file coordination matters
Multi-file autonomous refactor	Fable 5	Default here; per-step accuracy multiplies

Where the 11-Point Gap Compounds

The benchmark gap is per-task. Production agentic work is per-run, and a run is dozens or hundreds of tasks chained together, each one's output feeding the next. This is where a per-step accuracy difference stops being marginal and starts being decisive.

Take the SWE-Bench Pro numbers at face value as rough per-step success proxies: 80.3% for Fable 5, 69.2% for Opus 4.8. On a single step, the difference is 11 points, noticeable but survivable. Now chain ten dependent steps where each one must succeed for the run to complete. Naively, Fable 5's run-completion probability is 0.803 to the tenth, around 11%, while Opus 4.8's is 0.692 to the tenth, around 2.5%. The per-step gap of 11 points became a per-run gap of roughly 4x. Real agents have retry logic and recovery, so the actual numbers are friendlier than this toy calculation, but the direction is exactly right: small per-step advantages multiply across long chains.

This is the structural reason long-horizon work is Fable territory and short tasks are not. The same arithmetic explains why we keep returning to the gap between flagship and challenger models in agentic settings, as in our MiniMax M2.7 versus Opus coding benchmark comparison: a model that looks competitive on single-shot benchmarks can fall off a cliff on multi-day runs because the compounding is unforgiving. Fable 5's always-on thinking and minutes-long turns are built for precisely this regime, which is the other half of why it suits long runs and overkills short ones.

The Promotion Rule: Start Cheap, Promote on Evidence

The operational rule that ties this together is simple and it is the opposite of how most teams approach a model launch. Do not start on Fable 5 and look for savings. Start on Opus 4.8 and look for failures.

Concretely:

Default every task to Opus 4.8. It is half the cost, supports zero data retention, and clears the bar on the large majority of production work.

Instrument retry rates and failure modes per task type. You cannot route on intuition. Log how often each task type requires a retry, what the failure looks like, and what the loaded cost per successful outcome is.

Promote a task type to Fable 5 only when documented failures justify the 2x. A task where Opus retries three times before succeeding, or fails in a way a second pass cannot fix, is a promotion candidate. A task Opus handles first try is not, no matter how impressive Fable 5's benchmark looks.

Default long-horizon agentic workloads to Fable 5 from the start. Multi-day runs, large migrations, and autonomous multi-file refactors are the one category where the compounding math justifies promoting before you have failure evidence, because the cost of a failed multi-day run dwarfs the per-token premium.

This is the model-routing audit Particula Tech runs as a fixed-scope engagement: we instrument your actual workload, measure where Opus 4.8 plateaus and where it does not, and hand back a per-task routing policy with the retry-adjusted cost math attached, rather than a recommendation to upgrade everything. The deliverable is a routing table you can act on, not a benchmark chart you already have. For the broader strategy of matching models to workloads across the whole stack, our LLMs and models pillar collects the full set of comparisons and routing frameworks.

The uncomfortable truth about a launch as strong as Fable 5's is that the strength is the trap. An 11-point benchmark lead makes "just upgrade" feel obviously correct, and for a minority of your workload it is. For the majority, it doubles your bill to move nothing. Fable 5 is the best widely available coding model right now. That does not make it the right model for most of your tasks, and the discipline to tell the difference is worth more than the model itself.

FAQ

Quick answers to the questions this post tends to raise.

BLOG/LLMS & MODELS

Claude Fable 5 vs Opus 4.8: When to Use Which Model

Fable 5 hits 80.3% on SWE-Bench Pro vs Opus 4.8's 69.2% but costs 2x and lacks zero data retention. The task-routing rule for when the premium pays off.

Sebastian MondragonJUNE 09, 2026 · 9 MIN READ

What Shipped on June 9

The Benchmark Gap That Actually Matters

Two benchmarks tell the story, and they tell slightly different versions of it.

Benchmark	Fable 5	Opus 4.8	GPT-5.5	Gemini 3.1 Pro
SWE-Bench Pro	80.3%	69.2%	58.6%	54.2%
SWE-Bench Verified	95.0%	87.6%	n/a	n/a

The Cost Math: 2x Is the Sticker Price, Not the Bill

Model	Input $/1M	Output $/1M	Batch input	Batch output
Claude Fable 5	$10.00	$50.00	$5.00	$25.00
Claude Opus 4.8	$5.00	$25.00	$2.50	$12.50

Zero Data Retention: A Gate, Not a Knob

The Task-Routing Decision Tree

Assuming retention is not a blocker, here is how to route work between the two tiers. The default is always Opus 4.8. Escalation to Fable 5 is the exception, justified by evidence.

Task type	Default tier	Escalate to Fable 5 when
Well-scoped feature work	Opus 4.8	Opus produces wrong implementations you can document
Debugging a known issue	Opus 4.8	Opus repeatedly misdiagnoses across attempts
PR review	Opus 4.8	Review misses real bugs a second pass catches
Test generation	Opus 4.8	Generated tests are shallow or miss edge cases
Multi-day agentic runs	Fable 5	Default here; gap compounds across steps
Large-codebase migration	Fable 5	Default here; multi-file coordination matters
Multi-file autonomous refactor	Fable 5	Default here; per-step accuracy multiplies

Where the 11-Point Gap Compounds

The Promotion Rule: Start Cheap, Promote on Evidence

Concretely:

Default every task to Opus 4.8. It is half the cost, supports zero data retention, and clears the bar on the large majority of production work.

FAQ

Quick answers to the questions this post tends to raise.