An AI gateway becomes mandatory when your monthly LLM spend crosses ~$30K, you onboard a second provider, or compliance asks for per-team cost attribution — pick the one that bites first. Below ~$10K/mo with a single provider, a 200-line DIY router is honest engineering. Between $10K and $50K/mo, LiteLLM is the open-source default — it covers 100+ providers, fallback, and budgets without adding a vendor relationship. Above $50K/mo or once governance, audit trails, and prompt-injection guardrails enter the room, Portkey or Truefoundry pay for themselves in observability. At $200K+/mo with on-prem and multi-cloud requirements, Kong AI Gateway and Apigee become the realistic choices because the org already speaks API gateway. The expensive mistake is skipping a tier — buying Kong at $20K/mo or shipping homegrown code at $100K/mo.
Last quarter we got pulled into a Series B SaaS that had just blown $42,000 of LLM spend in a single week. The CTO called it a "billing anomaly." It wasn't. A new feature had shipped a prompt that recursively summarized customer call transcripts, a junior dev forgot a length cap, and the loop ran unbounded against Opus 4.6 for nine days before anyone noticed. The fix took two hours. The diagnosis took two weeks — because nobody could answer the question "which customer did this come from?" without grepping CloudWatch logs.
That's the moment an AI gateway stops being optional. Direct provider SDKs are fine while you have one team, one provider, and a small bill. They become a liability the day a single misbehaving prompt can drain your monthly budget without anyone noticing for nine days. This post is the decision framework we use with consulting clients to scope the AI gateway tier — when you actually need one, which of the four tiers fits your scale, and how to migrate between them without a rewrite.
We'll walk a symptom checklist, the four-criteria matrix that scopes the decision, then tier-by-tier through DIY routers, LiteLLM, Portkey and Truefoundry, and Kong AI Gateway and Apigee. Each tier has a workload it's right for and one it isn't — the mistake we see most often is skipping straight to Kong because "we're enterprise" or staying on a homegrown router at $100K monthly spend because "it works fine." It does, until it doesn't.
When an Unmanaged LLM Stack Actually Needs a Gateway
The symptom checklist below is the one we run through on the first consulting call. If you check three or more, the gateway has moved from a nice-to-have into the critical path.
`if model.startswith("gpt"): client = openai_client else: client = anthropic_client` is the smell. It's also the moment retries, fallback logic, and rate-limit handling start diverging across providers in ways that bite in production. Hit any single symptom and you should be evaluating tiers. Hit three and you're already late. We've seen the consequences of "late" play out twice in the last six months — once as the $42K incident above, once as a regulatory finding that delayed a SOC 2 Type II by four months. Both were avoidable with a gateway in place 90 days earlier.
The 4-Criteria Matrix: Scoping Your Tier
Once you've decided you need a gateway, four criteria scope which tier fits. We score each on a 0-3 scale during the first consulting call, then sum to a recommendation.
A score of 0-3 means Tier 1 (DIY router) is honest engineering. 4-6 puts you in Tier 2 (LiteLLM). 7-9 is Tier 3 (Portkey, Truefoundry). 10-12 is Tier 4 (Kong AI Gateway, Apigee). The boundaries are deliberately fuzzy — a team scoring 5 with a hard SOC 2 deadline in 90 days should jump to Tier 3, not stay at Tier 2 because the average score says so.
The most common scoping mistake is over-weighting spend and under-weighting governance. We've worked with $80K/mo workloads where Tier 2 LiteLLM was the right answer because governance was a single SOC 2 Type I and one provider. We've also worked with $15K/mo workloads at regulated banks where Tier 4 Kong was correct from day one because the rest of the API estate already lived there.
| Criterion | 0 (Tier 1) | 1 (Tier 2) | 2 (Tier 3) | 3 (Tier 4) |
|---|---|---|---|---|
| Monthly LLM spend | Under $10K | $10K-$50K | $50K-$200K | $200K+ |
| Governance maturity | "We log to Datadog" | Per-team budgets | Audit trail + guardrails | SOC 2 / HIPAA / EU AI Act |
| Deployment posture | Single cloud, SaaS OK | SaaS OK, prefer self-host | Self-host required | On-prem or air-gapped |
| Provider count | One provider | Two providers | 3+ providers, multi-cloud | Provider-agnostic, sovereignty asks |
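The sum-to-tier mapping is mechanical enough to sketch in code. This is an illustrative helper, not a tool any of these vendors ship; the function name and thresholds simply encode the matrix above:

```python
def recommend_tier(spend: int, governance: int, deployment: int, providers: int) -> int:
    """Map four 0-3 criterion scores to a gateway tier (1-4).

    Thresholds follow the matrix: 0-3 total -> Tier 1 (DIY router),
    4-6 -> Tier 2 (LiteLLM), 7-9 -> Tier 3 (Portkey/Truefoundry),
    10-12 -> Tier 4 (Kong AI Gateway/Apigee).
    """
    for score in (spend, governance, deployment, providers):
        if not 0 <= score <= 3:
            raise ValueError("each criterion is scored 0-3")
    total = spend + governance + deployment + providers
    if total <= 3:
        return 1
    if total <= 6:
        return 2
    if total <= 9:
        return 3
    return 4
```

Treat the output as a starting point, not a verdict: as noted above, a team scoring 5 with a hard SOC 2 deadline should jump a tier regardless of what the sum says.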
Tier 1: The DIY Router (Under $10K/mo, Single Provider)
The DIY router is a 150-300 line internal module that wraps your provider SDK with four things: retries with exponential backoff, request-level timeouts, a basic in-memory or Redis-backed cache for repeat prompts, and per-API-key budget tracking. We've shipped this version for half a dozen seed-stage clients in a week.
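Two of those four pieces, retries with backoff and per-key budget tracking, fit in a short sketch. This is a minimal illustration, not a library API: `provider_call` stands in for your real SDK call, and all names are hypothetical.

```python
import random
import time
from collections import defaultdict


class BudgetExceeded(Exception):
    pass


class DIYRouter:
    """Tier 1 sketch: retries with exponential backoff plus per-API-key
    spend tracking. Timeouts and the Redis-backed cache are omitted for
    brevity; `provider_call` stands in for the real provider SDK call."""

    def __init__(self, provider_call, budgets_usd, max_retries=3, base_delay=1.0):
        self.provider_call = provider_call
        self.budgets = budgets_usd        # api_key -> monthly cap in dollars
        self.spend = defaultdict(float)   # api_key -> dollars spent so far
        self.max_retries = max_retries
        self.base_delay = base_delay

    def complete(self, api_key, prompt, est_cost_usd):
        # Per-key budget check before we ever hit the provider.
        if self.spend[api_key] + est_cost_usd > self.budgets.get(api_key, 0.0):
            raise BudgetExceeded(f"key {api_key!r} would exceed its cap")
        for attempt in range(self.max_retries):
            try:
                result = self.provider_call(prompt)
                self.spend[api_key] += est_cost_usd
                return result
            except Exception:
                if attempt == self.max_retries - 1:
                    raise
                # Exponential backoff with jitter: base, 2x base, 4x base, ...
                time.sleep(self.base_delay * (2 ** attempt)
                           + random.random() * self.base_delay)
```

The budget check here is the piece the $42K-incident client was missing: a hard cap per key turns a nine-day runaway loop into a same-hour exception.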
The case for staying here is real. Below $10K/mo with one provider, the operational overhead of a real gateway — Docker images to maintain, a database to back up, a UI to keep secure — outweighs the value. The router is just code in your repo. It deploys with your application. There's no second system to monitor.
What it doesn't give you: multi-provider fallback, semantic caching, prompt-injection guardrails, per-tenant attribution beyond what you build manually, or a dashboard anyone outside engineering can read. The day your finance team asks for per-tenant attribution, you've outgrown the tier.
The migration trigger is straightforward. The day you write your first `if provider == "anthropic"` branch, stop and put the work into adopting LiteLLM instead. We've seen too many teams sink three months into a homegrown multi-provider abstraction before discovering LiteLLM ships exactly that, plus 100 providers they hadn't planned for. For more on the build-vs-buy framing, see our guide on when to build vs buy AI infrastructure.
Tier 2: LiteLLM (the Open-Source Default at $10K-$50K/mo)
LiteLLM is the project we recommend most often, and the reason is simple: it ships the 80% of features any production AI workload needs without a vendor relationship. It runs as either a Python library (drop-in replacement for OpenAI's client) or a self-hosted proxy server (a single Docker image plus a Postgres database). At consulting scale, the proxy is what you want — application code talks OpenAI Chat Completions wire format to the proxy, and the proxy fans out to 100+ providers.
What you get out of the box:
- Config-declared fallback chains: `claude-opus-4-6` → `gpt-5.3` → `gemini-3.1` if the first two error out.
- Per-key attribution: `team_id` and `customer_id` fields on each key flow into the spend logs, which is exactly the data the $42K-incident client was missing.

What it doesn't give you cleanly: semantic guardrails (you bolt those on), prompt-level traces with full request/response replay (you wire Langfuse — see our Helicone vs Langfuse vs LangSmith breakdown for the right pairing at this tier), and the polished observability surface that a managed product ships. None of that is fatal at $10K-$50K monthly spend. All of it starts to bite at $50K+.
The gotcha we see most often is teams running LiteLLM in library mode (in-process) for too long. The library is fine for prototyping but loses every benefit of a centralized gateway — no per-key budgets, no centralized logging, no rate-limiter coordination across replicas. Switch to the proxy server the moment you have more than one service hitting LLMs, and pin the deployment with Postgres and Redis from day one. For the routing patterns LiteLLM enables, our walkthrough of cheap-first model routing to reduce API costs maps cleanly onto LiteLLM's policy syntax.
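For a sense of what the proxy deployment looks like, here is a sketch of a `config.yaml`. The key names follow LiteLLM's documented config shape, but verify them against the version you deploy; the model aliases and environment variable names are placeholders:

```yaml
# Illustrative LiteLLM proxy config. Aliases and env vars are placeholders.
model_list:
  - model_name: primary              # the alias application code requests
    litellm_params:
      model: anthropic/claude-opus-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: fallback
    litellm_params:
      model: openai/gpt-5.3
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  fallbacks:
    - primary: ["fallback"]          # route to fallback when primary errors

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL  # Postgres for keys, budgets, spend logs
```

Application code requests the `primary` alias over the OpenAI wire format; swapping the underlying model is a config change, not a deploy.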
The migration trigger out of LiteLLM is governance load. The day a security review asks for prompt-injection blocking on the request path, semantic caching with similarity thresholds, or A/B testing across model versions with feedback collection, you're in Tier 3 territory.
Tier 3: Portkey and Truefoundry (Managed, $50K-$200K/mo)
Portkey and Truefoundry are the managed alternatives to running LiteLLM yourself, and they justify the premium with three things LiteLLM doesn't ship at the same level: per-prompt observability, semantic guardrails, and continuous evaluation infrastructure.
Per-prompt observability. Every request has a trace with the full prompt, response, latency breakdown, cache hit/miss, fallback decisions, and cost attribution — searchable, with replay. When a customer reports a hallucination, you find the exact request in 30 seconds, not three hours grepping logs. This is the single biggest workflow improvement at scale. It's also why we tend to recommend the managed option once a workload crosses the point where engineers spend more than two hours per week on LLM debugging.
Semantic guardrails. Inline classifiers for prompt injection, jailbreak attempts, and PII patterns. Both Portkey and Truefoundry ship a library of guardrail policies you compose per route, plus a hook to plug in custom classifiers. LiteLLM has hooks for the same but ships fewer policies and expects you to bring your own classifier infrastructure.
Continuous evaluation. A/B testing across model versions, feedback collection from end users, and dashboards that surface quality drift the same way our AI production monitoring guide recommends. This matters most when you're routing requests across 3+ providers and need to know whether last week's switch to a cheaper model degraded answer quality.
The pricing typically lands at 1-3% of LLM spend, which sounds expensive in absolute terms and isn't. At $100K monthly spend, $1,000-$3,000/mo for the gateway tier replaces 0.25-0.5 FTE that would otherwise be operating LiteLLM, building dashboards, and writing custom guardrail code. The math favors the managed option above roughly $50K/mo and turns lopsided above $100K/mo.
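The break-even math above is simple enough to make explicit. The FTE cost and ops fraction below are stated assumptions for illustration, not vendor pricing, and in practice the self-host ops fraction grows with spend, which is what tips the math further toward managed at scale:

```python
def gateway_breakeven(monthly_llm_spend, fee_rate=0.02,
                      monthly_fte_cost=15_000.0, ops_fte_fraction=0.35):
    """Compare a managed gateway fee (a percentage of LLM spend) against
    the ops cost of self-hosting. monthly_fte_cost and ops_fte_fraction
    are illustrative assumptions, not vendor figures."""
    managed_fee = monthly_llm_spend * fee_rate       # e.g. 2% of spend
    self_host_ops = monthly_fte_cost * ops_fte_fraction  # fraction of an FTE
    return {
        "managed_fee": managed_fee,
        "self_host_ops": self_host_ops,
        "managed_cheaper": managed_fee < self_host_ops,
    }
```

At $100K monthly spend with a 2% fee, the managed tier costs $2,000/mo against roughly $5,250/mo of engineer time under these assumptions.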
The migration trigger out of Tier 3 is rare and almost always organizational rather than technical. Either compliance demands on-prem deployment with the same posture as your existing API estate, or your platform team has standardized on Kong or Apigee for everything else and the AI gateway is the odd shape on the org chart. That's when Tier 4 enters the conversation.
Tier 4: Kong AI Gateway and Apigee (Enterprise, On-Prem, $200K+/mo)
Kong AI Gateway and Apigee with AI add-ons are the right answer at enterprise scale for two reasons that have very little to do with LLM features specifically.
The first is organizational fit. If your platform team already runs Kong for non-AI traffic, the AI gateway is a plugin layer on top of an existing system the SRE and security teams have already certified, IAM-integrated, and run through audit. Adding LLM routing, token-based rate limiting, and prompt caching as Kong plugins is a one-to-two-week integration instead of a multi-quarter vendor onboarding. The same logic applies to Apigee inside Google Cloud shops. The AI gateway capabilities are roughly comparable between the purpose-built vendors and Kong/Apigee at this tier — what differs is the one-time onboarding cost and the long-term operational shape.
The second is deployment posture. On-prem, air-gapped, or sovereignty-constrained deployments are first-class for Kong and Apigee — both have spent a decade engineering for those environments. LiteLLM and Portkey can deploy on-prem, but you're swimming upstream on networking, secrets management, and audit integration. At a regulated bank or a defense contractor, the AI gateway is rarely the long pole; the audit and network posture around it are. Kong and Apigee make those non-issues by inheriting the existing posture.
The cost shape changes too. Kong Konnect Plus self-hosted starts around $20K/year and scales with traffic. Apigee with AI add-ons lands in five-to-six figures monthly at large enterprise scale. Below $200K monthly LLM spend, the math rarely works in favor of either — you'd be paying enterprise gateway pricing for SaaS-tier LLM volume.
The mistake at this tier is buying for the technology and not the org. We've seen a $40K/mo workload land on Kong AI Gateway because "we're a Kong shop" and the result was a six-month integration project for capabilities LiteLLM would have delivered in two weeks. We've also seen the inverse — a $400K/mo regulated workload on managed Portkey that the security team forced off after a year because the procurement and audit overhead was unsustainable. The right tier matches the org, not just the bill.
Migration Path Between Tiers (Without Rewriting Your App)
The migration cost between tiers is much lower than most teams expect, on one condition: you standardize on the OpenAI Chat Completions wire format as your internal contract from day one. Every tier in this guide — direct SDK, LiteLLM, Portkey, Truefoundry, Kong AI Gateway, Apigee — exposes an OpenAI-compatible endpoint. Migration is a base-URL swap and a credentials rotation. The application stays untouched.
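The base-URL swap can be made concrete with a request builder that reads the gateway endpoint from the environment. This is a stdlib-only sketch; the env var names and model alias are illustrative, not a standard:

```python
import json
import os


def chat_request(messages, model, base_url=None, api_key=None):
    """Build an OpenAI-style chat-completions request against whatever
    gateway LLM_BASE_URL points at. Migrating tiers means changing the
    env var, not the application code. Names here are illustrative."""
    base_url = base_url or os.environ.get("LLM_BASE_URL",
                                          "https://api.openai.com/v1")
    api_key = api_key or os.environ.get("LLM_API_KEY", "")
    return {
        "url": f"{base_url.rstrip('/')}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"model": model, "messages": messages}),
    }
```

Pointing `LLM_BASE_URL` at a LiteLLM proxy, a Portkey endpoint, or a Kong route changes nothing else in the call site, which is the whole migration story in one line.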
What you do redo at each transition:
- Per-tenant attribution: mapping `team_id` to billing requires a one-time SQL or pipeline change.

A reasonable migration timeline is two engineering weeks for Tier 1 → Tier 2, four to six weeks for Tier 2 → Tier 3 (most of which is rebuilding observability), and two to three months for Tier 3 → Tier 4 (most of which is org-side: audit, IAM, network posture). The application code path is barely touched in any of them.
The pattern we ship to clients on long consulting engagements is to build per-tenant attribution and the OpenAI-compatible internal contract first, regardless of tier. Both survive every migration. Our per-tenant LLM cost attribution guide for multi-tenant SaaS covers the data model that makes this portable, and the smart caching architecture walkthrough covers the cache layer pattern that survives the transition between tiers without a rewrite. Both are required reading before you commit to a vendor. The broader strategic context — when an AI gateway maps to a real ROI versus when it's a premature cost — sits in our AI for Business pillar.
Recommendation by Scenario
We close every gateway-scoping conversation with one of five concrete recommendations. They're imperfect — every workload has wrinkles — but they're the starting points we've been right about most often.
The wrong answer in every case is doing nothing because the decision feels too big. We've yet to meet a team that regretted introducing an AI gateway tier that fit their scale. We've met plenty that regretted skipping it for another quarter and finding out the hard way — usually via a billing anomaly, sometimes via a customer escalation, occasionally via a regulator.
Frequently Asked Questions
Direct SDK usage is fine while you have one provider, one team, and monthly spend under roughly $10,000. Below that threshold, a 200-line internal router that adds a retry, a timeout, and a per-key budget covers 90% of the value. The trigger to add a real gateway is the first time one of three things happens: you onboard a second provider and start writing per-provider conditional code, finance asks which customer or team consumed which dollars and you can't answer in under a day, or a single bad prompt drains 15% of your monthly budget overnight. The gateway pays for itself the moment any of those three becomes a recurring problem.


