An AI gateway becomes mandatory when your monthly LLM spend crosses ~$30K, you onboard a second provider, or compliance asks for per-team cost attribution — pick the one that bites first. Below ~$10K/mo with a single provider, a 200-line DIY router is honest engineering. Between $10K and $50K/mo, LiteLLM is the open-source default — it covers 100+ providers, fallback, and budgets without adding a vendor relationship. Above $50K/mo or once governance, audit trails, and prompt-injection guardrails enter the room, Portkey or Truefoundry pay for themselves in observability. At $200K+/mo with on-prem and multi-cloud requirements, Kong AI Gateway and Apigee become the realistic choices because the org already speaks API gateway. The expensive mistake is skipping a tier — buying Kong at $20K/mo or shipping homegrown code at $100K/mo.
Last quarter we got pulled into a Series B SaaS that had just blown $42,000 of LLM spend in a single week. The CTO called it a "billing anomaly." It wasn't. A new feature had shipped a prompt that recursively summarized customer call transcripts, a junior dev forgot a length cap, and the loop ran unbounded against Opus 4.6 for nine days before anyone noticed. The fix took two hours. The diagnosis took two weeks — because nobody could answer the question "which customer did this come from?" without grepping CloudWatch logs.
That's the moment an AI gateway stops being optional. Direct provider SDKs are fine while you have one team, one provider, and a small bill. They become a liability the day a single misbehaving prompt can drain your monthly budget without anyone noticing for nine days. This post is the decision framework we use with consulting clients to scope the AI gateway tier — when you actually need one, which of the four tiers fits your scale, and how to migrate between them without a rewrite.
We'll walk a symptom checklist, the four-criteria matrix that scopes the decision, then tier-by-tier through DIY routers, LiteLLM, Portkey and Truefoundry, and Kong AI Gateway and Apigee. Each tier has a workload it's right for and one it isn't — the mistake we see most often is skipping straight to Kong because "we're enterprise" or staying on a homegrown router at $100K monthly spend because "it works fine." It does, until it doesn't.
When an Unmanaged LLM Stack Actually Needs a Gateway
The symptom checklist below is the one we run through on the first consulting call. If you check three or more, the gateway has moved from a nice-to-have into the critical path.
`if model.startswith("gpt"): client = openai_client else: client = anthropic_client` is the smell. It's also the moment retries, fallback logic, and rate-limit handling start diverging across providers in ways that bite in production. Hit any single symptom and you should be evaluating tiers. Hit three and you're already late. We've seen the consequences of "late" play out twice in the last six months — once as the $42K incident above, once as a regulatory finding that delayed a SOC 2 Type II by four months. Both were avoidable with a gateway in place 90 days earlier.
The 4-Criteria Matrix: Scoping Your Tier
Once you've decided you need a gateway, four criteria scope which tier fits. We score each on a 0-3 scale during the first consulting call, then sum to a recommendation.
A score of 0-3 means Tier 1 (DIY router) is honest engineering. 4-6 puts you in Tier 2 (LiteLLM). 7-9 is Tier 3 (Portkey, Truefoundry). 10-12 is Tier 4 (Kong AI Gateway, Apigee). The boundaries are deliberately fuzzy — a team scoring 5 with a hard SOC 2 deadline in 90 days should jump to Tier 3, not stay at Tier 2 because the average score says so.
The most common scoping mistake is over-weighting spend and under-weighting governance. We've worked with $80K/mo workloads where Tier 2 LiteLLM was the right answer because governance was a single SOC 2 Type I and one provider. We've also worked with $15K/mo workloads at regulated banks where Tier 4 Kong was correct from day one because the rest of the API estate already lived there.
| Criterion | 0 (Tier 1) | 1 (Tier 2) | 2 (Tier 3) | 3 (Tier 4) |
|---|---|---|---|---|
| Monthly LLM spend | Under $10K | $10K-$50K | $50K-$200K | $200K+ |
| Governance maturity | "We log to Datadog" | Per-team budgets | Audit trail + guardrails | SOC 2 / HIPAA / EU AI Act |
| Deployment posture | Single cloud, SaaS OK | SaaS OK, prefer self-host | Self-host required | On-prem or air-gapped |
| Provider count | One provider | Two providers | 3+ providers, multi-cloud | Provider-agnostic, sovereignty asks |
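The sum-to-tier mapping is mechanical enough to sketch in code. This is an illustrative helper, not a tool any of these vendors ship; the function name and thresholds simply encode the matrix above:

```python
def recommend_tier(spend: int, governance: int, deployment: int, providers: int) -> int:
    """Map four 0-3 criterion scores to a gateway tier (1-4).

    Thresholds follow the matrix: 0-3 total -> Tier 1 (DIY router),
    4-6 -> Tier 2 (LiteLLM), 7-9 -> Tier 3 (Portkey/Truefoundry),
    10-12 -> Tier 4 (Kong AI Gateway/Apigee).
    """
    for score in (spend, governance, deployment, providers):
        if not 0 <= score <= 3:
            raise ValueError("each criterion is scored 0-3")
    total = spend + governance + deployment + providers
    if total <= 3:
        return 1
    if total <= 6:
        return 2
    if total <= 9:
        return 3
    return 4
```

Treat the output as a starting point, not a verdict: as noted above, a team scoring 5 with a hard SOC 2 deadline should jump a tier regardless of what the sum says.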
Tier 1: The DIY Router (Under $10K/mo, Single Provider)
The DIY router is a 150-300 line internal module that wraps your provider SDK with four things: retries with exponential backoff, request-level timeouts, a basic in-memory or Redis-backed cache for repeat prompts, and per-API-key budget tracking. We've shipped this version for half a dozen seed-stage clients in a week.
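Two of those four pieces, retries with backoff and per-key budget tracking, fit in a short sketch. This is a minimal illustration, not a library API: `provider_call` stands in for your real SDK call, and all names are hypothetical.

```python
import random
import time
from collections import defaultdict


class BudgetExceeded(Exception):
    pass


class DIYRouter:
    """Tier 1 sketch: retries with exponential backoff plus per-API-key
    spend tracking. Timeouts and the Redis-backed cache are omitted for
    brevity; `provider_call` stands in for the real provider SDK call."""

    def __init__(self, provider_call, budgets_usd, max_retries=3, base_delay=1.0):
        self.provider_call = provider_call
        self.budgets = budgets_usd        # api_key -> monthly cap in dollars
        self.spend = defaultdict(float)   # api_key -> dollars spent so far
        self.max_retries = max_retries
        self.base_delay = base_delay

    def complete(self, api_key, prompt, est_cost_usd):
        # Per-key budget check before we ever hit the provider.
        if self.spend[api_key] + est_cost_usd > self.budgets.get(api_key, 0.0):
            raise BudgetExceeded(f"key {api_key!r} would exceed its cap")
        for attempt in range(self.max_retries):
            try:
                result = self.provider_call(prompt)
                self.spend[api_key] += est_cost_usd
                return result
            except Exception:
                if attempt == self.max_retries - 1:
                    raise
                # Exponential backoff with jitter: base, 2x base, 4x base, ...
                time.sleep(self.base_delay * (2 ** attempt)
                           + random.random() * self.base_delay)
```

The budget check here is the piece the $42K-incident client was missing: a hard cap per key turns a nine-day runaway loop into a same-hour exception.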
The case for staying here is real. Below $10K/mo with one provider, the operational overhead of a real gateway — Docker images to maintain, a database to back up, a UI to keep secure — outweighs the value. The router is just code in your repo. It deploys with your application. There's no second system to monitor.
What it doesn't give you: multi-provider fallback, semantic caching, prompt-injection guardrails, per-tenant attribution beyond what you build manually, or a dashboard anyone outside engineering can read. The day your finance team asks for per-tenant attribution, you've outgrown the tier.
The migration trigger is straightforward. The day you write your first `if provider == "anthropic"` branch, stop and put the work into adopting LiteLLM instead. We've seen too many teams sink three months into a homegrown multi-provider abstraction before discovering LiteLLM ships exactly that, plus 100 providers they hadn't planned for. For more on the build-vs-buy framing, see our guide on when to build vs buy AI infrastructure.
Tier 2: LiteLLM (the Open-Source Default at $10K-$50K/mo)
LiteLLM is the project we recommend most often, and the reason is simple: it ships the 80% of features any production AI workload needs without a vendor relationship. It runs as either a Python library (drop-in replacement for OpenAI's client) or a self-hosted proxy server (a single Docker image plus a Postgres database). At consulting scale, the proxy is what you want — application code talks OpenAI Chat Completions wire format to the proxy, and the proxy fans out to 100+ providers.
What you get out of the box:
- Config-declared fallback chains: `claude-opus-4-6` → `gpt-5.3` → `gemini-3.1` if the first two error out.
- Per-key attribution: `team_id` and `customer_id` fields on each key flow into the spend logs, which is exactly the data the $42K-incident client was missing.

What it doesn't give you cleanly: semantic guardrails (you bolt those on), prompt-level traces with full request/response replay (you wire Langfuse — see our Helicone vs Langfuse vs LangSmith breakdown for the right pairing at this tier), and the polished observability surface that a managed product ships. None of that is fatal at $10K-$50K monthly spend. All of it starts to bite at $50K+.
The gotcha we see most often is teams running LiteLLM in library mode (in-process) for too long. The library is fine for prototyping but loses every benefit of a centralized gateway — no per-key budgets, no centralized logging, no rate-limiter coordination across replicas. Switch to the proxy server the moment you have more than one service hitting LLMs, and pin the deployment with Postgres and Redis from day one. For the routing patterns LiteLLM enables, our walkthrough of cheap-first model routing to reduce API costs maps cleanly onto LiteLLM's policy syntax.
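For a sense of what the proxy deployment looks like, here is a sketch of a `config.yaml`. The key names follow LiteLLM's documented config shape, but verify them against the version you deploy; the model aliases and environment variable names are placeholders:

```yaml
# Illustrative LiteLLM proxy config. Aliases and env vars are placeholders.
model_list:
  - model_name: primary              # the alias application code requests
    litellm_params:
      model: anthropic/claude-opus-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: fallback
    litellm_params:
      model: openai/gpt-5.3
      api_key: os.environ/OPENAI_API_KEY

litellm_settings:
  fallbacks:
    - primary: ["fallback"]          # route to fallback when primary errors

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: os.environ/DATABASE_URL  # Postgres for keys, budgets, spend logs
```

Application code requests the `primary` alias over the OpenAI wire format; swapping the underlying model is a config change, not a deploy.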
The migration trigger out of LiteLLM is governance load. The day a security review asks for prompt-injection blocking on the request path, semantic caching with similarity thresholds, or A/B testing across model versions with feedback collection, you're in Tier 3 territory.
Tier 3: Portkey and Truefoundry (Managed, $50K-$200K/mo)
Portkey and Truefoundry are the managed alternatives to running LiteLLM yourself, and they justify the premium with three things LiteLLM doesn't ship at the same level: per-prompt observability, semantic guardrails, and continuous evaluation infrastructure.
Per-prompt observability. Every request has a trace with the full prompt, response, latency breakdown, cache hit/miss, fallback decisions, and cost attribution — searchable, with replay. When a customer reports a hallucination, you find the exact request in 30 seconds, not three hours grepping logs. This is the single biggest workflow improvement at scale. It's also why we tend to recommend the managed option once a workload crosses the point where engineers spend more than two hours per week on LLM debugging.
Semantic guardrails. Inline classifiers for prompt injection, jailbreak attempts, and PII patterns. Both Portkey and Truefoundry ship a library of guardrail policies you compose per route, plus a hook to plug in custom classifiers. LiteLLM has hooks for the same but ships fewer policies and expects you to bring your own classifier infrastructure.
Continuous evaluation. A/B testing across model versions, feedback collection from end users, and dashboards that surface quality drift the same way our AI production monitoring guide recommends. This matters most when you're routing requests across 3+ providers and need to know whether last week's switch to a cheaper model degraded answer quality.
The pricing typically lands at 1-3% of LLM spend, which sounds expensive in absolute terms and isn't. At $100K monthly spend, $1,000-$3,000/mo for the gateway tier replaces 0.25-0.5 FTE that would otherwise be operating LiteLLM, building dashboards, and writing custom guardrail code. The math favors the managed option above roughly $50K/mo and turns lopsided above $100K/mo.
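The break-even math above is simple enough to make explicit. The FTE cost and ops fraction below are stated assumptions for illustration, not vendor pricing, and in practice the self-host ops fraction grows with spend, which is what tips the math further toward managed at scale:

```python
def gateway_breakeven(monthly_llm_spend, fee_rate=0.02,
                      monthly_fte_cost=15_000.0, ops_fte_fraction=0.35):
    """Compare a managed gateway fee (a percentage of LLM spend) against
    the ops cost of self-hosting. monthly_fte_cost and ops_fte_fraction
    are illustrative assumptions, not vendor figures."""
    managed_fee = monthly_llm_spend * fee_rate       # e.g. 2% of spend
    self_host_ops = monthly_fte_cost * ops_fte_fraction  # fraction of an FTE
    return {
        "managed_fee": managed_fee,
        "self_host_ops": self_host_ops,
        "managed_cheaper": managed_fee < self_host_ops,
    }
```

At $100K monthly spend with a 2% fee, the managed tier costs $2,000/mo against roughly $5,250/mo of engineer time under these assumptions.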
The migration trigger out of Tier 3 is rare and almost always organizational rather than technical. Either compliance demands on-prem deployment with the same posture as your existing API estate, or your platform team has standardized on Kong or Apigee for everything else and the AI gateway is the odd shape on the org chart. That's when Tier 4 enters the conversation.
Tier 4: Kong AI Gateway and Apigee (Enterprise, On-Prem, $200K+/mo)
Kong AI Gateway and Apigee with AI add-ons are the right answer at enterprise scale for two reasons that have very little to do with LLM features specifically.
The first is organizational fit. If your platform team already runs Kong for non-AI traffic, the AI gateway is a plugin layer on top of an existing system the SRE and security teams have already certified, IAM-integrated, and run through audit. Adding LLM routing, token-based rate limiting, and prompt caching as Kong plugins is a one-to-two-week integration instead of a multi-quarter vendor onboarding. The same logic applies to Apigee inside Google Cloud shops. The AI gateway capabilities are roughly comparable between the purpose-built vendors and Kong/Apigee at this tier — what differs is the one-time onboarding cost and the long-term operational shape.
The second is deployment posture. On-prem, air-gapped, or sovereignty-constrained deployments are first-class for Kong and Apigee — both have spent a decade engineering for those environments. LiteLLM and Portkey can deploy on-prem, but you're swimming upstream on networking, secrets management, and audit integration. At a regulated bank or a defense contractor, the AI gateway is rarely the long pole; the audit and network posture around it are. Kong and Apigee make those non-issues by inheriting the existing posture.
The cost shape changes too. Kong Konnect Plus self-hosted starts around $20K/year and scales with traffic. Apigee with AI add-ons lands in five-to-six figures monthly at large enterprise scale. Below $200K monthly LLM spend, the math rarely works in favor of either — you'd be paying enterprise gateway pricing for SaaS-tier LLM volume.
The mistake at this tier is buying for the technology and not the org. We've seen a $40K/mo workload land on Kong AI Gateway because "we're a Kong shop" and the result was a six-month integration project for capabilities LiteLLM would have delivered in two weeks. We've also seen the inverse — a $400K/mo regulated workload on managed Portkey that the security team forced off after a year because the procurement and audit overhead was unsustainable. The right tier matches the org, not just the bill.
Migration Path Between Tiers (Without Rewriting Your App)
The migration cost between tiers is much lower than most teams expect, on one condition: you standardize on the OpenAI Chat Completions wire format as your internal contract from day one. Every tier in this guide — direct SDK, LiteLLM, Portkey, Truefoundry, Kong AI Gateway, Apigee — exposes an OpenAI-compatible endpoint. Migration is a base-URL swap and a credentials rotation. The application stays untouched.
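The base-URL swap can be made concrete with a request builder that reads the gateway endpoint from the environment. This is a stdlib-only sketch; the env var names and model alias are illustrative, not a standard:

```python
import json
import os


def chat_request(messages, model, base_url=None, api_key=None):
    """Build an OpenAI-style chat-completions request against whatever
    gateway LLM_BASE_URL points at. Migrating tiers means changing the
    env var, not the application code. Names here are illustrative."""
    base_url = base_url or os.environ.get("LLM_BASE_URL",
                                          "https://api.openai.com/v1")
    api_key = api_key or os.environ.get("LLM_API_KEY", "")
    return {
        "url": f"{base_url.rstrip('/')}/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"model": model, "messages": messages}),
    }
```

Pointing `LLM_BASE_URL` at a LiteLLM proxy, a Portkey endpoint, or a Kong route changes nothing else in the call site, which is the whole migration story in one line.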
What you do redo at each transition:
- Per-tenant attribution: mapping `team_id` to billing requires a one-time SQL or pipeline change.

A reasonable migration timeline is two engineering weeks for Tier 1 → Tier 2, four to six weeks for Tier 2 → Tier 3 (most of which is rebuilding observability), and two to three months for Tier 3 → Tier 4 (most of which is org-side: audit, IAM, network posture). The application code path is barely touched in any of them.
The pattern we ship to clients on long consulting engagements is to build per-tenant attribution and the OpenAI-compatible internal contract first, regardless of tier. Both survive every migration. Our per-tenant LLM cost attribution guide for multi-tenant SaaS covers the data model that makes this portable, and the smart caching architecture walkthrough covers the cache layer pattern that survives the transition between tiers without a rewrite. Both are required reading before you commit to a vendor. The broader strategic context — when an AI gateway maps to a real ROI versus when it's a premature cost — sits in our AI for Business pillar.
Recommendation by Scenario
We close every gateway-scoping conversation with one of five concrete recommendations. They're imperfect — every workload has wrinkles — but they're the starting points we've been right about most often.
The wrong answer in every case is doing nothing because the decision feels too big. We've yet to meet a team that regretted introducing an AI gateway tier that fit their scale. We've met plenty that regretted skipping it for another quarter and finding out the hard way — usually via a billing anomaly, sometimes via a customer escalation, occasionally via a regulator.
Frequently Asked Questions
Direct SDK usage is fine while you have one provider, one team, and monthly spend under roughly $10,000. Below that threshold, a 200-line internal router that adds a retry, a timeout, and a per-key budget covers 90% of the value. The trigger to add a real gateway is the first time one of three things happens: you onboard a second provider and start writing per-provider conditional code, finance asks which customer or team consumed which dollars and you can't answer in under a day, or a single bad prompt drains 15% of your monthly budget overnight. The gateway pays for itself the moment any of those three becomes a recurring problem.


