MCP server security hit a wall in Q1 2026: 30 public CVEs in 60 days, 82% of 2,614 scanned implementations vulnerable to path traversal, 492 public servers running zero auth. Gate every MCP server against 12 controls before deploy — OAuth 2.1 with resource-bound tokens, per-tool capability scoping, filesystem and network jails via Docker MCP Gateway, per-tenant CPU/memory caps, structured audit logging with prompt/secret redaction, and a kill-switch for runaway invocations. A 7B model running behind a hardened gateway is safer than a flagship model behind a hobby Flask app.
Two weeks ago, a client asked me to audit the MCP servers they'd stood up for their internal agent platform. Twenty-three servers, most of them built in the last quarter, exposing tools for databases, GitHub, Jira, their customer data warehouse, and — we found out on day two — a shell execution endpoint that had been scaffolded as a proof-of-concept and never removed. None of them required authentication. Four were reachable from the public internet through a Cloudflare tunnel that somebody forgot to restrict.
That client is not unusual. In the 60 days between January 1 and February 28, 2026, at least 30 CVEs landed against MCP-based servers. An Equixly scan of public MCP implementations found 82% of 2,614 servers vulnerable to path traversal and 492 running with zero authentication. OWASP published its MCP Top 10 hardening guide in March. Docker shipped the MCP Gateway specifically to put a hardened shim in front of the reference servers teams had already deployed.
The protocol is not the problem. The servers people ship are. This post is the 12-item checklist we now gate every MCP deployment against — with the OAuth, sandboxing, multi-tenancy, and audit logging patterns that close the gaps we see most often. If you read our MCP developer guide for the build side, read this for the "do not get on the next Equixly scan" side.
Why MCP Security Hit a Wall in 60 Days
Three forces converged in Q1 2026, and they are worth naming because the fix for each is different.
Adoption outran hardening. MCP went from Anthropic's internal experiment to 97M monthly SDK downloads and 13K+ public servers between November 2024 and March 2026. The reference TypeScript and Python SDKs are excellent demo code — and teams shipped them unchanged into production. The default stdio examples run with host permissions. The default HTTP examples expose ports with no auth. The default filesystem tools accept raw paths.
Security researchers caught up. Equixly, Orca, Cyera, and Horizon3.ai all pointed scanners at public MCP endpoints in early 2026. Path traversal, SSRF, prompt-injection-to-RCE chains through LLM-as-oracle patterns (the same class that produced the n8n CVE-2026-21858 RCE chain), and unauthenticated tool execution filled the disclosure queue. A typical scan now finds multiple classes of issue per server.
The trust boundary is new. MCP servers sit between an LLM (producing attacker-influenced tool calls through prompt injection) and real systems (databases, file systems, shells). Traditional API gateways assume predictable clients. MCP clients are adversarial by construction because anything a user says can, through the right prompt, become a tool call. This is the threat model Microsoft's ZT4AI framework tries to formalize, and the one the LangChain CVE audit proved most teams still model incorrectly.
The implication: MCP security cannot be bolted on by the identity provider or the load balancer. It needs to live inside the server, at the invocation boundary, on every tool. The checklist below is how we enforce that.
The 12-Item MCP Server Hardening Checklist
Every item is a gate. We refuse to promote an MCP server to production traffic if any of these are red.
Items 1-4 are identity. Items 5-9 are execution. Items 10-11 are operations. Item 12 is the thing that lets you sleep. Let me walk through the three areas where most teams have the biggest gaps.
| # | Control | Gate |
|---|---|---|
| 1 | Transport | Streamable HTTP only; SSE deprecated; stdio behind host auth |
| 2 | OAuth 2.1 + PKCE | Enforced on every endpoint; no bearer-only fallbacks |
| 3 | Resource-bound tokens (RFC 8707) | Audience validated; cross-server tokens rejected |
| 4 | Per-tool OAuth scopes | User consents per tool, not per server |
| 5 | Input validation on every tool parameter | Typed schemas + explicit allowlists for paths, URLs, commands |
| 6 | Capability scoping | Each tool declares FS/net/shell needs; defaults deny-all |
| 7 | Rootless sandbox per invocation | Read-only FS, dropped caps, seccomp, no-new-privileges |
| 8 | Network egress allowlist | DNS + hostnames per tool; deny by default |
| 9 | Per-invocation budget | CPU, memory, wall-clock, token limits with hard kill |
| 10 | Per-tenant isolation | Separate container, secrets, and audit stream per tenant |
| 11 | Structured audit logging with redaction | Envelope logged 100%; payloads sampled; PII scrubbed at sink |
| 12 | Kill switch + rate limits at the gateway | Tool-level and tenant-level circuit breakers |
OAuth 2.1 and the Three Token-Binding Mistakes
The June 2025 MCP spec formalized OAuth 2.1 with PKCE for Streamable HTTP transport. The March 2026 revision — following the token confusion issues surfaced in late 2025 — added mandatory resource-indicator binding per RFC 8707. Most servers we audit get the base handshake right and the binding wrong.
```python
# FastMCP — enforce audience claim on every request
from fastmcp import FastMCP
from fastmcp.auth import BearerAuth

mcp = FastMCP(
    "prod-db-server",
    auth=BearerAuth(
        jwks_uri="https://idp.example.com/.well-known/jwks.json",
        issuer="https://idp.example.com/",
        audience="https://mcp.prod.example.com/",  # REQUIRED — reject others
        algorithms=["RS256"],
    ),
)
```

Mistake 1: No audience validation
A JWT or opaque token from your identity provider is not proof it was minted for this server. Without audience validation, a token issued for an analytics MCP server works against your production database MCP server. This is the cross-server pivot. If audience is missing or set to a wildcard, you are one stolen token away from a lateral incident.
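The check itself is small; the failure mode is skipping it. Here is a minimal sketch of the audience validation every request must pass, assuming the JWT signature has already been verified against the IdP's JWKS (`EXPECTED_AUDIENCE` and the claim shapes are illustrative, not a specific library's API):

```python
EXPECTED_AUDIENCE = "https://mcp.prod.example.com/"  # this server's own identity

def validate_audience(claims: dict) -> None:
    """Reject any token not minted for this server (RFC 8707 resource binding)."""
    aud = claims.get("aud")
    # Per the JWT spec, "aud" may be a string or a list; normalize to a list
    audiences = [aud] if isinstance(aud, str) else (aud or [])
    if EXPECTED_AUDIENCE not in audiences:
        # A token minted for the analytics server must not work here
        raise PermissionError(f"token audience {audiences!r} does not include this server")
```

A wildcard or missing `aud` falls into the `raise` branch, which is exactly the fail-closed behavior the cross-server pivot requires.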
Mistake 2: PKCE not enforced on dynamic client registration
MCP's dynamic client registration is convenient for IDE plugins and desktop clients that spin up on demand. It is also a registration endpoint that, without PKCE enforcement and client attestation, will register any caller. A malicious client registers, intercepts an authorization code through a redirect-URI trick, and replays it. Refuse registration without a code_challenge_method of S256, and lock redirect URIs to an explicit allowlist per client class.
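The registration gate reduces to two predicates. A sketch of that gate, following the rule above (the request fields and the per-client-class allowlist structure are hypothetical, not taken from RFC 7591 verbatim):

```python
# Hypothetical allowlist: each client class pins its exact redirect URIs
ALLOWED_REDIRECT_URIS = {
    "ide-plugin": {"http://127.0.0.1:33418/callback"},
    "desktop-client": {"myapp://oauth/callback"},
}

def accept_registration(req: dict) -> bool:
    """Refuse registration unless PKCE is S256 and every redirect URI is pinned."""
    if req.get("code_challenge_method") != "S256":
        return False  # "plain" or missing PKCE: refuse outright
    allowed = ALLOWED_REDIRECT_URIS.get(req.get("client_class"), set())
    uris = req.get("redirect_uris", [])
    # An empty URI list or any URI outside the pinned set is a refusal
    return bool(uris) and all(uri in allowed for uri in uris)
```

The deny-by-default shape matters more than the field names: an unknown client class gets an empty allowlist, so nothing it offers can pass.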
Mistake 3: Long-lived refresh tokens in process memory
Refresh tokens are durable credentials. They belong in an encrypted secrets backend (Vault, AWS Secrets Manager, a sealed KMS-backed cache), not in the agent process heap. Any tool that executes user-influenced code — and in MCP, all tools do — can dump memory through the right sequence. Rotate refresh tokens on every use, cap their lifetime at 24 hours for high-privilege scopes, and never log them. For the broader client-facing pattern of how these tokens flow from IDE to server, the MCP vs API comparison covers the handshake sequence; this section is what the server side has to enforce.
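The storage contract is easier to see in code than in prose. A toy in-memory stand-in for the encrypted backend (Vault or a KMS-backed cache would replace the dict; class and method names are illustrative) that enforces single-use reads and the 24-hour cap:

```python
import time

class RefreshTokenStore:
    """Stand-in for an encrypted secrets backend; the dict is NOT a real backend."""
    MAX_LIFETIME_S = 24 * 3600  # cap for high-privilege scopes

    def __init__(self):
        self._store = {}  # token_id -> (token, issued_at)

    def put(self, token_id: str, token: str) -> None:
        self._store[token_id] = (token, time.time())

    def take(self, token_id: str) -> str:
        """Single-use read: the token is removed on retrieval, so the caller
        must put() the rotated replacement the IdP hands back."""
        token, issued_at = self._store.pop(token_id)  # KeyError if already used
        if time.time() - issued_at > self.MAX_LIFETIME_S:
            raise PermissionError("refresh token past 24h lifetime cap")
        return token
```

Because `take()` removes the token, a replayed read raises instead of silently succeeding, which is the rotate-on-every-use property in miniature.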
Tool Sandboxing: Capability Scoping and Docker MCP Gateway
The single highest-leverage hardening you can ship is a sandbox per tool invocation. The default MCP server runs every tool in the same process as the MCP router, which means a prompt-injection-triggered read_file("/etc/shadow") hits the same filesystem as your OAuth token cache.
Docker MCP Gateway, FastMCP's Sandbox primitive, and the Pomerium MCP proxy all implement the same pattern: each invocation spawns a rootless container with the minimum capabilities the declared tool needs, runs the tool inside, and returns the result. The overhead is 40-80ms per call on warm runtimes — cheap insurance.
```python
from fastmcp import FastMCP, Tool
from fastmcp.sandbox import DockerSandbox, Capabilities

mcp = FastMCP("file-ops")

@mcp.tool(
    name="read_project_doc",
    sandbox=DockerSandbox(
        image="mcp-file-ops:1.4.2",
        capabilities=Capabilities(
            filesystem=["/srv/projects/{tenant_id}/docs:ro"],  # read-only, tenant-scoped
            network=[],  # NO network
            shell=False,
            cpu="500m",
            memory="512Mi",
            timeout_seconds=10,
            token_budget=5000,
        ),
    ),
)
def read_project_doc(path: str, tenant_id: str) -> str:
    # path validation happens inside the sandbox; traversal attempts fail closed
    ...
```

The declare-then-deny pattern
Every tool in our production servers declares what it needs at registration time. The sandbox refuses anything not declared. Three things to notice. The filesystem mount is read-only and tenant-scoped — a traversal attempt lands on an empty filesystem, not /etc/passwd. Network is empty, which kills DNS exfiltration even if the tool gets hijacked. CPU, memory, wall-clock, and token budgets are hard limits with SIGKILL at the threshold — the runaway-agent-spends-$4K-in-90-minutes scenario becomes impossible.
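Stripped of any framework, the declared capabilities above map almost one-to-one onto `docker run` flags. A sketch that assembles that argv (the seccomp profile path, mount paths, and image tag are placeholders; this assumes a rootless Docker daemon):

```python
def sandbox_argv(tenant_id: str, image: str = "mcp-file-ops:1.4.2") -> list:
    """docker run flags implementing checklist item 7: rootless, read-only, deny-all."""
    return [
        "docker", "run", "--rm",
        "--read-only",                                # read-only root filesystem
        "--cap-drop", "ALL",                          # drop every Linux capability
        "--security-opt", "no-new-privileges",        # block setuid escalation
        "--security-opt", "seccomp=/etc/mcp/seccomp.json",  # placeholder profile path
        "--network", "none",                          # item 8: deny-all egress by default
        "--cpus", "0.5", "--memory", "512m",          # item 9: hard resource caps
        "--mount", f"type=bind,src=/srv/projects/{tenant_id}/docs,dst=/data,readonly",
        image,
    ]
```

Whatever gateway you use, auditing is simple: inspect the running container and confirm each of these flags is present; any missing flag is an undeclared capability.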
Network egress: allowlist hostnames, not CIDRs
The second-most-common exfiltration vector after filesystem is DNS. An attacker who injects a prompt into an email-summarization flow makes the tool resolve exfil.attacker.com/stolen_data_base64 — and unless you lock DNS resolution to an explicit hostname allowlist, the resolution itself leaks the payload before any HTTP request is made. Docker MCP Gateway's allowed-hosts config and Pomerium's egress policy both do this properly.
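The enforcement point has to sit before name resolution, not before the HTTP request. A minimal sketch of that check (the allowlist contents are hypothetical; a real gateway enforces this at the DNS resolver, not in tool code):

```python
from urllib.parse import urlparse

# Hypothetical per-tool allowlist: hostnames, never CIDRs
EGRESS_ALLOWLIST = {"api.github.com", "idp.example.com"}

def check_egress(url: str) -> None:
    """Fail closed BEFORE any DNS lookup happens; the lookup itself can exfiltrate."""
    host = (urlparse(url).hostname or "").lower()
    if host not in EGRESS_ALLOWLIST:
        raise PermissionError(f"egress to {host!r} not declared by this tool")
```

Note the exfiltration URL in the paragraph above never reaches a socket: the string comparison rejects `exfil.attacker.com` without resolving it.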
Multi-Tenant Isolation: One Container Per Tenant, Always
Multi-tenant MCP servers are where we see the worst incidents. The tempting design — one shared server process, claims-based authorization inside each tool — has leaked cross-tenant data in three engagements I worked in 2026 alone. The pattern that works:
- One container per tenant, never a shared server process with claims checks inside each tool.
- One secrets mount per tenant: mount `/vault/{tenant_id}/` at invocation time, unmount on exit.
- One audit stream per tenant, matching checklist item 10.

The Prefactor deep-dive on multi-tenant MCP lays out the same architecture with more operational detail. If you are running MCP for more than one customer, read it after you finish here.
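Assuming a container-per-tenant orchestrator, the spec handed to it can be sketched as follows (field names are illustrative, not a real Docker or Kubernetes API; the validation line matters because `tenant_id` is interpolated into mount paths):

```python
import re

def tenant_container_spec(tenant_id: str) -> dict:
    """One container, one secrets mount, one audit stream per tenant; nothing shared."""
    # tenant_id feeds filesystem paths: validate strictly before interpolating
    if not re.fullmatch(r"[a-z0-9-]{1,63}", tenant_id):
        raise ValueError(f"invalid tenant_id: {tenant_id!r}")
    return {
        "container": f"mcp-{tenant_id}",
        "secrets_mount": f"/vault/{tenant_id}/",  # mounted at invocation, unmounted on exit
        "audit_stream": f"audit.{tenant_id}",     # separate sink per tenant
    }
```

A `tenant_id` of `../admin` fails the regex instead of escaping the vault prefix, which is the same fail-closed posture as the sandbox mounts.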
Audit Logging That Survives Incident Response
When an incident hits at 3am — a tenant reports "an agent accessed data it shouldn't have" — your logs either answer the question or they don't. Most MCP servers I audit don't.
Log the envelope, sample the payload
Log 100% of invocation envelopes: tenant ID, agent ID, tool name, parameter schema hash, timestamp, duration, exit code, token count, CPU/memory high-water mark. That is maybe 400 bytes per call and it is what incident response actually needs — "did tenant A's agent call the query_customer_data tool between 2:47 and 2:51?" Sample payloads separately. 1-5% tail-based sampling plus 100% capture of errors and tool-call failures gives you the debugging surface without a 10x storage bill. Keep the payload stream on a 72-hour retention with break-glass access.
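A sketch of what one envelope line can look like (field names are illustrative; note the parameter *keys* are hashed and the values never appear, so the envelope is safe to retain at 100%):

```python
import hashlib
import json
import time

def invocation_envelope(tenant_id: str, agent_id: str, tool: str,
                        params: dict, duration_ms: int, exit_code: int,
                        tokens: int) -> str:
    """~400-byte envelope logged for every call; payloads are sampled separately."""
    return json.dumps({
        "ts": time.time(),
        "tenant_id": tenant_id,
        "agent_id": agent_id,
        "tool": tool,
        # Hash of the parameter shape only; raw values never enter this stream
        "param_schema_hash": hashlib.sha256(
            json.dumps(sorted(params)).encode()).hexdigest()[:16],
        "duration_ms": duration_ms,
        "exit_code": exit_code,
        "tokens": tokens,
    })
```

This is enough to answer the 3am question ("did tenant A's agent call `query_customer_data` between 2:47 and 2:51?") with a grep over tenant, tool, and timestamp.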
Redact at the sink, not at the source
Redacting prompts at the application layer is brittle; you will miss cases. Put a redaction pass at the log sink — Vector, Fluent Bit, or a Loki pipeline — that scrubs known-pattern secrets (API keys, tokens, credit cards, SSNs) before the data lands in durable storage. If the tenant has opted in under a DPA to full-content capture for debugging, route their stream to a separate bucket and enforce retention separately.
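The sink-side pass is pattern substitution, whatever tool runs it. A sketch in Python of the kind of scrub rules a Vector or Fluent Bit pipeline would express in its own config language (the patterns are illustrative shapes, not exhaustive detectors):

```python
import re

# Known-pattern secrets scrubbed at the sink, before durable storage
REDACTIONS = [
    (re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"), "[REDACTED_API_KEY]"),  # API-key shape
    (re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+\b"), "[REDACTED_JWT]"),  # JWT shape
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[REDACTED_SSN]"),
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "[REDACTED_PAN]"),  # card-number shape
]

def scrub(line: str) -> str:
    """Apply every redaction rule in order; run at the log sink, not the app."""
    for pattern, replacement in REDACTIONS:
        line = pattern.sub(replacement, line)
    return line
```

Because this runs in the pipeline rather than in 23 different servers, a new secret pattern is one rule added in one place.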
Streaming traces need sampling that doesn't drop incidents
OpenTelemetry GenAI semantic conventions are stabilizing, and tail-based sampling via an OTel Collector is the pattern that works: buffer the full trace, keep it only if there was an error, a latency outlier, a tool-call failure, or if it was sampled into the 1-5% keep-rate. This preserves the "what went wrong" traces without paying for the "everything went fine" ones.
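The keep/drop decision at the tail is a few predicates. A sketch of the policy logic (in a real deployment this lives in the OTel Collector's tail-sampling processor config, not application code; the thresholds here are the illustrative ones from the paragraph above):

```python
import random

BASELINE_KEEP = 0.02  # within the 1-5% keep-rate band

def keep_trace(trace: dict) -> bool:
    """Tail-based decision: buffer the full trace, then keep only interesting ones."""
    if trace.get("error") or trace.get("tool_call_failed"):
        return True                      # 100% of failures survive
    if trace.get("duration_ms", 0) > 10_000:
        return True                      # latency outliers survive
    return random.random() < BASELINE_KEEP  # everything else is sampled
```

The ordering is the point: the deterministic "did something go wrong" checks run before the random sample, so incidents never depend on a coin flip.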
What to Do Monday Morning
If you have MCP servers in production today, run this in order:
1. Probe every endpoint with no credentials: `curl -s -o /dev/null -w "%{http_code}" https://your-mcp-host/mcp` from an unauthenticated client. Anything other than 401 is an incident.
2. Audit the OAuth config on every server: is `audience` set? Is PKCE S256 mandatory? Are refresh tokens in Vault or in process memory? Fix any "no" answers this week.

For the broader context of where MCP fits in a 2026 AI security program, the AI security pillar collects the cluster's other posts — from prompt injection defense to zero-trust frameworks to the LangChain and n8n CVE post-mortems that all share the same root cause as the 30 MCP CVEs: AI frameworks trusting inputs that traditional threat models never contemplated.
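Step 1 scales trivially across an inventory. A small helper (hypothetical; feed it the curl status code per host) that turns the probe results into an incident list:

```python
def triage(status_by_host: dict) -> list:
    """Hosts whose unauthenticated probe returned anything other than 401 are incidents."""
    return sorted(host for host, code in status_by_host.items() if code != 401)
```

A 200 means no auth at all; a 403 means the server is doing its own ad-hoc check instead of the OAuth 2.1 handshake. Both belong on the incident list.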
The Real Insight
MCP didn't get less safe in Q1 2026. It got more visible. The 30 CVEs, the Equixly scan, the OWASP guide, and Docker's gateway release are all signs of a protocol growing up. The servers that survive the next 60 days of scanning are not the ones running the latest reference SDK — they are the ones whose teams treated OAuth, sandboxing, isolation, and audit logging as non-negotiable gates before production.
A 7B model running behind a hardened MCP gateway is strictly safer than a flagship model fronted by a hobby Flask app. The model is not your security boundary. The gateway is. Gate accordingly.
Frequently Asked Questions
Why did MCP server security hit a wall in Q1 2026?
Three forces converged. MCP adoption exploded — 97M monthly SDK downloads and 13K+ public servers by March 2026 — and most implementations treated the reference TypeScript/Python SDKs as production-ready when they were demo code. Security researchers caught up: 30 CVEs landed against MCP servers in the 60 days between January 1 and February 28, 2026, and an Equixly scan found 82% of 2,614 public implementations vulnerable to path traversal and 492 running with zero authentication. The protocol itself is sound; the servers people ship on top of it routinely skip auth, sandboxing, and input validation.



