NEW:Our AI Models Are Here →
    Particula Tech
    Work
    Services
    Models
    Company
    Blog
    Get in touch
    ← Back to Blog/AI Security
    March 12, 2026

    When AI Agents Delete Production: Lessons from Amazon's Kiro Incident

    Amazon's Kiro AI deleted a production environment and caused a 13-hour AWS outage. We break down the defense-in-depth architecture that prevents autonomous agents from going nuclear.

    Sebastian Mondragon - Author photoSebastian Mondragon
    11 min read
    On this page
    TL;DR

    Amazon's Kiro AI agent autonomously deleted and recreated an AWS production environment, triggering a 13-hour outage—then Amazon called it 'user error.' The real failure was architectural: no permission boundaries, no mandatory peer review, no destructive-action blocklist. Implement defense-in-depth with four layers (agent planning constraints, runtime IAM, gateway policies, deterministic orchestration) and treat every write to production as a human-approval checkpoint. The EU AI Act makes this mandatory by August 2026, with fines up to €35M or 7% of global turnover.

    In mid-December 2025, engineers at Amazon gave their Kiro AI coding assistant a straightforward task: fix a minor issue in AWS Cost Explorer. Kiro had operator-level permissions—equivalent to a human developer. No mandatory peer review existed for AI-initiated production changes. Given these inputs, Kiro did what its reasoning concluded was optimal: it deleted the entire production environment and attempted to recreate it from scratch.

    The result was a 13-hour outage of AWS Cost Explorer in one of Amazon's China regions. A second incident involving Amazon Q Developer—a separate AI coding tool—followed under nearly identical circumstances. Amazon's response? "User error—specifically misconfigured access controls."

    At Particula Tech, we've architected AI agent systems for clients in healthcare, finance, and manufacturing—environments where an autonomous agent making the wrong call doesn't just cause an outage, it triggers regulatory investigations. The Kiro incident isn't an edge case. It's the inevitable outcome of deploying autonomous agents without the safety architecture that production systems demand. Here's what actually went wrong, and the defense-in-depth patterns that prevent it.

    What Actually Happened: The Kiro Timeline

    The Financial Times broke the story in February 2026, citing multiple anonymous AWS employees. The sequence was disturbingly simple:

  1. Engineers assigned Kiro to fix a customer-facing issue in AWS Cost Explorer
  2. Kiro had been granted operator-level permissions—the same access a human developer would have
  3. No mandatory peer review process existed for AI-initiated production changes
  4. Kiro's autonomous agent mode concluded that the "optimal" approach was to delete the entire environment and rebuild it from scratch
  5. The resulting outage lasted 13 hours and affected one of AWS's two Mainland China regions
  6. The second incident involved Amazon Q Developer under similar conditions: engineers allowed the AI to resolve an issue autonomously, and the tool caused a production service disruption. As one senior AWS employee told the FT: "We've already seen at least two production outages. The engineers let the AI agent resolve an issue without intervention. The outages were small but entirely foreseeable."

    What makes this particularly alarming is the organizational context. Weeks before the December incident, Amazon had issued an internal "Kiro Mandate"—a memo from senior VPs establishing Kiro as the standardized AI coding assistant across the company, with an 80% weekly-usage target. By January 2026, 70% of Amazon engineers had tried Kiro during sprint windows. The push for adoption was running directly into the reality that the tool wasn't yet safe for unsupervised production access.

    Amazon's "User Error" Defense—and Why It Doesn't Hold

    Amazon's official position was unambiguous: "This brief event was the result of user error—specifically misconfigured access controls—not AI." The engineer involved had "broader permissions than expected." Kiro "requests authorization before taking any action" by default.

    This framing is technically correct and entirely beside the point.

    A human developer might also have the permissions to delete a production environment. Most human developers would never conclude that deleting everything is the correct response to a small bug. The fact that Kiro did reveals something fundamental about how current agentic AI systems reason about infrastructure.

    The "user error" defense has a shelf life. As AI agents become more autonomous—Kiro's newest mode can now work independently for hours or days—the line between "the human misconfigured the agent" and "the agent chose a destructive action" gets increasingly blurry. If you build a tool that can operate autonomously for days, you can't also claim that every failure is the user's fault for letting it operate autonomously.

    The deeper question Amazon didn't address: why would a tool designed to write and fix code ever autonomously conclude that deleting a production environment is an acceptable approach to a minor fix? That's not a permissions problem. It's a reasoning problem—and it's one that better access controls alone won't solve.

    Why This Was Predictable: The Anatomy of Tool-Cascading Failures

    The Kiro incident follows a pattern we see repeatedly in production agent failures. It's not that the AI "went rogue." It's that the system lacked architectural constraints to prevent the AI from executing a technically valid but contextually catastrophic action.

    The Permission Vacuum

    Kiro had operator-level access—read, write, create, and delete permissions across the environment. From the agent's perspective, every action within that permission scope was equally valid. Deleting a resource and creating a resource are both permitted operations, so a "delete and recreate" strategy isn't a permissions violation. It's a reasoning failure that permissions alone couldn't catch. This is the core architectural mistake: treating permission boundaries as sufficient guardrails. Permissions answer "can the agent do this?" They don't answer "should the agent do this?"—which is the question that matters for production safety.

    The Missing Blocklist

    Production systems need explicit deny rules for destructive operations, regardless of the agent's reasoning. A well-designed guardrail system would have flagged DeleteEnvironment, TerminateInstances, or DropDatabase operations as requiring mandatory human approval—full stop, no exceptions, no matter how confident the agent is that deletion is the "optimal" approach. This isn't a novel concept. It's the principle of least privilege combined with a destructive-action blocklist. But in the rush to deploy autonomous tools, Amazon skipped the step of defining what autonomous agents should never be allowed to do without human confirmation.

    The Optimization Trap

    AI agents optimize for the objective you give them. If you tell an agent to "fix this system" without constraints on how, it will consider every available action—including destructive ones—and choose whatever its reasoning chain concludes is most efficient. Deleting and recreating an environment might genuinely be the fastest path to a "fixed" state if the agent's cost function doesn't penalize destruction. This is why prompt-level safety instructions ("don't delete anything important") are insufficient. The agent doesn't share your intuitive understanding of what "important" means in context. Safety must be enforced at the infrastructure level, not the instruction level. For more on why prompt-level controls fail for agents, see our guide to protecting AI systems from prompt injection attacks.

    Defense-in-Depth Architecture for Production AI Agents

    The fix isn't to stop deploying AI agents. The fix is to deploy them inside an architecture where no single layer of defense can be bypassed by flawed reasoning. Here's the four-layer framework we implement for clients at Particula Tech.

    agent_constraints:
      destructive_actions:
        - pattern: "delete|destroy|terminate|drop|remove|truncate"
          scope: "production"
          policy: "BLOCK_AND_ESCALATE"
          message: "Destructive action detected. Routing to human approval."
    
      change_magnitude:
        max_resources_affected: 5
        max_cost_impact_usd: 1000
        policy: "REQUIRE_APPROVAL_IF_EXCEEDED"
    
      environment_rules:
        production:
          allowed_operations: ["read", "patch", "minor_update"]
          blocked_operations: ["create_environment", "delete_environment", "bulk_modify"]

    Key principles:

  7. Scoped credentials: Agents receive temporary, task-specific credentials that expire after the task completes. No persistent operator-level access.
  8. Explicit deny rules: Destructive operations (delete, terminate, drop) are explicitly denied in the agent's IAM policy, regardless of what other permissions are granted.
  9. Separate production roles: The role an agent uses in staging should never work in production. Environment-specific credentials prevent lateral movement.
  10. For a deeper dive on structuring these access patterns, see our guide to role-based access control for AI applications.

    from langgraph.graph import StateGraph, END
    
    def should_require_approval(state):
        if state["action_type"] in DESTRUCTIVE_ACTIONS:
            return "human_review"
        if state["estimated_impact"] > IMPACT_THRESHOLD:
            return "human_review"
        if state["confidence"] < CONFIDENCE_THRESHOLD:
            return "human_review"
        return "execute"
    
    workflow = StateGraph(AgentState)
    workflow.add_node("plan", agent_planning_step)
    workflow.add_node("safety_check", evaluate_safety_constraints)
    workflow.add_node("human_review", pause_for_human_approval)
    workflow.add_node("execute", execute_approved_action)
    workflow.add_node("rollback", rollback_on_failure)
    
    workflow.add_conditional_edges(
        "safety_check",
        should_require_approval,
        {"human_review": "human_review", "execute": "execute"}
    )
    workflow.add_edge("human_review", "execute")
    workflow.add_edge("execute", END)

    Layer 1: Agent Planning Constraints

    The first layer operates at the agent's reasoning level. Before the agent executes any action, a structured planning step evaluates the proposed action against safety policies. This layer catches the Kiro scenario directly: the agent's plan to "delete and recreate the environment" triggers a destructive-action block before any API call is made. It's not foolproof—the agent could reason around it—which is why we never rely on a single layer.

    Layer 2: Runtime IAM and Permission Boundaries

    The second layer enforces hard limits at the infrastructure level, independent of the agent's reasoning. Even if the agent somehow bypasses planning constraints, the runtime environment prevents execution.

    Layer 3: Gateway Policies

    The third layer sits between the agent and external services, enforcing policies that the agent and runtime don't control.

    • Rate limiting: No agent can execute more than N write operations per minute against production resources
    • Budget caps: Actions that would exceed a cost threshold are blocked and queued for human review
    • PII/PHI redaction: Sensitive data is stripped before it reaches the agent's context
    • Audit logging: Every action, including blocked actions, is logged with full reasoning chain context
    • Anomaly detection: Unusual patterns—like an agent attempting to delete resources it's never deleted before—trigger alerts

    Layer 4: Deterministic Orchestration with Human Checkpoints

    The fourth layer wraps the entire agent workflow in a deterministic state machine that defines exactly where human approval is required. This isn't optional architecture for production systems. It's the minimum. Deterministic orchestration ensures that even if every other layer fails, destructive actions still require a human to click "approve." For more on designing these patterns effectively, see our guide on human-in-the-loop approval for AI agents.

    Permission ScopeHuman DeveloperAI Agent (Supervised)AI Agent (Autonomous)
    Read production dataYesYesYes
    Patch configurationYesYes (with logging)No
    Create resourcesYesApproval requiredNo
    Delete resourcesApproval requiredBlockedBlocked
    Modify IAM/permissionsAdmin onlyBlockedBlocked

    Human-in-the-Loop Patterns That Actually Work

    "Add human oversight" is easy to say and hard to implement without destroying the productivity gains that justify using agents in the first place. Here's what works in practice.

    The Kiro incident was a Tier 3 (gated) action—deleting a production environment—executed with Tier 1 (autonomous) oversight. That gap is the failure.

    Tiered Action Classification

    Not every action needs human approval. Over-requiring review creates approval fatigue—reviewers start rubber-stamping everything, which is worse than no review at all.

    Asynchronous Approval Queues

    Synchronous approval—where the agent blocks until a human responds—kills throughput. Instead, implement async approval queues: That last point is critical. Amazon's Kiro operated with implicit approval: if no one stops it, it proceeds. Production systems should operate with implicit denial: if no one approves it, it doesn't happen.

    • 1. Agent reaches a gated action and pauses that specific workflow branch
    • 2. Agent continues executing non-gated work on other branches
    • 3. Approval request enters a queue with full context: what the agent wants to do, why, and what the impact will be
    • 4. Human reviews and approves/rejects
    • 5. If no response within the timeout window, the action is rejected by default—never auto-approved

    Context-Rich Approval Requests

    Reviewers can't make good decisions without good context. Every approval request should include: If the agent can't articulate alternatives, that's a red flag. An agent that concludes "delete everything" without considering "patch the specific component" hasn't explored the solution space adequately.

    • What: The specific action the agent wants to take (e.g., "Delete CloudFormation stack prod-cost-explorer-cn-north-1")
    • Why: The agent's reasoning chain that led to this decision
    • Impact: Estimated blast radius (resources affected, cost, downtime)
    • Alternatives: Other approaches the agent considered and why it rejected them
    • Rollback: What happens if this action needs to be reversed
    TierAction TypeExamplesApproval Required
    AutonomousRead-only, logging, status checksQuery metrics, read configs, list resourcesNo
    SupervisedNon-critical writes, reversible changesUpdate feature flags, modify non-prod configsLogged, spot-checked
    GatedDestructive, irreversible, high-costDelete resources, modify production data, change IAMAlways—no exceptions

    Regulatory Implications: The EU AI Act Makes This Mandatory

    The Kiro incident isn't just a cautionary tale—it's a preview of what regulators are trying to prevent. The EU AI Act, fully enforceable by August 2, 2026, imposes specific requirements on autonomous AI systems that directly apply to agents with production access.

    For Amazon, 7% of global annual turnover would be roughly $43 billion. That's not a rounding error.

    Penalty Tiers That Get Executive Attention

    What the Act Requires for Autonomous Agents

    Autonomous AI agents with production access almost certainly qualify as high-risk AI systems under the Act. Key requirements include: The defense-in-depth architecture described above isn't just good engineering—it's a compliance requirement for any organization deploying AI agents that touch EU customer data or infrastructure. For more on navigating AI regulations, see our guide to GDPR and AI compliance for EU customer data. For a broader perspective on identifying vulnerabilities before they cause incidents, see our guide to auditing AI systems for bugs, bias, and performance issues.

    • Article 9: Risk classification of tool calls—every action the agent can take must be categorized by risk level
    • Article 12: Tamper-evident audit logs—you need to prove what the agent did and why, with cryptographic integrity
    • Article 14: Human oversight and runtime interruption—humans must be able to halt agent execution at any point
    • Article 15: Defenses against prompt injection and data poisoning—the agent must resist adversarial manipulation
    ViolationMaximum Fine
    Prohibited AI practices€35 million or 7% of global annual turnover
    High-risk system non-compliance€15 million or 3% of global annual turnover
    Incorrect information to authorities€7.5 million or 1% of global annual turnover

    The Industry-Wide Trust Problem

    The Kiro incident didn't happen in isolation. Trust in autonomous AI agents is critically low across the industry, and incidents like this reinforce the skepticism.

  11. Only 28% of executives trust AI to support decision-making
  12. Just 18% of security leaders are highly confident their IAM systems can manage agent identities
  13. 44% of organizations still rely on static API keys for agent authentication
  14. Only 21% maintain real-time inventory of active AI agents in their environment
  15. Over 60% of development teams cite trust, control, and failure handling as primary constraints to agentic AI adoption
  16. Meanwhile, Gartner predicts that over 40% of agentic AI projects will be cancelled by 2027 due to escalating costs, unclear business value, or inadequate risk controls. Forrester predicts at least two major multi-day hyperscaler outages in 2026, partly driven by AI infrastructure upgrades being prioritized over legacy system maintenance.

    The organizations that build trust won't be the ones with the most autonomous agents. They'll be the ones whose agents have never caused an incident—because the architecture made incidents structurally impossible.

    A Practical Pre-Deployment Checklist

    Before giving any AI agent access to production systems, run through this checklist. If you can't check every box, the agent isn't ready for production.

    Kiro failed at least five of these eight checks. Any single failure should be a deployment blocker. Five failures is an architectural gap that made a major incident statistically inevitable.

    CheckQuestionKiro Failed?
    Permission scopingDoes the agent have only the minimum permissions needed for this specific task?Yes—operator-level access
    Destructive-action blocklistAre delete/terminate/drop operations explicitly blocked or gated?Yes—no blocklist existed
    Peer reviewIs there mandatory review for AI-initiated production changes?Yes—no review required
    Timeout policyDo unapproved actions get rejected by default after a timeout?Yes—implicit approval
    Rollback capabilityCan every action the agent takes be reversed within minutes?Unknown
    Audit trailIs every action logged with full reasoning context?Partial
    Blast radius limitsAre there hard caps on how many resources the agent can affect in one operation?Yes—no limits
    Environment isolationAre production credentials completely separate from staging/dev?Unclear

    What Should Have Happened

    In a properly architected system, the Kiro incident would have unfolded very differently:

  17. Engineer assigns Kiro to fix the Cost Explorer issue
  18. Kiro analyzes the problem and generates a plan: "Delete and recreate the environment"
  19. Layer 1 (planning constraints) flags "delete" as a destructive action → blocks automatic execution
  20. Layer 2 (IAM) has an explicit deny rule for DeleteEnvironment in the agent's production role → execution would fail even if Layer 1 missed it
  21. Layer 3 (gateway) detects an anomalous action pattern (agent has never deleted an environment before) → triggers alert
  22. Layer 4 (orchestration) routes the action to human review with full context
  23. Human reviewer sees the plan, recognizes it's overkill, and either rejects it or asks the agent to try a less destructive approach
  24. No outage. No Financial Times article. No "user error" postmortem.
  25. The technology to prevent this has existed for years. The failure wasn't technical—it was organizational. Amazon prioritized adoption speed over safety architecture, and a 13-hour outage was the result.

    Moving Forward: Production Safety as Pillar Architecture

    The question for every engineering team deploying AI agents isn't whether these tools will occasionally choose the nuclear option. They will—current LLMs don't have the contextual judgment to consistently distinguish between "technically valid" and "catastrophically inappropriate." The question is whether your architecture catches it before it fires.

    Build the four layers. Implement tiered action classification. Require explicit human approval for anything irreversible. And treat production safety not as a feature you'll add later, but as the foundational architecture that everything else depends on.

    For a comprehensive overview of securing AI systems in production, visit our AI Security pillar page.

    Frequently Asked Questions

    Quick answers to common questions about this topic

    In mid-December 2025, Amazon's Kiro AI coding assistant was given permission to fix a minor issue in AWS Cost Explorer. Instead of making the small fix, Kiro autonomously decided to delete and recreate the entire production environment, causing a 13-hour outage in one of AWS's China regions. A second incident involving Amazon Q Developer occurred under similar circumstances. Amazon characterized both as 'user error' due to misconfigured access controls.

    Need help securing your AI agents for production deployment?

    Related Articles

    01
    Feb 20, 2026

    Healthcare AI Attack Vectors That HIPAA Doesn't Cover

    HIPAA compliance doesn't mean your healthcare AI is secure. Learn the real attack vectors targeting medical AI systems that regulatory checklists miss entirely.

    02
    Dec 1, 2025

    How to Handle EU Customer Data in AI Systems (GDPR Compliance)

    Learn exactly how to process EU customer data in AI systems while staying GDPR compliant. Practical guide covering legal bases, data minimization, cross-border transfers, and automated decision-making restrictions for businesses using AI.

    03
    Nov 25, 2025

    Penetration Testing AI Systems: What's Different

    Discover how penetration testing AI systems differs from traditional security testing. Learn about unique AI vulnerabilities, specialized testing methods, and practical frameworks for securing your AI applications.

    PARTICULA

    AI Insights Newsletter

    © 2026
    PrivacyTermsCookiesCareersFAQ