Open-weight models hallucinate package names at an average 21.7% rate versus 5.2% for commercial models, with CodeLlama topping 33% and GPT-4 Turbo lowest at 3.59%. A USENIX Security 2025 study of 2.23M code samples found 205,474 unique fabricated package names. In January 2026 the hallucinated npm package name react-codeshift spread through 237 repositories with no human planting it. Defend with hash-pinned lockfiles, install-command verification, registry allowlists, and a CI gate that fails any build introducing an unrecognized dependency.
A coding agent told a developer to run npm install react-codeshift. The package sounds real. It is a clean conflation of two tools that do exist, jscodeshift and react-codemod, fused into a name no maintainer ever published. In January 2026 that fabricated name propagated through 237 repositories. No human typed it as a typo, and no human deliberately planted it. Agents wrote it into skill files, those files got committed, and downstream agents reading them passed the same hallucinated dependency along. That is slopsquatting, and the package hallucination rates driving it are not edge cases. They are a measured, repeatable property of every model you ship code with.
Here is the number that should reframe how you treat agent-generated install commands: open-weight models hallucinate package names at an average rate of 21.7%, versus 5.2% for commercial models, per research summarized by the Cloud Security Alliance. The underlying USENIX Security 2025 study analyzed 2.23 million code samples and surfaced 205,474 unique fabricated package names. Every one of those names is an empty slot that an attacker can register on npm or PyPI, then use as a malware delivery channel once the next model hallucinates it. Slopsquatting is typosquatting's automated cousin: the model invents the name, the attacker squats it, and your build pulls the payload.
This post walks the data, the react-codeshift incident, and the cross-registry attack surface, then gives you a defensive playbook you can wire into CI this week. The thesis is blunt: you cannot prompt your way out of a 21.7% hallucination rate. You gate it.
What Slopsquatting Actually Is
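The attack chain has four steps and no human error in the middle: a model hallucinates a plausible package name in an install command, an attacker registers that name on npm or PyPI, a developer or agent runs the suggested install, and the build pulls the attacker's payload.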
The key property that makes this work is repeatability. Hallucinated package names are not uniformly distributed noise. Models converge on the same plausible-but-fake names because they are generalizing from real naming conventions. react-codeshift is the perfect specimen: both halves are real, the composition is grammatical, and multiple models would independently suggest it. An attacker does not need to guess. They harvest the names models actually produce and register the high-frequency ones.
This is the same trust failure behind prompt injection, where the model cannot reliably separate instructions from data. Here the model cannot reliably separate packages that exist from packages that should plausibly exist. If you want the broader pattern, our breakdown of why prompt injection keeps winning at an 85% attack success rate covers the same root cause from the input side.
The Hallucination-Rate Data: Open vs Commercial
The Cloud Security Alliance research note quantifies the gap between model classes, and it is large enough to factor into your model-selection decision.

| Model class | Avg package hallucination rate | Notable data points |
|---|---|---|
| Open-source / open-weight | 21.7% | CodeLlama exceeded 33% in some configs |
| Commercial / frontier API | 5.2% | GPT-4 Turbo lowest at 3.59% |

Read the spread carefully. The average open-weight model invents a dependency in more than one out of five code samples that reference packages. CodeLlama topped 33% in some configurations, meaning a third of its package suggestions in those setups pointed at nothing. The best commercial model measured, GPT-4 Turbo, still hallucinated 3.59% of the time. No model in this data scores zero.
The operational implication is that model choice is a risk multiplier, not a control. If you self-host an open-weight coder to save on API costs, you accept roughly four times the squattable-name volume of a frontier API (21.7% versus 5.2%). That tradeoff can be correct, but only if you have downstream gating that catches the difference. Choosing GPT-4 Turbo over CodeLlama does not let you skip lockfiles. It only lowers how often the lockfile gate has to fire.
This matters more as teams lean on cheaper local models for high-volume agent work. If you are weighing a self-hosted setup, the same cost-versus-control logic shows up in our look at running GLM-5.2 locally: the savings are real, and so is the additional verification burden you take on.
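To make that multiplier concrete, take a hypothetical agent workload that emits 1,000 package references. This is an illustrative volume, not a measured one. At the two class averages, the expected number of fabricated names is:

$$
1000 \times 0.217 = 217
\qquad\text{vs.}\qquad
1000 \times 0.052 = 52,
\qquad
\frac{217}{52} \approx 4.2
$$

Every one of those names is a lookup your verification layer has to reject. On these averages, the gate fires roughly four times as often on the open-weight stack. Per-model rates vary: CodeLlama's 33% configurations would push the first figure past 330.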
The Scale: 2.23M Samples, 205K Fabricated Names
The USENIX Security 2025 study behind these rates is what makes slopsquatting a supply chain problem rather than a curiosity. The researchers analyzed 2.23 million code samples. Of those, 440,445 contained at least one hallucinated package, and across them sat 205,474 unique fabricated package names.
Sit with that last figure: more than two hundred thousand distinct names, each one a registrable slot. An attacker does not need to compromise a maintainer account or sneak a malicious commit past review. They open a registry account, claim empty namespace legally and silently, and wait for models to send victims to it. The cost of mounting the attack is a registry signup. The yield is whatever fraction of those 205,474 names a model suggests again to a developer who runs the install command without checking.
This inverts the usual supply chain economics. Classic attacks like dependency confusion, or the postmark-mcp rug-pull we documented in our MCP supply chain audit work, require either knowledge of internal package names or trust earned through clean releases first. Slopsquatting needs neither. The model manufactures the trust by suggesting the name in an authoritative voice, and the developer's habit of copy-pasting install commands does the rest.
The react-codeshift Incident: Spread With No Planter
The January 2026 react-codeshift case is worth dwelling on because it shows the new propagation vector: agent-to-agent, not human-to-human.
The name fused jscodeshift (a real codemod toolkit) and react-codemod (a real set of React migration scripts) into react-codeshift, which never existed. Models suggested it because it is the name those two tools would have if someone merged them, and that plausibility is exactly what hallucination produces. The package then spread through 237 repositories via AI-generated agent skill files. An agent writing setup or migration instructions referenced react-codeshift in a skill or instruction file. That file was committed. Other agents, reading those files as authoritative context, repeated the same dependency into their own outputs and commits.
No human deliberately planted it. The fabricated name propagated as a self-reinforcing artifact in agent context. This is the part that should worry anyone running multi-agent or skill-file-driven workflows: a hallucination committed once becomes ground truth for every agent that reads it afterward. The error does not decay. It compounds, the same way AI-generated code churn and cloning quietly inflate technical debt. A fabricated dependency in a skill file is churn that an attacker can weaponize the moment they register the name.
In the react-codeshift case the slot had not been weaponized, which is why it was a near-miss rather than a breach. But 237 repositories were one attacker registration away from pulling whatever got shipped to that name.
Cross-Registry Attack Surface
There is a second-order problem that makes the namespace bigger than either registry alone. Research found that 8.7% of Python package names hallucinated by models actually exist in the npm registry.
That overlap opens a cross-registry attack surface. A name a model tends to invent in a Python context may already resolve to a real (and potentially attacker-controlled) package on npm. In polyglot repositories, where agents scaffold both Node and Python in the same project, an install command can cross ecosystems. An agent reasoning in Python conventions emits a name, the install command targets npm, and the name resolves to something the attacker registered there.
The defense is ecosystem scoping. Each project's agents should be constrained to one registry's allowlist, and any package name that does not match the expected ecosystem's real inventory should be rejected before install. Do not let a Python-flavored hallucination resolve against npm just because the repo happens to contain both. Treat the registry boundary as a security boundary, not an implementation detail.
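One way to enforce that boundary in an agent's tool layer is to classify each install command by the registry its installer talks to, then refuse anything outside the project's scope. Below is a minimal POSIX shell sketch that rests on some assumptions:

- The installer-to-registry mapping covers only a few common CLIs (npm, pnpm, yarn, pip, uv, poetry).
- The labels "npm" and "pypi" are illustrative.
- The per-project scope is passed as an argument rather than read from any standard config.

```shell
#!/bin/sh
# Sketch: classify an install command by registry and enforce a per-project scope.
# The installer list and the "npm"/"pypi" labels are illustrative assumptions.

target_registry() {
  # ${1%% *} keeps only the first word of the command (the installer CLI).
  case "${1%% *}" in
    npm|pnpm|yarn) echo npm ;;
    pip|pip3|uv|poetry) echo pypi ;;
    *) echo unknown; return 1 ;;
  esac
}

enforce_scope() {
  # $1: full install command emitted by the agent
  # $2: the single registry this project is allowed to install from
  reg=$(target_registry "$1") || {
    echo "BLOCK: unrecognized installer in: $1" >&2
    return 1
  }
  if [ "$reg" != "$2" ]; then
    echo "BLOCK: '$1' targets $reg, but this project is scoped to $2." >&2
    return 1
  fi
}
```

Consider a Python-scoped service. There, `enforce_scope "npm install requests" pypi` returns non-zero, so a Python-flavored name never gets a chance to resolve against npm. This check covers only the registry boundary. Whether the name itself is real and vetted is the job of the verification and allowlist layers.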
Defensive Playbook: Lockfiles, Verification, Allowlists
You defend against a 21.7% hallucination rate with mechanism, not vigilance. Here is the layered playbook, ordered by leverage.
Hash-pinned lockfiles, non-negotiable
Commit and enforce lockfiles with integrity hashes: package-lock.json or pnpm-lock.yaml for Node, poetry.lock or uv.lock for Python. Configure installs to honor them strictly: npm ci rather than npm install, and uv sync --frozen, or uv sync --locked if you want the build to fail when uv.lock is out of date with pyproject.toml. A hash-pinned lockfile means only previously resolved, hashed packages install. A freshly registered squat that is not already in the lockfile cannot slip in through a routine build, because there is no hash for it.

```shell
# Node: install exactly what package-lock.json pins; errors if it disagrees with package.json
npm ci
# Python with uv: install from uv.lock as-is, without re-resolving
# (use --locked instead to error out when uv.lock is stale)
uv sync --frozen
```

Verify every install command before running it
When an agent emits an install command, verify the package against the real registry before execution. Any name that returns nothing is a hard stop, not a "maybe it is new" prompt. The same logic applies to PyPI with pip index versions <pkg> or a registry API call. Wire this into the agent's tool layer so the check runs before any install tool fires, not as a habit you hope a human remembers.

```shell
# Reject any package that does not already exist on the registry
pkg="react-codeshift"
if ! npm view "$pkg" version >/dev/null 2>&1; then
  echo "BLOCK: $pkg not found on npm. Possible hallucination/slopsquat." >&2
  exit 1
fi
```
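The npm snippet checks one hard-coded name. An agent's tool layer has to pull names out of whatever command the model emitted, strip version specifiers, and check each one. Below is a minimal sketch that rests on some assumptions:

- It handles the common npm install, pip install and uv add shapes.
- It skips every --flag, so it does not understand -r requirements.txt or quoted specifiers.
- For PyPI it uses the public JSON endpoint https://pypi.org/pypi/<name>/json, which returns 404 for names that do not exist.

An existence check alone cannot catch a squat that has already been registered. That is why it sits alongside the allowlist rather than replacing it.

```shell
#!/bin/sh
# Sketch: extract package names from an agent-emitted install command and
# verify each one exists on the target registry before anything is installed.

extract_packages() {
  # Split the command on whitespace without glob expansion (e.g. flask[async]).
  set -f
  set -- $1
  set +f
  # Drop everything up to and including the install subcommand.
  while [ $# -gt 0 ]; do
    case "$1" in
      install|i|add) shift; break ;;
    esac
    shift
  done
  for word in "$@"; do
    case "$word" in -*) continue ;; esac   # skip flags such as --save-dev
    name=${word%%[=<>~!]*}                 # requests==2.31 -> requests
    name=${name%%\[*}                      # flask[async]   -> flask
    case "$name" in
      @*/*@*) name=${name%@*} ;;           # @types/node@20 -> @types/node
      @*) ;;                               # scoped name without a version
      *@*) name=${name%%@*} ;;             # lodash@4.17.21 -> lodash
    esac
    if [ -n "$name" ]; then echo "$name"; fi
  done
}

verify_packages() {
  # $1: "npm" or "pypi"; remaining arguments: bare package names.
  reg=$1; shift
  status=0
  for pkg in "$@"; do
    case "$reg" in
      npm)  npm view "$pkg" version >/dev/null 2>&1 ;;
      pypi) curl -fsS -o /dev/null "https://pypi.org/pypi/$pkg/json" 2>/dev/null ;;
      *)    false ;;
    esac || {
      echo "BLOCK: $pkg not found on $reg. Possible hallucination/slopsquat." >&2
      status=1
    }
  done
  return $status
}

# Example (needs network access):
#   verify_packages npm $(extract_packages "npm install --save-dev react-codeshift")
```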
Registry allowlists and private mirrors
Maintain an allowlist of vetted packages or, better, a private mirror that proxies only approved upstream packages. Agents can then pull only from sources you have reviewed. An allowlist converts "anything the model can name" into "the finite set we have vetted," which is the difference between an open and a closed attack surface. This is the same containment principle behind hardening MCP servers for production: constrain what the agent can reach, do not trust what it asks for.
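At its simplest, an allowlist check is an exact-match lookup against a vetted inventory. Below is a minimal sketch, assuming a plain-text allowlist with one package name per line and # comments. That file format is hypothetical; a private mirror would typically enforce this server-side instead. Matching is exact and case-sensitive. For PyPI, you may want to normalize names on both sides first, because PyPI treats names case-insensitively and considers -, _ and . equivalent.

```shell
#!/bin/sh
# Sketch: allow an install only if every requested package appears, exactly,
# in a vetted allowlist file (hypothetical format: one name per line, '#' comments).

check_allowlist() {
  allowlist=$1; shift
  blocked=0
  for pkg in "$@"; do
    # Drop comment lines, then -x: whole-line match, -F: fixed string, -q: quiet.
    if ! grep -v '^#' "$allowlist" | grep -qxF -- "$pkg"; then
      echo "BLOCK: $pkg is not on the vetted allowlist ($allowlist)." >&2
      blocked=1
    fi
  done
  return $blocked
}
```

The whole-line match matters here. A substring match would let react-codeshift through whenever react-codemod is allowlisted, and that near-miss naming is exactly what hallucinations produce.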
Defense layers at a glance
| Layer | Stops | Effort |
|---|---|---|
| Hash-pinned lockfile | Unresolved/new squats in routine builds | Low |
| Install-command verification | Hallucinated names at suggestion time | Low |
| Registry allowlist / mirror | Any package outside vetted set | Medium |
| CI dependency-diff gate | New deps reaching prod unreviewed | Medium |
Gating Agent-Generated Dependencies in CI
The layer most teams skip, and the one that catches what the others miss, is a CI gate that treats any new dependency as a reviewable event. The principle: a build that introduces a package nobody approved should fail, full stop.
Diff the dependency manifest on every pull request. If a new package appears, require explicit human sign-off (a label, a CODEOWNERS approval on the manifest, or a passing allowlist check) before the build goes green. This turns the 21.7% hallucination rate from a silent install-time risk into a blocked, visible decision.
```yaml
# .github/workflows/dep-gate.yml (sketch)
name: dependency-gate
on: pull_request
jobs:
  diff-deps:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }  # full history so origin/main is available
      - name: Detect new dependencies
        run: |
          git diff origin/main -- package-lock.json poetry.lock uv.lock \
            > deps.diff || true
          # A new npm package adds a "node_modules/<name>" key (lockfile v2/v3);
          # a new Python package adds a `name = "<pkg>"` line (poetry.lock, uv.lock).
          if grep -qE '^\+[[:space:]]*("node_modules/|name = ")' deps.diff; then
            echo "New dependency detected. Requires manual review." >&2
            echo "Verify each new package exists and is intended." >&2
            exit 1
          fi
```

The point is not this exact script. It is the policy: agent-generated dependencies do not reach main without a human confirming each new package is real and intended. Pair the gate with software composition analysis so that a package that is real but malicious also gets caught, the same dependency-grade discipline we apply across the AI security cluster. Lockfiles stop the unresolved squat, verification stops the obviously fake name, the allowlist stops the unvetted source, and the CI gate stops everything that slips past the first three from shipping silently.
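A diff-grep gate tells you that something new appeared, but not what. To put the actual package names in front of the reviewer, compare the name sets of the base and head lockfiles. Below is a minimal sketch that rests on some assumptions:

- package-lock.json is lockfile v2/v3, with "node_modules/<name>" keys.
- The TOML locks (uv.lock, poetry.lock) start each [[package]] entry with a column-0 `name = "<pkg>"` line.
- The file type is detected from the filename suffix.

Version bumps are ignored; only names that were not there before are reported.

```shell
#!/bin/sh
# Sketch: list package names present in the head lockfile but absent from the base.
# Format detection by filename suffix is an assumption of this sketch.

lock_names() {
  case "$1" in
    *package-lock.json)
      # "node_modules/a/node_modules/@scope/b" -> "@scope/b" (last path segment)
      grep -o '"node_modules/[^"]*"' "$1" | sed 's/.*node_modules\///; s/"$//' ;;
    *.lock)
      sed -n 's/^name = "\(.*\)"$/\1/p' "$1" ;;
  esac | LC_ALL=C sort -u
}

new_deps() {
  # $1: base lockfile, $2: head lockfile (same format). Prints added names.
  base_names=$(mktemp); head_names=$(mktemp)
  lock_names "$1" > "$base_names"
  lock_names "$2" > "$head_names"
  LC_ALL=C comm -13 "$base_names" "$head_names"   # lines only in the head list
  rm -f "$base_names" "$head_names"
}
```

In the workflow you would first materialize the base version. For example, run git show origin/main:uv.lock > base.lock, then fail the job if new_deps base.lock uv.lock prints anything. Keep the .lock or package-lock.json suffix on the temporary copy so that format detection still works.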
This is exactly the kind of control surface Particula Tech builds into client pipelines: a fixed-scope AI supply-chain audit that pins, allowlists, and CI-gates every dependency your models invent, so a 21.7% hallucination rate becomes a non-event instead of a breach waiting for an attacker to register the right name.
The Bottom Line
Slopsquatting is not a future threat that depends on attacker sophistication. The vulnerability already exists in measured form: a 21.7% open-model hallucination rate, 205,474 fabricated names from a single study, and a documented incident where react-codeshift reached 237 repositories with no human in the loop. The 8.7% Python-to-npm overlap widens the namespace further. Every one of those fabricated names is a registry signup away from being live malware.
You will not fix this by choosing a better model. GPT-4 Turbo's 3.59% is lower, not zero, and it still feeds the same attack across enough volume. You fix it with mechanism: hash-pinned lockfiles, install-command verification, a registry allowlist, and a CI gate that refuses to ship a dependency no human approved. Wire those four in and the hallucination rate stops mattering, because the fabricated name never makes it from the model's mouth to your production runtime.
Frequently Asked Questions
What is slopsquatting?
Slopsquatting is a software supply chain attack that exploits package names that large language models hallucinate. When an LLM or coding agent suggests an install command, it sometimes references a package that does not exist. An attacker who watches for these fabricated names registers the empty slot on npm or PyPI and ships malware to it. The next developer whose model hallucinates the same name installs the attacker's payload. The term is a play on typosquatting, but no typo is involved: the model invents the name and the attacker squats it. Open-weight models hallucinate package names at an average rate of 21.7%, so the supply of squattable names is large and predictable.



