Is AI-generated code worse than human-written code?

On maintainability signals, the data says yes, on average and at current tooling defaults. GitClear's 211M-line study found that code churn (lines reverted or rewritten within two weeks) rose 39% in AI-heavy projects, that copy-pasted lines exceeded refactored lines for the first time, and that the refactored share of changes fell by about 60%. A separate analysis shows AI-authored pull requests carry 1.7x more issues (10.83 vs 6.45 per PR). The code often works on first run, which is exactly why it ships, but it duplicates instead of reusing, and it adds rather than restructures. The problem is not that models can't write good code; it is that the default loop optimizes for a passing test, not for a clean codebase.

What does the GitClear AI code churn study actually measure?

GitClear analyzed 211 million changed lines across repositories from companies including Google, Microsoft, and Meta, covering 2020 to 2024, the window in which AI coding assistants went mainstream. Its headline metric is code churn: lines that get reverted or substantially rewritten within two weeks of being committed, a proxy for code that should not have been written that way. Churn rose from 3.1% in 2020 to 5.7% in 2024. The study also tracks copy-pasted versus moved/refactored lines and duplicate code blocks. It is correlational, not a controlled trial, so it shows the pattern co-occurring with AI adoption rather than proving causation, but the timing and the size of the effect are hard to dismiss.

How much does AI code duplication increase tech debt?

Measurably. In GitClear's data, copy-pasted lines rose from 8.3% (2021) to 12.3% (2024) and, for the first time, exceeded moved or refactored lines, while duplicate code blocks grew roughly 10x between 2022 and 2024. Industry estimates put the post-adoption tech-debt increase at 30-41% and project that unmanaged AI code can push maintenance cost to about 4x by the second year; those are projections, not measured outcomes. Duplication is the mechanism: every cloned block is a future change you now have to make in N places instead of one, and AI assistants clone aggressively because regenerating a similar block is cheaper for the model than locating and reusing the existing abstraction.

Does AI coding actually slow teams down?

It can, once you count the second-order cost. The first-order effect is real velocity: more lines, faster, on isolated tasks. The second-order effect is rework. The METR randomized study found experienced open-source developers were 19% slower with AI tools on their own mature repositories, despite expecting a speedup. GitClear's churn data is the codebase-level version of the same finding: a 39% rise in code that gets reverted or rewritten within two weeks is velocity you paid for and then threw away. The net effect depends entirely on whether you have review gates and refactoring discipline to absorb the extra output.

What is a reasonable code churn rate to target?

Pre-AI baselines sat around 3-3.5% (lines reverted or rewritten within two weeks), and the industry average climbed to 5.7% by 2024. Treat anything trending above 5% as a warning that code is shipping before it is ready, and investigate sustained churn above 7% as a process failure, not a developer one. Churn is best read as a relative trend on your own codebase rather than an absolute pass/fail number: a steady rise after introducing AI assistants is the signal that the default loop is generating throwaway code. Pair it with the duplication ratio so you catch both 'rewrote it twice' and 'cloned it five times.'
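The thresholds in that answer can be written down as a small classifier. This is a minimal sketch, not a standard: the 5% and 7% cut-offs come from the text above, while the function name and the `sustained_samples` knob (how many consecutive readings count as "sustained") are assumptions made for illustration.

```python
# Thresholds from the answer above; tune them against your own baseline.
WARN_THRESHOLD = 0.05      # "trending above 5%"
PROCESS_THRESHOLD = 0.07   # "sustained churn above 7%"


def classify_churn(rates: list[float], sustained_samples: int = 4) -> str:
    """Classify a series of two-week churn rates (fractions, oldest first).

    `sustained_samples` is a hypothetical setting: the number of most recent
    readings that must all exceed 7% before the result is "investigate".
    """
    if not rates:
        raise ValueError("need at least one churn reading")
    recent = rates[-sustained_samples:]
    sustained = len(recent) == sustained_samples and all(
        r > PROCESS_THRESHOLD for r in recent
    )
    if sustained:
        return "investigate"
    if rates[-1] > WARN_THRESHOLD:
        return "warning"
    return "ok"
```

A single spike above 7% only returns "warning"; it takes a full run of high readings to escalate, which matches reading churn as a trend rather than a pass/fail number.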

Should engineering leaders restrict AI coding assistants based on this data?

No. Restriction is the wrong lever; instrumentation is the right one. The productivity gains on boilerplate, tests, and exploration are real, and banning the tools cedes them. The data argues for governance, not prohibition: measure churn, duplication, and the refactored-to-added ratio; require review on AI-heavy changes; and budget explicit refactoring time so the codebase keeps getting reshaped rather than only growing. The teams that lose are the ones treating AI output as free because it compiles. The teams that win treat it as draft code that still has to pass the same maintainability bar as anything else.

AI Writes 41% of Code: The Churn and Tech-Debt Data

How do I prevent AI-generated tech debt without banning the tools?

Gate on the right metrics and keep refactoring in the loop. Add duplication and churn checks to CI (jscpd or SonarQube clone detection with a hard threshold), require a human reviewer on every AI-heavy PR, and treat a passing test as necessary but not sufficient. Track the refactored-to-added ratio per sprint; if it is collapsing toward zero, the team is accreting code instead of shaping it. Use the assistant for boilerplate and exploration, but require that reuse of existing abstractions is checked before a clone is accepted. The tools stay; the unmanaged defaults go.
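Tracking the refactored-to-added ratio per sprint can be as simple as the sketch below. It assumes you already have per-sprint counts of moved/refactored lines and added lines from whatever tooling you use; the function names and the `floor` value are hypothetical, chosen only to make "collapsing toward zero" testable.

```python
def refactor_ratio(refactored_lines: int, added_lines: int) -> float:
    """Refactored (moved or reshaped) lines per added line for one sprint."""
    if added_lines <= 0:
        raise ValueError("added_lines must be positive")
    return refactored_lines / added_lines


def is_collapsing(ratios: list[float], floor: float = 0.10) -> bool:
    """True when the ratio fell every sprint and the latest is below `floor`.

    `floor` is an illustrative cut-off, not a figure from the study.
    """
    if len(ratios) < 2:
        return False
    falling = all(later < earlier for earlier, later in zip(ratios, ratios[1:]))
    return falling and ratios[-1] < floor
```

For example, `is_collapsing([0.25, 0.18, 0.09])` is true, while a series that dips and recovers is not flagged.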

AI now writes ~41% of new code, and GitClear's 211M-line study shows churn up 39%, cloning past refactoring, and 1.7x more issues per PR. The data, decoded.

Sebastian Mondragon · June 05, 2026 · 10 min read

GitClear analyzed 211 million changed lines of code across repositories from Google, Microsoft, Meta, and others, spanning 2020 to 2024. Over that window, AI coding assistants went from novelty to default, and roughly 41% of new code is now AI-generated. The study tracked what happened to the codebases underneath. Code churn, the share of lines reverted or rewritten within two weeks of being committed, climbed from 3.1% to 5.7%. In AI-heavy projects it rose 39%. For the first time in the dataset's history, copy-pasted lines outnumbered moved or refactored lines.

That last sentence is the one to sit with. Refactoring, the act of reshaping existing code, used to outpace cloning. Now cloning wins. A codebase that copies more than it consolidates is a codebase quietly accreting duplication, and duplication is the raw material of tech debt. The velocity is real. So is the bill that arrives in year two.

This is the data analysis underneath every "AI made us 2x faster" claim. The speed shows up immediately and the maintenance cost shows up later, which is exactly why most teams justify adoption on velocity and only discover the churn and duplication metrics after the codebase has already absorbed them. Below is what the numbers actually say, why AI assistants produce this specific failure pattern, and the gates that let you keep the speed without the debt.

01 · The Velocity-vs-Maintenance Gap

Every AI coding rollout I have seen is sold on the same metrics: lines shipped, tickets closed, time-to-first-PR. Those are first-order numbers, and they genuinely improve. The problem is that code quality is a second-order metric. It does not show up in this sprint's velocity chart; it shows up six months later as the rate at which changes get harder.

The METR randomized controlled trial put a number on the gap. Experienced open-source developers working on their own mature repositories were 19% slower when using AI tools, even though they expected a speedup and reported feeling faster. We unpacked that finding in detail in why AI coding tools make developers 19% slower. A 2026 revisit of the study held that slowdown in place rather than reversing it, despite coverage that read the update the other way. GitClear's churn data is the codebase-level mirror of the same effect: a 39% rise in code that gets reverted or rewritten within two weeks is, definitionally, velocity you paid for and then discarded.

The two studies measure different things (one developer time, the other code lifespan), but they point at the same gap. The speed is front-loaded. The cost is back-loaded. And the metrics most teams watch are all front-loaded, which is how a velocity win and a maintainability loss can coexist on the same dashboard without anyone noticing until the codebase tells them.

02 · GitClear's 211M-Line Study: Churn Rose 39% in AI-Heavy Projects

GitClear's central metric is code churn: lines that are reverted or substantially rewritten within two weeks of being committed. The logic is that healthy code, code that was thought through, tends to stick. Code that gets ripped out almost immediately is a proxy for output that should not have shipped in that form. The two-week window is deliberately tight to catch genuine rework rather than normal long-term evolution.
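To make the definition concrete, here is a minimal sketch of a churn calculation. It is not GitClear's implementation, which the study summary here does not describe; `LineRecord` and its fields are hypothetical, and the sketch assumes you can already tell when each committed line was first reverted or rewritten.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

CHURN_WINDOW = timedelta(days=14)  # the two-week window from the definition


@dataclass(frozen=True)
class LineRecord:
    """Hypothetical record for one committed line."""
    committed_at: datetime
    rewritten_at: Optional[datetime] = None  # None: never reverted or rewritten


def churn_rate(lines: list[LineRecord]) -> float:
    """Share of lines reverted or rewritten within two weeks of commit."""
    if not lines:
        return 0.0
    churned = sum(
        1
        for line in lines
        if line.rewritten_at is not None
        and line.rewritten_at - line.committed_at <= CHURN_WINDOW
    )
    return churned / len(lines)
```

One design caveat: lines committed less than two weeks ago have not had their full window yet, so a real report should exclude them or the most recent period will look healthier than it is.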

Here is the trend across the study window:

Year	Code churn (reverted/rewritten <2 weeks)	Context
2020	3.1%	Pre-assistant baseline
2021	~3.5%	Early Copilot
2022	~4.1%	Assistant adoption accelerating
2023	~4.9%	ChatGPT-era coding mainstream
2024	5.7%	~41% of new code AI-generated

The 3.1% to 5.7% climb is the industry average. Inside AI-heavy projects specifically, churn rose 39% relative to baseline. That is not a rounding error. It means roughly one in eighteen committed lines is being undone within a fortnight, up from one in thirty-two at the start of the window.

A fair objection: this is correlational. GitClear did not run a controlled experiment that withheld AI tools from one group, so the study shows churn rising alongside AI adoption, not AI adoption causing churn. Other things changed over 2020-2024 too. But the size of the effect, its concentration in AI-heavy projects, and the mechanism, which I will get to, make the alternative explanations strain. The honest framing is that this is a strong pattern, not a proof, and it lines up with everything else in the data.
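The "one in eighteen" and "one in thirty-two" figures in this section are just the 2024 and 2020 churn percentages inverted. A short sketch of that arithmetic, with a helper name (`one_in_n`) invented for illustration:

```python
def one_in_n(rate_percent: float) -> float:
    """Convert a percentage rate into the 'one in N lines' form."""
    if rate_percent <= 0:
        raise ValueError("rate_percent must be positive")
    return 100.0 / rate_percent


print(round(one_in_n(3.1), 1))  # 32.3, i.e. roughly one in thirty-two (2020)
print(round(one_in_n(5.7), 1))  # 17.5, i.e. roughly one in eighteen (2024)
```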

03 · Cloning Now Beats Refactoring for the First Time

The churn number is the headline, but the duplication numbers are the more structural finding. GitClear tracks copy-pasted lines against moved/refactored lines, the latter being the signature of someone reshaping code rather than bolting more on.

Two things happened at once. Copy-pasted lines rose from 8.3% in 2021 to 12.3% in 2024, and crucially crossed above moved/refactored lines, the first time in the dataset that cloning outpaced reshaping. Meanwhile duplicate code blocks grew roughly tenfold between 2022 and 2024.

Duplication is not a cosmetic problem. Every cloned block is a change you have committed to making in N places instead of one. The first time a bug lives in five copies of the same function, the cost of the original shortcut comes due, with interest. A codebase where cloning beats refactoring is a codebase whose change cost is compounding silently, and the compounding does not appear on any sprint board.
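Because duplication is mechanically detectable, the core idea fits in a few lines. The sketch below hashes sliding windows of whitespace-normalized lines and reports any window seen more than once. It is an illustration of the technique only: dedicated clone detectors are considerably more sophisticated, and the function names and `window` default here are assumptions.

```python
import hashlib
from collections import defaultdict


def normalize(line: str) -> str:
    """Collapse whitespace so formatting differences do not hide a clone."""
    return " ".join(line.split())


def find_duplicate_blocks(
    files: dict[str, str], window: int = 5
) -> dict[str, list[tuple[str, int]]]:
    """Map block hash -> [(path, 1-based start line)] for repeated blocks.

    `files` maps a path to its source text. Blank lines are skipped, and a
    long clone shows up as several overlapping windows rather than one hit.
    """
    seen: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for path, text in files.items():
        # Keep original line numbers while dropping blank lines.
        numbered = [
            (number, normalize(raw))
            for number, raw in enumerate(text.splitlines(), start=1)
            if raw.strip()
        ]
        for start in range(len(numbered) - window + 1):
            chunk = numbered[start:start + window]
            block = "\n".join(content for _, content in chunk)
            digest = hashlib.sha1(block.encode("utf-8")).hexdigest()
            seen[digest].append((path, chunk[0][0]))
    return {digest: locs for digest, locs in seen.items() if len(locs) > 1}
```

This only catches copies that are identical after whitespace normalization. A clone with renamed variables slips through, which is one reason to lean on a purpose-built tool for the actual gate.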

Metric	2021	2024	Change
Copy-pasted lines (share)	8.3%	12.3%	+48%
Refactored share of changes	25%	<10%	~60% decline
Duplicate code blocks	1x (2022 baseline)	~10x	2022-2024 growth

04 · The Refactoring Collapse

The refactored share of all changes falling from 25% to under 10% is, to me, the most worrying single number in the study. Refactoring is how a codebase stays shapeable. It is the act of paying down structural debt so the next feature is cheap. When that share collapses by 60%, the codebase stops getting reshaped and only grows.

The mechanism is straightforward once you think about how the model and the human interact. Refactoring requires understanding the existing structure well enough to improve it. An AI assistant prompted with "add X" has no incentive to reorganize what is already there, and the developer accepting the suggestion often has not read the surrounding code closely enough to refactor it either. The path of least resistance is to add, not to reshape. Multiply that across thousands of commits and the refactored share collapses exactly as the data shows.

This is the same failure pattern, at the codebase scale, that shows up at the agent scale on harder benchmarks. When a coding agent edits one file to make a test pass and never touches the consumers it just broke, it is optimizing for a green check, not a coherent change. We documented that dynamic in agent scaffolding beats model upgrades on SWE-Bench: the model is capable of the better change, but the default loop does not ask for it. Code churn and the refactoring collapse are the same incentive problem expressed in commit history rather than benchmark scores.

05 · The Quality Signal: 1.7x More Issues Per PR

The churn and duplication metrics describe what is happening to the codebase structurally. The pull-request data describes what reviewers actually catch. Analysis of AI-authored versus human-authored PRs found AI-authored pull requests carry 1.7x more issues: 10.83 issues per PR versus 6.45.

The compounding estimates follow from there. Industry analysis puts the post-adoption increase in tech debt at 30-41%, and projects that unmanaged AI code drives maintenance cost to roughly 4x by the second year. Those second-year numbers are projections, not measured outcomes from a five-year study, so treat them as directional rather than precise. But the direction is consistent across every dataset: more issues per PR, more duplication, more churn, less refactoring, and a maintenance curve that bends the wrong way over time.

The phrase that explains all of it is "it compiles." AI-generated code clears the bar that gets attention, a passing test or a clean run, while quietly failing the bars that do not: reuse, structure, and reviewability. Code that works on the first try is exactly the code that ships without scrutiny, which is why the issue count per PR is higher even though the code "works." One of the more dangerous defects hiding in that count is the hallucinated import: models invent package names that do not exist and attackers pre-register them, a supply-chain risk we measure in slopsquatting and AI package-hallucination rates.

PR source	Issues per PR	Relative
Human-authored	6.45	1.0x
AI-authored	10.83	1.7x

06 · The Year-Two Cost and Why It Hides

The reason this debt is dangerous is that it is invisible during the period when you are deciding whether AI coding is working. Year one is all upside on the metrics most teams watch. The lines ship, the tickets close, the velocity chart goes up and to the right. The churn, the duplication, and the suppressed refactoring are accumulating, but they do not yet hurt because the codebase is still small enough relative to the debt to absorb it.

Year two is when the maintenance multiplier, estimated at roughly 4x for unmanaged AI code, starts to bite. Now every change touches a duplicated block that has drifted out of sync. Now the refactoring that was deferred is a major undertaking instead of a routine cleanup. Now the 1.7x issue rate compounds across a larger surface. The velocity that justified adoption is eroding, and the cause is a year of decisions that looked free at the time.

This is the same shape as the cost runaways we see elsewhere in AI systems, where the meter spins quietly until the invoice lands. The discipline that prevents it is identical: instrument the leading indicator before it becomes a lagging cost. For teams already worried their existing systems have accumulated this kind of hidden debt, the diagnostic approach in how to audit your AI for bugs, bias, and performance issues applies directly to AI-generated code: you measure the debt before you can manage it. At Particula Tech, the code-quality gates we build for clients start exactly here, with churn and duplication baselines, because you cannot govern a number you are not watching.

07 · Policy Implication: Gate on Duplication and Churn, Not Just Test Pass

The fix is not to ban the tools. The productivity gains on boilerplate, scaffolding, tests, and exploration are real, and prohibition cedes them to competitors who keep the tools and add the guardrails. The fix is to change what you gate on. A passing test suite is necessary but not sufficient, because every metric in this study sits underneath a green build.

Here is the practical gate set, in priority order:

Duplication threshold in CI. Run clone detection (jscpd, or SonarQube's copy-paste detector) on every PR with a hard ceiling. This is the single highest-leverage check because duplication is the mechanism behind most of the downstream cost, and it is mechanically detectable.

Track the refactored-to-added ratio per sprint. If refactoring is collapsing toward zero, as the study found industry-wide, the team is accreting rather than shaping. Budget explicit refactoring time to keep the ratio off the floor.

Mandatory human review on AI-heavy PRs. The 1.7x issue rate is exactly what review exists to catch. Code that compiles on the first try is the code most likely to skip scrutiny, so flag AI-heavy changes for extra eyes, not fewer.

Churn as a trend, not a gate. Watch the two-week revert/rewrite rate on your own codebase. A sustained rise above your pre-AI baseline (typically around 3-3.5%) is the early signal that the default loop is generating throwaway code.
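The per-PR parts of that gate set can be sketched as a single check. This is an assumption-heavy illustration rather than a reference implementation: the field names, the way a PR gets labeled AI-heavy, and the 3% duplication ceiling are all placeholders you would replace with your own tooling and thresholds. The refactored-to-added ratio is left out because it is a per-sprint number, not a per-PR one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PullRequestMetrics:
    """Hypothetical inputs gathered by CI for one pull request."""
    duplication_pct: float     # share of changed lines flagged as clones
    ai_heavy: bool             # however your team labels AI-heavy changes
    human_approvals: int
    churn_pct: float           # current two-week churn on the repository
    baseline_churn_pct: float  # your own pre-AI baseline


@dataclass
class GateResult:
    failures: list[str] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return not self.failures


def evaluate(pr: PullRequestMetrics, max_duplication_pct: float = 3.0) -> GateResult:
    """Apply the gates: hard duplication ceiling, review on AI-heavy PRs,
    and churn as a warning-only trend signal."""
    result = GateResult()
    if pr.duplication_pct > max_duplication_pct:
        result.failures.append(
            f"duplication {pr.duplication_pct:.1f}% exceeds ceiling {max_duplication_pct:.1f}%"
        )
    if pr.ai_heavy and pr.human_approvals < 1:
        result.failures.append("AI-heavy PR needs at least one human approval")
    if pr.churn_pct > pr.baseline_churn_pct:
        # Churn is a trend, not a gate: warn, never fail the build on it.
        result.warnings.append(
            f"churn {pr.churn_pct:.1f}% is above baseline {pr.baseline_churn_pct:.1f}%"
        )
    return result
```

Keeping churn in `warnings` rather than `failures` mirrors the last item in the list: it should prompt a look at the trend, not block a merge.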

The deeper point is that these are process gates, not tool restrictions. The same teams that win with AI coding are the ones who picked their harness and their review discipline as deliberately as their model. That framing carries straight into tool selection itself: in Cursor vs Claude Code for 2026, the differentiator is rarely the underlying model and almost always the workflow and guardrails wrapped around it. The same is true here. The tools are not the variable that determines whether you accumulate debt. Your gates are.

For the broader picture of where AI coding tools fit in a production engineering stack, our AI development tools pillar covers the surrounding decisions: which agent, which sandbox, which observability layer, and how the pieces compose into a workflow that ships fast without shipping debt.

08 · What the Data Tells Engineering Leaders

Three takeaways from the numbers.

The speed is real and so is the debt; they are not in tension if you measure both. AI generates roughly 41% of new code and the velocity gain on isolated tasks is genuine. The churn (+39% in AI-heavy projects), the duplication (cloning now exceeds refactoring), and the issue rate (1.7x per PR) are equally genuine. A team that only watches velocity will conclude AI coding is an unambiguous win and discover the year-two maintenance curve too late.

Duplication is the metric to gate on first. Of everything in the GitClear study, duplication is the most mechanically detectable and the most directly tied to future cost. Cloned lines now exceed refactored lines for the first time, and duplicate blocks grew tenfold in two years. A hard duplication threshold in CI catches the failure mode at the moment it is cheapest to fix.

Govern, do not prohibit. The teams that lose treat AI output as free because it compiles. The teams that win treat it as draft code held to the same maintainability bar as anything else: review on AI-heavy PRs, an enforced clone threshold, and protected refactoring time so the refactored-to-added ratio stays off the floor. The data does not say stop using AI to write code. It says stop pretending the output is finished the moment the test goes green.

The 41% number is going up, not down. The question for 2026 is not whether AI writes your code; it already writes a plurality of it. The question is whether you are measuring what it costs you, or only what it saves you.

09 · FAQ

Quick answers to the questions this post tends to raise.
