The 2025 DORA report (thousands of developers, Google Cloud and Faros) found AI raised PRs per developer 98% but bugs per developer rose 54% (versus 9% the prior year), incidents per PR jumped 243%, and median PR review time climbed 441%, with 31% more PRs merging unreviewed. AI is an amplifier, not a transformer: a Faros dataset of 67,000 developers showed teams with existing rigor saw 50% fewer incidents while unprepared teams saw 2x more customer-facing incidents. The fix is delivery-system redesign (right-sized review, automated gates that scale with volume, incidents tracked per user not per PR), not faster code.
The 2025 DORA report is the largest look yet at what AI actually does to software delivery, and the numbers are not the clean productivity win the tooling vendors promised. Surveying thousands of developers, the report (published by Google Cloud with delivery data from Faros) found that AI raised pull requests per developer by 98%. It also found that bugs per developer rose 54%, incidents per PR rose 243%, and median PR review time rose 441%. Nearly a third more PRs (31%) now merge with no human review at all. The same adoption that nearly doubled output more than tripled how often each unit of that output breaks in production.
This is the most important DORA finding since the metrics were first published, because it breaks the assumption underneath most AI coding rollouts: that faster code creation is the goal and everything else follows. It does not follow. The 2025 data shows code velocity racing ahead while every downstream quality gate (review, testing, release validation) stayed exactly as fast as it was when humans wrote everything by hand. Faros Research named the result acceleration whiplash, and it is the defining failure mode of AI-assisted development in its current form.
This post walks through the headline numbers, explains why they coexist with genuine wins in the same dataset, and lays out what the data actually tells you to change. The short version: AI is an amplifier, not a transformer. It multiplies whatever your delivery system already does. If your pipeline already leaks defects, AI makes it leak at machine speed. If your pipeline is rigorous, AI lets you safely convert throughput into shipped value. Both outcomes are in the data, and which one you get is a choice about delivery-system design, not about which model you license.
The Headline Numbers From DORA 2025
Start with the figures, because they reframe what AI adoption is doing. Most teams measure AI's value by throughput: more PRs, more commits, more lines shipped per engineer. By that measure, AI is an unqualified success. The problem is that throughput is an input metric, and the 2025 DORA data shows what happens to the outputs.
Read top to bottom, the table below tells a single story. Throughput nearly doubled. The bug rate, which had been creeping up 9% year over year, jumped to a 54% rise, a six-fold acceleration in how fast defects accumulate. Incidents per PR more than tripled, meaning each unit of shipped work is now far likelier to cause a production problem. Review time, the human gate meant to catch those defects before merge, stretched to more than five times its previous length. And when review could not keep up, the system did what overloaded systems always do: it shed the load, merging 31% more PRs with no review at all.
None of these are model-quality problems. The model is producing more code, and some of that code is worse, but the dominant effect is structural. A gate built to process human-paced output got force-fed machine-paced volume, and it failed in the predictable way: queue blowup, then bypass, then the uncaught debt resurfacing downstream as production incidents.
| Metric | Change after AI adoption | What it measures |
|---|---|---|
| PRs per developer | +98% | Raw code-creation throughput |
| Bugs per developer | +54% (vs +9% prior year) | Defect generation rate |
| Incidents per PR | +243% | Production stability per unit shipped |
| Median PR review time | +441% | Downstream review-gate saturation |
| PRs merged with no review | +31% | Review-gate overflow / bypass rate |
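The percentages in the table are easier to compare once they are converted to before/after multipliers. The short Python sketch below does only that arithmetic on the figures quoted above; the dictionary and function names are illustrative, not part of the report.

```python
# Percentage changes copied from the table above.
CHANGES_PCT = {
    "PRs per developer": 98,
    "Bugs per developer": 54,
    "Incidents per PR": 243,
    "Median PR review time": 441,
    "PRs merged with no review": 31,
}


def multiplier(pct_change: float) -> float:
    """Convert a percentage increase into a before/after multiplier."""
    return 1 + pct_change / 100


# For example, +243% means the new value is 3.43 times the old one.
MULTIPLIERS = {name: round(multiplier(pct), 2) for name, pct in CHANGES_PCT.items()}

# The bug figure compares this year's rise (54%) with the prior year's (9%).
BUG_RISE_ACCELERATION = 54 / 9

if __name__ == "__main__":
    for name, factor in MULTIPLIERS.items():
        print(f"{name}: {factor:.2f}x")
    print(f"Bug-rise acceleration: {BUG_RISE_ACCELERATION:.0f}x")
```

Read this way, a 98% rise is just under 2x, a 243% rise is about 3.4x, and a 441% rise is about 5.4x, which is where the "nearly doubled", "more than tripled", and "more than five times" phrasing comes from.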
What Acceleration Whiplash Actually Means
Acceleration whiplash is the gap between how fast you can now create code and how fast you can still safely ship it. AI compressed the create step dramatically while leaving every verify step at its old pace. The result is not faster delivery end to end. It is a relocated bottleneck and a relocated cost.
The relocation is the key insight. Before AI, defects were caught and paid for pre-merge: a reviewer flagged a problem, the author fixed it, the cost stayed inside the development loop where it is cheap. With review saturated and a third more PRs slipping through unreviewed, that same defect debt does not disappear. It moves downstream and gets paid for in production, where it is expensive, where it is a 243% rise in incidents per PR, and where the people paying are customers instead of reviewers.
This is why the throughput number is misleading on its own. A 98% increase in PRs looks like doubled productivity only if you ignore that a growing share of those PRs is unreviewed, defect-bearing, and incident-prone. The real productivity question is not how much code you created. It is how much value you shipped net of the rework and incident response that code generated. That distinction is exactly the developer productivity paradox we have written about before: perceived speed goes up while measured delivery often does not, because the time saved writing code gets spent debugging, reviewing, and reworking it.
The 31% No-Review Signal
The single most alarming number in the report is not the incident rate. It is the 31% increase in PRs merging with zero human review. That figure is the review system failing silently, and it deserves its own attention because it is both a cause and a symptom.
It is a symptom because review time rose 441%. When a reviewer's queue nearly doubles overnight and each item takes longer (AI PRs tend to be larger and use unfamiliar patterns), the rational individual response is to wave through anything that looks plausible. Multiply that across a team and you get a third more PRs merged unseen.
It is a cause because every unreviewed PR is a defect filter removed from the pipeline. Review was the gate specifically designed to catch the kind of subtle, context-dependent mistakes that automated tests miss. Disabling it on a third more PRs, precisely when the code is increasingly machine-generated and increasingly voluminous, is how the bug rate goes from a 9% annual creep to a 54% jump. The review system was built for human-paced output and is now being force-fed machine-paced volume. It did not adapt. It overflowed.
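The overflow mechanism can be illustrated with a deliberately simplified queue model. The sketch below is not taken from the report: it assumes a hypothetical team that opens 10 PRs a day and can review 12, then applies the 98% throughput increase while holding review capacity fixed. Real queues are burstier, but the direction is the same.

```python
def simulate_review_backlog(arrivals_per_day: float,
                            reviews_per_day: float,
                            days: int) -> list:
    """Return the end-of-day backlog of unreviewed PRs for each day.

    A fluid model: arrivals and review capacity are constant rates,
    and the backlog can never go below zero.
    """
    backlog = 0.0
    history = []
    for _ in range(days):
        backlog = max(0.0, backlog + arrivals_per_day - reviews_per_day)
        history.append(backlog)
    return history


# Hypothetical team: review capacity of 12 PRs/day, unchanged by AI adoption.
REVIEW_CAPACITY = 12

before_ai = simulate_review_backlog(10, REVIEW_CAPACITY, days=20)
after_ai = simulate_review_backlog(10 * 1.98, REVIEW_CAPACITY, days=20)

if __name__ == "__main__":
    print("backlog after 20 days, before AI:", before_ai[-1])
    print("backlog after 20 days, after AI: ", round(after_ai[-1], 1))
```

With these assumed numbers the pre-AI backlog stays at zero, while the post-AI backlog grows by 7.8 PRs every day and never recovers. A queue in that state has only two exits: longer waits or skipped reviews.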
The wrong fix is to demand that humans review everything anyway, which just trades the incident spike for a delivery freeze. The right fix is to stop treating all PRs as equal review candidates, which we get to below.
AI as Amplifier, Not Transformer
Here is the finding that should change how every engineering leader reads this report: the 2025 dataset contains both large regressions and large improvements, in the same metrics, at the same time. AI did not move everyone in one direction. It amplified the direction each organization was already heading.
The Faros dataset of 67,000 developers makes this concrete. Organizations that already had strong delivery rigor (real review discipline, automated quality gates, mature testing) saw 50% fewer incidents after adopting AI. Organizations flooding a pipeline that already leaked defects saw roughly 2x more customer-facing incidents. Same tools, opposite outcomes, and the deciding variable was the maturity of the delivery system the AI was dropped into.
This is the difference between an amplifier and a transformer. A transformer changes the signal. An amplifier makes whatever signal you feed it louder. AI is the latter. If you feed it a disciplined delivery process, it makes that discipline more productive. If you feed it a broken one, it makes the breakage more frequent and more visible. The teams getting burned are not the ones using AI wrong at the keyboard. They are the ones who never built the downstream system that AI volume requires.
That amplification also shows up in the codebase itself, not just in incidents. A separate analysis found that AI-assisted development drives code churn and cloning, with a measurable share of recently committed code being rewritten or duplicated rather than reused. That churn is the same whiplash viewed from inside the repository: volume up, durable value not keeping pace.
| Organization profile | Incident outcome after AI | Why |
|---|---|---|
| Strong review + automated gates | 50% fewer incidents | Higher throughput safely converted to shipped value |
| Weak review + few automated gates | ~2x more incidents | Higher throughput floods an already-leaky pipeline |
Why the Classic Four DORA Metrics Miss This
The original DORA metrics (deployment frequency, lead time for changes, change failure rate, and time to restore service) were defined for a world where humans wrote the code. AI broke an assumption baked into them: that throughput is scarce and therefore a reasonable proxy for productivity. AI makes throughput cheap, and a cheap input is a bad proxy.
Deployment frequency is the clearest casualty. It rewards shipping often, and AI makes shipping often trivial, so a team can post record deployment frequency while quietly rewriting a growing share of what it just shipped. The metric looks elite. The real productivity is flat or negative once you net out the rework.
This is why DORA added a fifth metric in 2025: Rework Rate, which measures how much new code modifies or replaces code committed in the recent past. Rework Rate is designed to catch exactly the quality debt that AI generates and that the classic four metrics paper over. A high deployment frequency paired with a high rework rate is not a high-performing team. It is a team spinning, shipping volume that it has to unship and reship.
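To make the idea concrete, here is a minimal sketch of a rework-rate calculation that follows the description above: the share of changed lines that overwrite code committed within a recent window. The `LineChange` record, the 21-day window, and the line-level granularity are all assumptions for illustration, not DORA's official formula.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class LineChange:
    """One changed line in a commit (hypothetical record shape)."""
    committed_on: date
    # Commit date of the line this change overwrote; None for brand-new lines.
    replaces_line_from: Optional[date]


def rework_rate(changes: Sequence[LineChange], window_days: int = 21) -> float:
    """Fraction of changed lines that rewrite code younger than the window."""
    if not changes:
        return 0.0
    window = timedelta(days=window_days)
    reworked = sum(
        1
        for change in changes
        if change.replaces_line_from is not None
        and change.committed_on - change.replaces_line_from <= window
    )
    return reworked / len(changes)


if __name__ == "__main__":
    today = date(2025, 6, 30)
    sample = [
        LineChange(today, None),                # new code
        LineChange(today, None),                # new code
        LineChange(today, date(2025, 6, 20)),   # rewrites 10-day-old code
        LineChange(today, date(2024, 1, 1)),    # edits old, settled code
    ]
    print(rework_rate(sample))
```

Tracked next to deployment frequency, a number like this separates a team that ships often from a team that ships the same change twice.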
DORA also restructured the whole framework in 2025, replacing the familiar low, medium, high, and elite performer clusters with team archetypes. The cluster model implied a single ladder of performance that everyone climbs the same way. The 2025 data shattered that, because the same AI adoption sent some teams up and others down. Archetypes acknowledge that different organizational profiles get different outcomes from the same tools, which is the amplifier finding encoded into the framework itself.
What the Data Says To Do
The report is unusually prescriptive once you read past the headlines, because the failure modes are structural and structural problems have structural fixes. Three changes follow directly from the numbers.
Right-Size Review for AI-Authored PRs
The 441% review-time blowup and the 31% no-review jump are the same problem: a uniform review process applied to a doubled, increasingly machine-generated PR stream. The fix is to stop reviewing all PRs the same way. Route low-risk changes (dependency bumps, generated boilerplate, well-tested refactors) to automated checks and lightweight approval. Reserve deep human review for high-risk changes: anything touching auth, payments, data migrations, or core business logic. The goal is to spend your scarce, expensive reviewer attention where it actually prevents incidents, instead of spreading it so thin that a growing share of PRs get none at all.
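One way to express that routing is a small classifier over the paths a PR touches. The sketch below is an assumption-laden illustration: the glob patterns, tier names, and the `tests_passed` flag are hypothetical and would need to reflect your own repository layout and CI signals.

```python
from fnmatch import fnmatchcase
from typing import Sequence

# Hypothetical path patterns; adjust to your own repository layout.
HIGH_RISK_GLOBS = ("*auth*", "*payment*", "*migrations/*")
LOW_RISK_GLOBS = ("*package-lock.json", "*.lock", "*generated/*")


def _matches_any(path: str, globs: Sequence[str]) -> bool:
    return any(fnmatchcase(path, pattern) for pattern in globs)


def review_tier(changed_paths: Sequence[str], tests_passed: bool) -> str:
    """Pick a review tier for a PR from the files it changes."""
    # Any high-risk file escalates the whole PR, whatever else it contains.
    if any(_matches_any(path, HIGH_RISK_GLOBS) for path in changed_paths):
        return "deep-human-review"
    # Only PRs made up entirely of low-risk files, with green tests, get the light path.
    if (
        tests_passed
        and changed_paths
        and all(_matches_any(path, LOW_RISK_GLOBS) for path in changed_paths)
    ):
        return "automated-checks-plus-light-approval"
    return "standard-review"


if __name__ == "__main__":
    print(review_tier(["package-lock.json"], tests_passed=True))
    print(review_tier(["src/auth/login.py"], tests_passed=True))
    print(review_tier(["src/utils/strings.py"], tests_passed=True))
```

The ordering is the design choice that matters: the high-risk check runs first, so a dependency bump bundled with a payments change still lands in front of a human.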
Scale Automated Quality Gates With Volume
Humans do not scale to a 98% throughput increase. Automated gates do. Static analysis, comprehensive test suites, policy-as-code checks, and security scanning all absorb doubled PR counts without doubling headcount, and they catch the routine defects that were drowning your reviewers. The mistake is treating automated gates as a nice-to-have you bolt on later. In an AI-volume world they are the primary defect filter, and human review becomes the targeted second layer. The teams in the Faros data that saw 50% fewer incidents are the ones that already had this layer built. This is also where strong agent scaffolding beats raw model upgrades: the system around the model, including the verification gates, determines real-world reliability far more than the model version does.
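As a rough illustration of gates as the primary filter, the sketch below runs every gate on every PR and blocks the merge if any of them fails. The two gates are toy stand-ins for real tools (a coverage threshold and a naive private-key scan), and the PR dictionary keys and the 80% threshold are hypothetical.

```python
import re
from typing import Callable, List, NamedTuple, Sequence, Tuple


class GateResult(NamedTuple):
    name: str
    passed: bool
    detail: str


Gate = Callable[[dict], GateResult]

MIN_COVERAGE_PCT = 80.0  # hypothetical threshold
PRIVATE_KEY_PATTERN = re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----")


def coverage_gate(pr: dict) -> GateResult:
    """Fail PRs whose reported test coverage is below the threshold."""
    pct = pr["coverage_pct"]
    return GateResult("coverage", pct >= MIN_COVERAGE_PCT, f"{pct:.1f}% covered")


def secrets_gate(pr: dict) -> GateResult:
    """Fail PRs whose diff contains something that looks like a private key."""
    leaked = PRIVATE_KEY_PATTERN.search(pr["diff_text"]) is not None
    return GateResult("secrets", not leaked, "private key found" if leaked else "clean")


DEFAULT_GATES = (coverage_gate, secrets_gate)


def run_gates(pr: dict, gates: Sequence[Gate]) -> Tuple[bool, List[GateResult]]:
    """Run every gate, even after a failure, so the author sees all problems at once."""
    results = [gate(pr) for gate in gates]
    return all(result.passed for result in results), results


if __name__ == "__main__":
    mergeable, report = run_gates(
        {"coverage_pct": 62.5, "diff_text": "timeout = 30"}, DEFAULT_GATES
    )
    print(mergeable, report)
```

Because each gate is just a function, adding a new check costs one more entry in the tuple, and the cost per PR stays flat whether volume doubles or not.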
Track Incidents Per User, Not Per PR
Per-PR metrics are now misleading because PR count is inflated by AI. A 243% rise in incidents per PR sounds catastrophic, and it is serious, but a PR-denominated rate moves whenever PR volume moves, whether or not customers see anything different, and it obscures the only thing that matters: customer impact. Track incidents and severity per active user or per unit of business value delivered. That denominator does not inflate with AI volume, so it tells you whether your customers are actually experiencing more failures, which is the question executives and customers care about. Pairing that with continuous production monitoring for quality drift closes the loop, because the incidents this data is warning about surface in production behavior, not in pre-merge metrics.
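A small worked example shows how sensitive the per-PR denominator is. All numbers below are hypothetical except the 98% PR increase: the incident count and user base are held constant on purpose, so the only thing that changes is how many PRs were merged.

```python
def incidents_per_pr(incidents: int, merged_prs: float) -> float:
    """Incident rate using merged PRs as the denominator."""
    return incidents / merged_prs


def incidents_per_1k_users(incidents: int, active_users: int) -> float:
    """Incident rate using active users as the denominator."""
    return 1000 * incidents / active_users


# Hypothetical quarter before and after AI adoption.
# Customers see the same 20 incidents in both periods.
BEFORE = {"incidents": 20, "merged_prs": 400, "active_users": 50_000}
AFTER = {"incidents": 20, "merged_prs": 400 * 1.98, "active_users": 50_000}

per_pr_before = incidents_per_pr(BEFORE["incidents"], BEFORE["merged_prs"])
per_pr_after = incidents_per_pr(AFTER["incidents"], AFTER["merged_prs"])
per_user_before = incidents_per_1k_users(BEFORE["incidents"], BEFORE["active_users"])
per_user_after = incidents_per_1k_users(AFTER["incidents"], AFTER["active_users"])

if __name__ == "__main__":
    print(f"incidents per PR:        {per_pr_before:.4f} -> {per_pr_after:.4f}")
    print(f"incidents per 1k users:  {per_user_before:.2f} -> {per_user_after:.2f}")
```

In this scenario the per-PR rate roughly halves even though customers experienced exactly the same number of failures, while the per-user rate stays put. The user-denominated metric only moves when customer experience moves, which is the property you want in a number that goes to executives.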
The Practitioner Verdict
The honest read of the 2025 DORA report is uncomfortable for most engineering organizations: adopting AI coding tools without redesigning your delivery system actively worsens production stability, and that is the majority case, not the edge case. A 54% bug increase, a 243% rise in incidents per PR, and nearly a third more PRs merging unreviewed are not the price of progress. They are the cost of dropping machine-paced output into human-paced gates and hoping it works out.
The good news is symmetrical. The same data shows that teams with rigor turned AI into a 50% reduction in incidents, because for them higher throughput was a benefit they could safely absorb. The deciding factor was never the model. It was whether the delivery system was built to handle the volume the model produces. AI is an amplifier. It will make your delivery process louder in whichever direction it already points.
This is the work Particula Tech runs as a delivery-system audit: we map where AI-driven volume is overrunning your review and quality gates, identify the specific gates that broke under the new throughput, and redesign them (right-sized review, automated gates scaled to volume, and incident metrics that survive PR inflation) so that the throughput gain becomes a shipped-value gain instead of an incident spike. For the broader strategy on tooling, CI, and the systems that decide whether AI development is net positive, our AI development tools pillar is the place to start.
The teams that win the next two years will not be the ones that adopted AI fastest. They will be the ones that rebuilt their delivery system to deserve the speed.
Frequently Asked Questions
The 2025 DORA report found AI raised raw output sharply but degraded downstream quality. Across thousands of developers, PRs per developer rose 98% while bugs per developer rose 54% (up from a 9% rise the prior year), incidents per PR rose 243%, and median PR review time rose 441%, with 31% more PRs merging with no human review at all. The headline is not that AI is bad for delivery. It is that AI accelerated code creation faster than every downstream quality gate could keep up. DORA also added a new Rework Rate metric and restructured the framework around team archetypes, replacing the old low, medium, high, and elite performer clusters that the classic four metrics produced.



