Superpowers and GStack both solve the same root problem—AI coding agents that skip planning and write buggy code—but take opposite approaches. Superpowers (106K stars, by Jesse Vincent) enforces a rigid seven-phase TDD-first pipeline running from brainstorming and planning through test-first implementation, review, and finishing. It uses psychological persuasion principles to prevent agents from rationalizing shortcuts. GStack (39K stars, by Garry Tan) organizes AI into role-based specialists—CEO reviewer, staff engineer, QA lead, security officer—with 28 slash commands and a persistent Chromium daemon for visual QA. Pick Superpowers if your biggest problem is code quality and test coverage on complex projects. Pick GStack if you need a complete sprint lifecycle from product thinking through deployment with visual testing. Both are MIT-licensed and work with Claude Code. Many teams use both together—Superpowers for implementation discipline, GStack for planning and QA.
Two weeks ago, a client's engineering lead asked us a question we've been hearing constantly: "Should we install Superpowers or GStack on our Claude Code setup?"
The honest answer is that the question itself is slightly wrong. These two skill packs solve different problems at different points in the development lifecycle. But the comparison matters because together they represent the two dominant philosophies for making AI coding agents actually reliable—and picking the wrong one for your workflow costs real time.
Superpowers has been growing steadily since October 2025 and now sits at 106,000 GitHub stars. GStack launched on March 12, 2026 and hit 39,000 stars in 11 days. Both are MIT-licensed. Both work with Claude Code. Both exist because the same problem keeps burning developers: AI agents skip planning, skip tests, and write plausible-looking code that breaks in production.
The approaches couldn't be more different. For a broader look at how AI coding tools compare, see our Cursor vs Claude Code comparison.
The Problem Both Solve
Without structured guidance, AI coding agents exhibit a consistent failure pattern: they jump straight to implementation without planning, skip or stub out tests, and produce plausible-looking code that breaks in production.
If you've used Claude Code, Cursor, or any AI coding agent on a project larger than a single file, you've hit at least one of these failures. Both Superpowers and GStack exist to fix them. They just disagree on how.
Superpowers: Enforced Discipline Through Process
Creator: Jesse Vincent | Stars: ~106K | License: MIT | Version: v5.0.5
Superpowers enforces a rigid 7-phase pipeline that prevents the agent from writing code until it has earned the right to; the full pipeline is laid out in the phase table below.
The enforcement mechanism is what makes Superpowers distinctive. Rather than politely suggesting the agent follow best practices, it uses what Jesse Vincent calls the "1% Rule": if there's even a 1% chance a skill applies, the agent must invoke it. Each skill includes "Red Flags" sections that list the exact rationalizations agents use to skip steps—"this is just a simple question," "I already know the answer"—with prewritten reality-check responses.
This design is informed by research on persuasion principles applied to LLMs (validated by Wharton's "Call Me a Jerk" paper). The framework doesn't expect the AI to understand why TDD matters. It structurally prevents the AI from skipping it.
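The invocation policy itself is simple to state: even a weak match between a task and a skill's triggers means the skill fires. A few lines of Python can sketch the idea (a hypothetical illustration, not Superpowers' actual code—the skill names and trigger terms here are invented):

```python
# Hypothetical sketch of the "1% Rule": if a skill *might* apply, invoke it.
# The threshold for invocation is deliberately near zero -- any overlap at
# all between the task and a skill's triggers fires the skill.
SKILLS = {
    "brainstorming": {"design", "feature", "build", "add"},
    "tdd": {"implement", "fix", "bug", "code"},
}

def skills_to_invoke(task: str) -> list[str]:
    """Return every skill whose trigger terms overlap the task at all."""
    words = set(task.lower().split())
    return [name for name, triggers in SKILLS.items() if words & triggers]

print(skills_to_invoke("fix the login bug"))  # -> ['tdd']
```

The point of the near-zero threshold is that the agent never gets to argue a skill "probably doesn't apply"—the Red Flags text handles that rationalization directly.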
What v5.0 Added (March 2026)
- Visual Brainstorming Companion — A local web server delivers HTML mockups and diagrams to your browser, replacing ASCII art in the terminal
- Subagent-Driven Development — Default since v5. Fresh subagents per task with two-stage review (spec compliance, then code quality)
- Intelligent Model Selection — Routes implementation tasks to cheaper models (often Haiku) while keeping planning on Opus
- Interface-Driven Design — Mandatory file structure planning before task decomposition
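The model-selection feature reduces to a routing table keyed by phase. A toy sketch, assuming an invented phase-to-model mapping (this is not Superpowers' actual routing logic):

```python
# Illustrative phase-based model routing: planning-heavy phases go to a
# stronger (costlier) model, mechanical implementation to a cheaper one.
# The mapping below is an assumption for illustration only.
ROUTES = {
    "brainstorming": "opus",
    "planning": "opus",
    "implementation": "haiku",
    "review": "opus",
}

def pick_model(phase: str) -> str:
    # Default to the stronger model when the phase is unknown.
    return ROUTES.get(phase, "opus")

print(pick_model("implementation"))  # haiku
```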
Superpowers in Practice
The chardet 7.0.0 Python library was built entirely using the Superpowers workflow. Result: 41x faster performance and 96.8% accuracy (up 2.3 percentage points), with dozens of longstanding issues fixed. One solo developer reportedly delivered a project scoped for "4 people x 6 months" in 2 months using the framework. But the overhead is real. The brainstorming and planning phases add 10–20 minutes before any code appears. Simon Willison, who endorsed the framework, also noted that using it left him "mentally exhausted after just a couple of hours"—comparing it to "riding your bike in a higher gear: faster but takes more effort."
| Phase | What Happens | Can Skip? |
|---|---|---|
| 1. Brainstorming | Agent asks clarifying questions, explores alternatives, produces a design doc for approval | No |
| 2. Git Worktrees | Creates an isolated branch and verifies baseline tests pass | No |
| 3. Writing Plans | Decomposes work into 2–5 minute tasks with exact file paths and verification steps | No |
| 4. Subagent Execution | Fresh subagents handle each task in isolation, then undergo two-stage review | Configurable |
| 5. TDD | Strict RED-GREEN-REFACTOR—code written before tests exist gets deleted | No |
| 6. Code Review | Reviews implementation against spec, categorizes issues by severity | No |
| 7. Finishing | Confirms all tests pass, offers merge/PR/discard options | No |
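Phase 5's RED-GREEN-REFACTOR loop is standard TDD; here is a minimal, tool-agnostic example of the ordering Superpowers enforces:

```python
# RED: write the failing test first. Under Superpowers' rules, the
# implementation below may not exist until this test does.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"

# GREEN: the minimal implementation that makes the test pass.
def slugify(text: str) -> str:
    cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
    return "-".join(cleaned.split())

# REFACTOR: with the test green, restructuring is safe -- re-run
# test_slugify() after every change.
test_slugify()
print("green")
```

The enforcement detail is the order, not the code: if the agent produces `slugify` before `test_slugify` exists, the implementation gets deleted and the cycle restarts.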
GStack: A Virtual Dev Team in Slash Commands
Creator: Garry Tan (Y Combinator CEO) | Stars: ~39K | License: MIT
Where Superpowers enforces a single process pipeline, GStack gives you a roster of specialized roles you can invoke on demand. Garry Tan claims to have shipped 600,000+ lines of production code in 60 days (35% tests) using it.
GStack provides 28 slash commands organized by role. The most important ones fall into four groups—Planning & Strategy, Development & Review, Testing & Security, and Deployment—covered in the tables below.
The Chromium Daemon: GStack's Secret Weapon
The most technically distinctive feature is GStack's three-tier persistent browser architecture:

1. CLI (compiled Bun binary, ~58MB) — Reads state, makes an HTTP POST to localhost
2. HTTP Server (Bun.serve) — Dispatches commands to Chromium via the Chrome DevTools Protocol
3. Chromium (headless via Playwright) — Persistent tabs, cookies, login sessions

Performance characteristics:

- Cold start: ~3–5 seconds
- Subsequent calls: ~100–200ms
- Auto-starts on first use, auto-shuts down after 30 minutes idle
- Localhost-only with Bearer token auth
- Sessions persist: cookies, tabs, and localStorage carry across commands

This means /qa and /browse take real screenshots and click real elements—they don't just analyze code and guess what the UI looks like. The system uses Playwright Locators on the accessibility tree instead of DOM mutation, so it works reliably even under CSP restrictions and framework hydration. The catch: cookie decryption currently only works with the macOS Keychain; Windows and Linux credential store support isn't implemented yet.
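The CLI-to-server handoff—tier 1 POSTing to a localhost-only endpoint guarded by a bearer token—can be sketched in Python (a simplified stand-in for GStack's Bun daemon; the real tier 2 would forward the command to Chromium over CDP, which is omitted here):

```python
import json
import secrets
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

TOKEN = secrets.token_hex(16)  # bearer token shared with the CLI tier

class DaemonHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Reject any request without the expected bearer token.
        if self.headers.get("Authorization") != f"Bearer {TOKEN}":
            self.send_response(401)
            self.end_headers()
            return
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # Real daemon: dispatch body["command"] to Chromium over CDP here.
        reply = json.dumps({"ok": True, "command": body["command"]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # keep the sketch quiet
        pass

# Bind to localhost only, mirroring the daemon's security model.
server = HTTPServer(("127.0.0.1", 0), DaemonHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

def cli_call(command: str) -> dict:
    """Tier 1: the CLI makes a fast HTTP POST to the already-warm daemon."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{server.server_port}/",
        data=json.dumps({"command": command}).encode(),
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

result = cli_call("screenshot")
print(result)  # {'ok': True, 'command': 'screenshot'}
server.shutdown()
```

Keeping the server process alive between calls is what turns a ~3–5 second browser cold start into a ~100–200ms dispatch: only the first invocation pays the startup cost.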
Planning & Strategy

| Command | Role | What It Does |
|---|---|---|
| /office-hours | YC Partner | Conducts 6 forcing questions to reframe product direction before coding |
| /plan-ceo-review | Founder/CEO | Rethinks the problem to find "the 10-star product"—four scope modes from expansion to reduction |
| /plan-eng-review | Eng Manager | Locks architecture, system boundaries, data flow, failure modes, test coverage |
| /plan-design-review | Senior Designer | Seven passes over design (IA, interaction states, user journey, AI slop, design system, responsive/a11y) |
| /autoplan | Pipeline | Runs CEO → design → eng review in a single command |
Development & Review

| Command | Role | What It Does |
|---|---|---|
| /review | Staff Engineer | Structural audit: N+1 queries, race conditions, stale reads, trust boundaries. Auto-fixes mechanical issues |
| /investigate | Debugger | Root cause analysis before fixes. Stops after 3 failed hypotheses to question architecture |
| /codex | Cross-Model | Independent code review from an alternative model |
Testing & Security

| Command | Role | What It Does |
|---|---|---|
| /qa | QA Lead | Four modes: diff-aware, full systematic, 30-second smoke, and regression testing |
| /cso | Security Officer | OWASP Top 10 + STRIDE threat modeling. Scans for injection, auth, crypto, access control |
| /benchmark | Perf Engineer | Performance baseline testing |
Deployment

| Command | Role | What It Does |
|---|---|---|
| /ship | Release Engineer | Syncs main, runs tests, audits coverage, pushes, opens PR—one command |
| /retro | Eng Manager | Weekly retrospective with per-person breakdowns and test health trends |
Head-to-Head Comparison
| Dimension | Superpowers | GStack |
|---|---|---|
| Philosophy | Process enforcement—one pipeline, no shortcuts | Role specialization—invoke the right expert |
| Commands | ~14 skills (auto-invoked) | 28 slash commands (user-invoked) |
| Invocation | Automatic—1% Rule triggers skills | Manual—you call the slash command you need |
| TDD | Mandatory. Code before tests = deleted | Available via /qa but not enforced |
| Planning | Mandatory brainstorming + planning phases | Optional /office-hours + /plan-ceo-review |
| Visual QA | v5.0 adds HTML mockups in browser | Full headless Chromium for live site testing |
| Security | Not a focus | /cso runs OWASP + STRIDE scans |
| Deployment | Manual—ends at merge/PR decision | /ship handles the full release pipeline |
| Multi-platform | Claude Code, Cursor, Codex, Gemini CLI, others | Claude Code, Cursor, Codex, Gemini CLI |
| Subagents | First-class—fresh agents per task with review | Not a core feature |
| GitHub Stars | ~106K (since Oct 2025) | ~39K (since Mar 12, 2026) |
| Overhead | High—10–20 min before first code | Low—invoke only the commands you need |
| Learning Curve | Moderate—understand the pipeline | Low—each command is self-contained |
| Best For | Complex projects needing bulletproof test coverage | Full sprint lifecycle with visual verification |
When to Use Superpowers
Choose Superpowers when:

- Code quality and test coverage are your biggest problem, especially on complex, long-lived projects
- You want TDD enforced rather than suggested—code written before tests exist gets deleted
- You want fresh subagents per task with two-stage review built into the implementation loop
Skip Superpowers when you're writing quick scripts, prototyping throwaway ideas, or working on projects where the 10–20 minute planning overhead exceeds the value of the code being written.
When to Use GStack
Choose GStack when:

- You need product-level thinking before code. The /office-hours and /plan-ceo-review commands force the "what are we actually building?" conversation before anyone touches code—valuable for founders and product engineers.
- You want security audits built in. The /cso command runs OWASP Top 10 + STRIDE threat modeling, and early users have reported it finding legitimate XSS vulnerabilities.
- You want lightweight, on-demand discipline. You can invoke /review and /ship without the full planning ceremony.

Skip GStack when you need strict TDD enforcement (GStack makes testing available but not mandatory) or when you're working on non-web projects where the Chromium daemon provides no value.
Using Both Together
The skill packs don't conflict, and the combination covers gaps that neither addresses alone. Here's a workflow we've been testing with clients:

1. /office-hours and /plan-ceo-review (GStack) to define what to build
2. /plan-eng-review (GStack) to lock system boundaries and data flow
3. The Superpowers pipeline for implementation—brainstorming, planning, TDD, and code review
4. /qa (GStack) with Chromium for real-browser testing
5. /cso (GStack) for OWASP + STRIDE scanning
6. /ship (GStack) for the push-to-PR pipeline

This gives you product-level thinking (GStack), implementation discipline (Superpowers), and visual + security verification (GStack). The overlap is minimal—Superpowers owns the implementation loop, GStack owns everything before and after it.
What the Critics Say
Neither tool is without criticism, and the criticisms matter because they reveal real limitations.
On Superpowers: the overhead is real—10–20 minutes of brainstorming and planning before any code appears—and even supporters like Simon Willison report that the workflow is mentally exhausting over long sessions.
On GStack: nothing is enforced—TDD and testing are available but optional—and the Chromium daemon's cookie decryption currently works only with the macOS Keychain.
Both criticisms have merit. The takeaway isn't that either tool is bad—it's that neither is magic. They're structured workflows that improve baseline agent behavior, not replacements for engineering judgment.
Installation
Superpowers (Claude Code Marketplace):
/plugin install superpowers@claude-plugins-official
GStack (Global Install):
git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack
cd ~/.claude/skills/gstack && ./setup
Both support per-project vendored installation. For cross-platform setup (Codex, Cursor, Gemini CLI), check each project's README for host-specific flags.
The Bottom Line
Superpowers and GStack represent two valid answers to the same question: how do you make AI coding agents reliable enough for production work?
Superpowers says: enforce a rigid process. Make TDD mandatory. Delete code written without tests. Use psychological principles to prevent the agent from rationalizing shortcuts. Accept higher overhead in exchange for higher quality.
GStack says: specialize roles. Give the agent a CEO hat for product thinking, a staff engineer hat for code review, a QA hat for testing, a security officer hat for audits. Let the developer invoke the right role at the right time.
If you're choosing one, choose based on your pain point. If your agents write code that works but isn't tested or thought through, Superpowers fixes that. If your agents write decent code but lack product thinking, visual QA, and deployment automation, GStack fixes that.
If you can install both—and you should try—you get the best of each. The AI coding agent space is moving fast enough that structured workflows like these aren't optional luxuries anymore. They're how you keep shipping quality code when the agent is writing most of it. For more on how to configure agent behavior, see our guide to AGENTS.md and AI coding agent configuration.
Frequently Asked Questions
What's the difference between Superpowers and GStack?

Superpowers is a process-enforcement framework that forces AI agents through a 7-phase TDD pipeline—brainstorming, planning, testing, implementation, and review—before any code ships. GStack is a role-based skill pack that gives Claude Code 28 specialized slash commands mimicking a full dev team (CEO review, engineering review, QA, security audit, deployment). Superpowers focuses on how code gets written. GStack focuses on what roles review it.



