gstack - Garry Tan's Framework That Turns Claude Code Into a Virtual Engineering Team
How the Y Combinator CEO's open-source toolkit structures AI coding around 23 specialist roles - from product review to security auditing - and why the role decomposition pattern matters more than the tool itself.
When the CEO of Y Combinator publishes his personal Claude Code setup and it hits 50,000 GitHub stars in under a month, you pay attention. Not because celebrity guarantees quality - but because Garry Tan is one of the rare VC founders who still ships production code daily, and the framework he built reflects a specific, opinionated view of how AI-assisted development should work.
gstack is not a new AI model. It’s not a SaaS product. It’s a collection of 23 specialist roles and 8 power tools, all running as slash commands inside Claude Code. The core idea: stop treating your AI assistant as a single generalist, and start treating it as a team of specialists - a CEO who challenges your product thinking, a staff engineer who reviews architecture, a QA lead who tests in a real browser, a release engineer who ships your PR.
I’ve been running gstack alongside my own Agent-Skills setup for the past few weeks. Here’s what it actually does, how it works under the hood, and where it fits in the growing ecosystem of Claude Code frameworks.
Who Is Garry Tan (And Why Does That Matter)
Quick context, because the author’s background explains the framework’s philosophy:
- President & CEO of Y Combinator since January 2023
- Stanford CS graduate, early engineer at Palantir (credited with the company’s original logo and design system)
- Co-founded Posterous (acquired by Twitter in 2012)
- Founded Initialized Capital, a venture firm backing Coinbase, Instacart, and others
- Claims to have shipped 600,000+ lines of production code in 60 days using gstack - part-time, while running YC
That last point matters. gstack isn’t a product team’s side project. Tan built it because he’s a founder-engineer who codes daily and got frustrated with the gap between “AI can write code” and “AI can help me think clearly about what to build.”
The Problem - Vibe Coding Without Guardrails
If you’ve used Claude Code for any serious project, you’ve hit this wall: the AI is fast, capable, and completely undirected. Without structure, you fall into what Tan calls “vibe coding” - letting the AI generate code without disciplined planning, review, or testing.
The symptoms are familiar:
- You start building before thinking through what you’re building
- Nobody reviews the architecture before implementation begins
- Testing happens after the fact (if at all)
- Security auditing is “I’ll do it later”
- Shipping is a manual, error-prone process
A real engineering team solves this with roles. The product manager challenges scope. The architect reviews the design. The QA engineer tests before shipping. The security engineer audits before deploy. Solo developers using AI assistants get none of this - unless they impose structure themselves.
That’s what gstack does. It imposes structure by giving Claude Code 23 distinct specialist personas, each with its own methodology, constraints, and output format.
How gstack Works
The Core Model - Structured Role Specialization
This is the most important thing to understand: gstack is not multi-agent orchestration. It’s a single Claude Code instance that switches between specialist roles on your command. You decide when to switch from “product review” to “engineering review” to “implementation” to “QA.” The AI doesn’t autonomously delegate between roles.
This is a deliberate choice. As Tan puts it: “Planning is not review. Review is not shipping… I want explicit gears.”
The workflow follows a sprint cycle: Think - Plan - Build - Review - Test - Ship - Reflect. Each phase maps to specific slash commands.
Architecture Under the Hood
gstack is built in TypeScript (80%) and Go (18%), running on Bun with a compiled ~58MB binary. Three technical decisions stand out:
1. SKILL.md Files
Each specialist role is defined in a SKILL.md file - Anthropic’s portable markdown standard for encoding agent behaviors. These files contain structured prompts with YAML frontmatter. They’re plain text, version-controllable, and portable across Claude Code, OpenAI Codex CLI, GitHub Copilot, Cursor, and other hosts.
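The format is simple enough that a few lines of string handling can pull it apart. Here's a minimal sketch of parsing a SKILL.md-style file; the frontmatter field names (`name`, `description`) are illustrative, not gstack's actual schema:

```typescript
// Minimal frontmatter parser for a SKILL.md-style file.
// Field names here are illustrative, not gstack's actual schema.
interface SkillMeta {
  [key: string]: string | undefined;
}

function parseSkill(source: string): { meta: SkillMeta; body: string } {
  // Frontmatter is delimited by a pair of `---` lines at the top of the file.
  const match = source.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) return { meta: {}, body: source };

  const meta: SkillMeta = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: match[2].trim() };
}

const example = `---
name: plan-eng-review
description: Architecture review gate
---
You are an engineering manager. Review the proposed design...`;

const { meta, body } = parseSkill(example);
// meta.name is "plan-eng-review"; body holds the prompt text
```

Because the whole role definition is just this one text file, diffing, reviewing, and sharing specialist roles works exactly like any other code review.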
2. Persistent Browser Daemon
Instead of cold-starting a browser for every QA or design review command, gstack runs a long-lived Chromium instance via Playwright. First command takes ~3 seconds to start up; subsequent commands respond in ~100-200ms. The daemon auto-shuts down after 30 minutes idle.
State is tracked in .gstack/browse.json (PID, port, bearer token). Random ports between 10,000-60,000 prevent conflicts across multiple workspaces.
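A sketch of what generating that state might look like. The field names mirror the description above; the exact schema of browse.json is gstack's internal detail:

```typescript
import { randomBytes, randomInt } from "node:crypto";

// Shape of the daemon state tracked in .gstack/browse.json
// (field names follow the article; the exact schema is gstack-internal).
interface BrowseState {
  pid: number;
  port: number;
  token: string; // bearer token for the localhost-only daemon API
}

function newBrowseState(pid: number): BrowseState {
  return {
    pid,
    // A random high port avoids collisions across concurrent workspaces.
    port: randomInt(10_000, 60_001),
    token: randomBytes(32).toString("hex"),
  };
}

const state = newBrowseState(process.pid);
// In practice this would be persisted with restrictive permissions, e.g.:
// fs.writeFileSync(".gstack/browse.json", JSON.stringify(state), { mode: 0o600 });
```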
3. Accessibility-First Reference System
When gstack’s browser daemon snapshots a page, it doesn’t use CSS selectors. It uses Playwright’s accessibility tree to generate sequential refs (@e1, @e2, @e3) resolved via getByRole() queries. This works through Shadow DOM, respects Content Security Policy, and is more robust than selector-based approaches.
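To make the ref idea concrete, here's a sketch of assigning sequential refs over an accessibility tree. It operates on plain data standing in for a live Playwright snapshot, and the set of "interactive" roles is an illustrative subset, not gstack's actual list:

```typescript
// Sketch: assign sequential refs (@e1, @e2, ...) to interactive nodes of an
// accessibility tree. Plain data stands in for a live Playwright snapshot;
// in gstack each ref resolves back to an element via a getByRole() query.
interface AXNode {
  role: string;
  name?: string;
  children?: AXNode[];
}

function assignRefs(root: AXNode): Map<string, AXNode> {
  const refs = new Map<string, AXNode>();
  let counter = 0;
  const walk = (node: AXNode) => {
    // Only reference roles a user can act on (illustrative subset).
    if (["button", "link", "textbox", "checkbox"].includes(node.role)) {
      refs.set(`@e${++counter}`, node);
    }
    node.children?.forEach(walk);
  };
  walk(root);
  return refs;
}

const page: AXNode = {
  role: "WebArea",
  children: [
    { role: "heading", name: "Sign in" },
    { role: "textbox", name: "Email" },
    { role: "button", name: "Continue" },
  ],
};

const refs = assignRefs(page);
// refs.get("@e1")?.name === "Email"; refs.get("@e2")?.name === "Continue"
```

The payoff of addressing by role and accessible name rather than by CSS selector: the refs survive markup refactors, work inside Shadow DOM, and describe the page the way a user (or screen reader) perceives it.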
Installation - 30 Seconds
Requirements: Claude Code, Git, Bun v1.0+
```bash
# Clone and setup
git clone --single-branch --depth 1 \
  https://github.com/garrytan/gstack.git \
  ~/.claude/skills/gstack
cd ~/.claude/skills/gstack && ./setup
```
That’s it. All slash commands are immediately available in your next Claude Code session.
For team use (shared repos with auto-updates):
```bash
# Enable team mode
cd ~/.claude/skills/gstack && ./setup --team

# Initialize in your project
cd <your-repo>
~/.claude/skills/gstack/bin/gstack-team-init required

# Commit the configuration
git add .claude/ CLAUDE.md && git commit -m "require gstack for AI-assisted work"
```
Uninstall:
```bash
~/.claude/skills/gstack/bin/gstack-uninstall
```
The setup script auto-detects your host (Claude Code, Codex, OpenCode, Cursor, Factory Droid, Slate, Kiro). You can also target a specific host with ./setup --host <name>.
The 23 Specialist Roles
Here’s every slash command gstack adds, organized by development phase.
Planning & Strategy
| Command | Role | What It Does |
|---|---|---|
| /office-hours | Product interrogator | Asks 6 forcing questions before you write a line of code. Adapts between startup mode and builder mode |
| /plan-ceo-review | Founder/CEO | Rethinks from the user’s perspective. Four scope modes: expand, selective, hold, reduce |
| /plan-eng-review | Engineering manager | Architecture lock-in with diagrams and test plans. The only required gate in the workflow |
| /plan-design-review | Senior designer | 7-pass evaluation, rates 0-10, suggests specific fixes |
| /plan-devex-review | DevEx specialist | Developer experience optimization - API ergonomics, error messages, onboarding friction |
| /autoplan | All planning roles | Runs CEO, Design, and Eng review sequentially in one command |
Design
| Command | Role | What It Does |
|---|---|---|
| /design-consultation | Design director | Creates a complete design system from scratch: competitive research, tokens, component inventory, writes DESIGN.md |
| /design-shotgun | Visual designer | Generates 3-6 mockup variants using GPT Image, produces a comparison board |
| /design-review | Design auditor | 80-item visual audit with automatic CSS fixes and before/after screenshots |
| /design-html | Frontend engineer | Converts mockups to production HTML with framework detection |
Code Quality
| Command | Role | What It Does |
|---|---|---|
| /review | Staff engineer | Finds production bugs that pass CI. Auto-fixes obvious issues, flags non-obvious ones |
| /investigate | Debugger | Root-cause debugging with a hard rule: no fixes without investigation first. Stops after 3 failed attempts |
| /cso | Chief Security Officer | OWASP Top 10 scan plus STRIDE threat modeling |
Testing
| Command | Role | What It Does |
|---|---|---|
| /qa | QA lead | Real browser testing via the Playwright daemon, bug fixes, regression test generation |
| /qa-only | QA reporter | Same methodology as /qa, but report-only - no code changes |
| /benchmark | Performance engineer | Core Web Vitals, page load timing, resource sizes, before/after comparison |
Deployment
| Command | Role | What It Does |
|---|---|---|
| /ship | Release engineer | Syncs branch, runs tests, audits coverage, pushes, opens PR |
| /land-and-deploy | Deploy engineer | Merges PR, waits for CI, verifies production health |
| /canary | Monitoring | Post-deploy watch for console errors and regressions |
| /document-release | Doc engineer | Auto-updates all project documentation to match shipped changes |
Utilities
| Command | What It Does |
|---|---|
| /browse | Real Chromium browser with ~100ms response latency |
| /setup-browser-cookies | Import cookies from Chrome, Arc, Brave, or Edge via macOS Keychain |
| /codex | OpenAI Codex CLI second opinion (review, adversarial, or consultation mode) |
| /careful | Safety guardrails for destructive commands |
| /freeze / /unfreeze | Restrict file edits to specific directories |
| /learn | Persist learned patterns across sessions |
| /retro | Weekly engineering retrospective |
A Typical gstack Workflow
Here’s how a real feature development session looks:
1. /office-hours → "What are we building and why?"
2. /plan-ceo-review → "Does this scope make sense from a user perspective?"
3. /plan-eng-review → "Is the architecture sound?" (required gate)
4. [implement] → Standard Claude Code coding
5. /review → Staff engineer catches production bugs
6. /cso → Security audit
7. /qa → Real browser testing
8. /ship → PR opened, tests passing
9. /land-and-deploy → Merged and deployed
10. /canary → Post-deploy monitoring
You don’t have to run every step every time. But the explicit phases prevent the “I’ll just vibe code this real quick” trap that leads to shipping untested, unreviewed code.
Security Model
gstack’s browser daemon runs with sensible security defaults:
- Localhost-only binding - no network access from outside the machine
- Bearer token auth per session, stored in mode 0o600 files
- Cookie import from Chrome/Arc/Brave/Edge uses macOS Keychain (read-only, in-process decryption, never persisted in plaintext)
- Bun.spawn() with explicit argument arrays prevents shell injection
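The argument-array point generalizes beyond Bun. Here's the same safe pattern sketched with Node's child_process (gstack itself uses Bun.spawn): because the arguments never pass through a shell, metacharacters in untrusted input stay literal text.

```typescript
import { spawnSync } from "node:child_process";

// Argument arrays bypass the shell entirely, so shell metacharacters in
// untrusted input are passed as literal text, never interpreted as commands.
const untrusted = "hello; rm -rf ~";

// Safe pattern: command plus explicit argument array, no `shell: true`.
const result = spawnSync("echo", [untrusted], { encoding: "utf8" });
// result.stdout contains the literal string -- "; rm -rf ~" was never executed.
```

The dangerous alternative would be interpolating `untrusted` into a single shell string, where the `;` would terminate the echo and run whatever follows.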
Three circular log buffers (50,000 entries each) capture console messages, network requests, and dialogs. Async flush every second to .gstack/*.log.
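A circular (ring) buffer like those can be sketched in a few lines; capacity is lowered from 50,000 for the example, and the async flush wiring is omitted:

```typescript
// Sketch of a fixed-capacity circular log buffer like the ones described
// above (capacity shrunk for the example; async flush-to-disk omitted).
class CircularLog<T> {
  private buf: T[];
  private head = 0;  // index of the oldest entry
  private count = 0; // number of entries currently stored

  constructor(private capacity: number) {
    this.buf = new Array<T>(capacity);
  }

  push(entry: T): void {
    this.buf[(this.head + this.count) % this.capacity] = entry;
    if (this.count < this.capacity) this.count++;
    // Once full, advance head: the oldest entry is silently overwritten.
    else this.head = (this.head + 1) % this.capacity;
  }

  toArray(): T[] {
    return Array.from(
      { length: this.count },
      (_, i) => this.buf[(this.head + i) % this.capacity],
    );
  }
}

const log = new CircularLog<string>(3);
["a", "b", "c", "d"].forEach((m) => log.push(m));
// log.toArray() → ["b", "c", "d"]  (oldest entry "a" was overwritten)
```

The design choice matters for a long-lived daemon: memory stays bounded no matter how chatty the page is, at the cost of dropping the oldest entries.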
How gstack Compares to Other Frameworks
gstack exists alongside two other major Claude Code enhancement frameworks. They solve different problems:
| Dimension | gstack (~50K stars) | Superpowers (~94K stars) | GSD (~35K stars) |
|---|---|---|---|
| Constrains | Decision-making perspective | Development process | Execution environment |
| Philosophy | "What hat to wear" | "What steps to follow" | "Fresh context per task" |
| Strength | Forces clarity before coding | Cuts regression bugs via TDD | Quality on 50+ file projects |
| Weakness | No explicit Build phase skill | Slower builds (test-first overhead) | More complex setup |
| Best for | Founder-engineers wearing multiple hats | Solo devs needing process discipline | Complex projects exceeding context windows |
The key insight: these frameworks barely overlap. gstack governs perspective (which role are you in?), Superpowers governs process (what steps do you follow?), and GSD governs environment (how do you manage context?). You can run them together.
My own setup combines gstack’s planning phases with Agent-Skills for the build/test/review cycle. The two complement each other well - gstack asks the hard product questions before coding begins, Agent-Skills enforces engineering discipline during implementation.
What I Like
The role decomposition is the real insight. The idea that an AI coding assistant should switch between distinct specialist perspectives - not just be a generic “helpful coder” - is the pattern worth adopting regardless of whether you use gstack specifically. It forces you to think about what phase you’re in before you start typing.
/office-hours is genuinely useful. Having an AI push back on your product thinking before you write code saves more time than any code review tool. The six forcing questions surface assumptions you didn’t know you were making.
The browser daemon is well-engineered. Persistent Chromium with accessibility-tree refs is a better architecture than cold-starting browsers per command. The ~100ms latency makes iterative QA sessions feel responsive.
Portability matters. Because everything is built on SKILL.md files, the roles work across Claude Code, Codex, Cursor, and other hosts. You’re not locked into one tool.
What I’d Watch Out For
It’s structured role-play, not actual multi-agent orchestration. If you expect agents autonomously delegating work to each other, that’s not what this is. You’re the orchestrator. Each slash command activates a single specialist persona in one Claude Code session.
The 600K LOC claim needs context. Tan’s productivity numbers come from running gstack alongside Conductor - a separate Mac app that runs multiple Claude Code instances in isolated Git worktrees. gstack alone doesn’t give you parallelism.
Long agent loops can happen. One developer reported a 70-minute loop where /qa kept injecting staging URLs into production files. As with any agentic workflow, you need to stay in the loop and interrupt when things go sideways.
Some commands overlap with existing setups. If you already use Agent-Skills or Superpowers, you’ll find /review and /ship do similar things to skills you already have. Pick one or be deliberate about which framework handles which phase.
The Meta-Lesson
The most interesting thing about gstack isn’t the code. It’s the thesis: the bottleneck in AI-assisted development isn’t intelligence, it’s structure. Claude Code is already smart enough to write good code, find bugs, and suggest improvements. What it lacks - what all AI coding assistants lack - is a framework for deciding when to think about what.
gstack’s answer is role decomposition. Before you build, think like a CEO. Before you ship, think like a QA lead. Before you deploy, think like a security officer. The AI doesn’t need to be smarter. It needs to wear the right hat at the right time.
Whether you adopt gstack, build your own role system, or just internalize the principle - the pattern is worth learning. The developers shipping the best AI-assisted code in 2026 aren’t the ones with the most powerful models. They’re the ones with the most disciplined workflows.
Resources
- GitHub: garrytan/gstack (MIT license, ~50K stars)
- Official site: gstacks.org
- Architecture deep-dive: ARCHITECTURE.md
- TechCrunch coverage: Why Garry Tan’s Claude Code setup has gotten so much love, and hate
- Agents Codex analysis: Garry Tan’s gstack and the rise of AI agent teams
- Comparison with other frameworks: Superpowers, GSD, and gstack - what each framework constrains