gstack - Garry Tan's Framework That Turns Claude Code Into a Virtual Engineering Team
How the Y Combinator CEO's open-source toolkit structures AI coding around 23 specialist roles - from product review to security auditing - and why the role decomposition pattern matters more than the tool itself.
When the CEO of Y Combinator publishes his personal Claude Code setup and it hits 50,000 GitHub stars in under a month, you pay attention. Not because celebrity guarantees quality - but because Garry Tan is one of the rare VC founders who still ships production code daily, and the framework he built reflects a specific, opinionated view of how AI-assisted development should work.
gstack is not a new AI model. It’s not a SaaS product. It’s a collection of 23 specialist roles and 8 power tools, all running as slash commands inside Claude Code. The core idea: stop treating your AI assistant as a single generalist, and start treating it as a team of specialists - a CEO who challenges your product thinking, a staff engineer who reviews architecture, a QA lead who tests in a real browser, a release engineer who ships your PR.
I’ve been running gstack alongside my own Agent-Skills setup for the past few weeks. Here’s what it actually does, how it works under the hood, and where it fits in the growing ecosystem of Claude Code frameworks.
Who Is Garry Tan (And Why Does That Matter)
Quick context, because the author’s background explains the framework’s philosophy:
- President & CEO of Y Combinator since January 2023
- Stanford CS graduate, early engineer at Palantir (credited with the company’s original logo and design system)
- Co-founded Posterous (acquired by Twitter in 2012)
- Founded Initialized Capital, a venture firm backing Coinbase, Instacart, and others
- Claims to have shipped 600,000+ lines of production code in 60 days using gstack - part-time, while running YC
That last point matters. gstack isn’t a product team’s side project. Tan built it because he’s a founder-engineer who codes daily and got frustrated with the gap between “AI can write code” and “AI can help me think clearly about what to build.”
The Problem - Vibe Coding Without Guardrails
If you’ve used Claude Code for any serious project, you’ve hit this wall: the AI is fast, capable, and completely undirected. Without structure, you fall into what Tan calls “vibe coding” - letting the AI generate code without disciplined planning, review, or testing.
The symptoms are familiar:
- You start building before thinking through what you’re building
- Nobody reviews the architecture before implementation begins
- Testing happens after the fact (if at all)
- Security auditing is “I’ll do it later”
- Shipping is a manual, error-prone process
A real engineering team solves this with roles. The product manager challenges scope. The architect reviews the design. The QA engineer tests before shipping. The security engineer audits before deploy. Solo developers using AI assistants get none of this - unless they impose structure themselves.
That’s what gstack does. It imposes structure by giving Claude Code 23 distinct specialist personas, each with its own methodology, constraints, and output format.
How gstack Works
The Core Model - Structured Role Specialization
This is the most important thing to understand: gstack is not multi-agent orchestration. It’s a single Claude Code instance that switches between specialist roles on your command. You decide when to switch from “product review” to “engineering review” to “implementation” to “QA.” The AI doesn’t autonomously delegate between roles.
This is a deliberate choice. As Tan puts it: “Planning is not review. Review is not shipping… I want explicit gears.”
The workflow follows a sprint cycle: Think - Plan - Build - Review - Test - Ship - Reflect. Each phase maps to specific slash commands.
Architecture Under the Hood
gstack is built in TypeScript (80%) and Go (18%), running on Bun with a compiled ~58MB binary. Three technical decisions stand out:
1. SKILL.md Files
Each specialist role is defined in a SKILL.md file - Anthropic’s portable markdown standard for encoding agent behaviors. These files contain structured prompts with YAML frontmatter. They’re plain text, version-controllable, and portable across Claude Code, OpenAI Codex CLI, GitHub Copilot, Cursor, and other hosts.
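The format is simple enough that a few lines of string handling can pull it apart. Here's a minimal sketch of parsing a SKILL.md-style file; the frontmatter field names (`name`, `description`) are illustrative, not gstack's actual schema:

```typescript
// Minimal frontmatter parser for a SKILL.md-style file.
// Field names here are illustrative, not gstack's actual schema.
interface SkillMeta {
  [key: string]: string | undefined;
}

function parseSkill(source: string): { meta: SkillMeta; body: string } {
  // Frontmatter is delimited by a pair of `---` lines at the top of the file.
  const match = source.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) return { meta: {}, body: source };

  const meta: SkillMeta = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    meta[line.slice(0, idx).trim()] = line.slice(idx + 1).trim();
  }
  return { meta, body: match[2].trim() };
}

const example = `---
name: plan-eng-review
description: Architecture review gate
---
You are an engineering manager. Review the proposed design...`;

const { meta, body } = parseSkill(example);
// meta.name is "plan-eng-review"; body holds the prompt text
```

Because the whole role definition is just this one text file, diffing, reviewing, and sharing specialist roles works exactly like any other code review.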
2. Persistent Browser Daemon
Instead of cold-starting a browser for every QA or design review command, gstack runs a long-lived Chromium instance via Playwright. First command takes ~3 seconds to start up; subsequent commands respond in ~100-200ms. The daemon auto-shuts down after 30 minutes idle.
State is tracked in .gstack/browse.json (PID, port, bearer token). Random ports between 10,000-60,000 prevent conflicts across multiple workspaces.
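A sketch of what generating that state might look like. The field names mirror the description above; the exact schema of browse.json is gstack's internal detail:

```typescript
import { randomBytes, randomInt } from "node:crypto";

// Shape of the daemon state tracked in .gstack/browse.json
// (field names follow the article; the exact schema is gstack-internal).
interface BrowseState {
  pid: number;
  port: number;
  token: string; // bearer token for the localhost-only daemon API
}

function newBrowseState(pid: number): BrowseState {
  return {
    pid,
    // A random high port avoids collisions across concurrent workspaces.
    port: randomInt(10_000, 60_001),
    token: randomBytes(32).toString("hex"),
  };
}

const state = newBrowseState(process.pid);
// In practice this would be persisted with restrictive permissions, e.g.:
// fs.writeFileSync(".gstack/browse.json", JSON.stringify(state), { mode: 0o600 });
```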
3. Accessibility-First Reference System
When gstack’s browser daemon snapshots a page, it doesn’t use CSS selectors. It uses Playwright’s accessibility tree to generate sequential refs (@e1, @e2, @e3) resolved via getByRole() queries. This works through Shadow DOM, respects Content Security Policy, and is more robust than selector-based approaches.
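To make the ref idea concrete, here's a sketch of assigning sequential refs over an accessibility tree. It operates on plain data standing in for a live Playwright snapshot, and the set of "interactive" roles is an illustrative subset, not gstack's actual list:

```typescript
// Sketch: assign sequential refs (@e1, @e2, ...) to interactive nodes of an
// accessibility tree. Plain data stands in for a live Playwright snapshot;
// in gstack each ref resolves back to an element via a getByRole() query.
interface AXNode {
  role: string;
  name?: string;
  children?: AXNode[];
}

function assignRefs(root: AXNode): Map<string, AXNode> {
  const refs = new Map<string, AXNode>();
  let counter = 0;
  const walk = (node: AXNode) => {
    // Only reference roles a user can act on (illustrative subset).
    if (["button", "link", "textbox", "checkbox"].includes(node.role)) {
      refs.set(`@e${++counter}`, node);
    }
    node.children?.forEach(walk);
  };
  walk(root);
  return refs;
}

const page: AXNode = {
  role: "WebArea",
  children: [
    { role: "heading", name: "Sign in" },
    { role: "textbox", name: "Email" },
    { role: "button", name: "Continue" },
  ],
};

const refs = assignRefs(page);
// refs.get("@e1")?.name === "Email"; refs.get("@e2")?.name === "Continue"
```

The payoff of addressing by role and accessible name rather than by CSS selector: the refs survive markup refactors, work inside Shadow DOM, and describe the page the way a user (or screen reader) perceives it.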
Installation - 30 Seconds
Requirements: Claude Code, Git, Bun v1.0+
```bash
# Clone and setup
git clone --single-branch --depth 1 \
  https://github.com/garrytan/gstack.git \
  ~/.claude/skills/gstack
cd ~/.claude/skills/gstack && ./setup
```
That’s it. All slash commands are immediately available in your next Claude Code session.
For team use (shared repos with auto-updates):
```bash
# Enable team mode
cd ~/.claude/skills/gstack && ./setup --team

# Initialize in your project
cd <your-repo>
~/.claude/skills/gstack/bin/gstack-team-init required

# Commit the configuration
git add .claude/ CLAUDE.md && git commit -m "require gstack for AI-assisted work"
```
Uninstall:
```bash
~/.claude/skills/gstack/bin/gstack-uninstall
```
The setup script auto-detects your host (Claude Code, Codex, OpenCode, Cursor, Factory Droid, Slate, Kiro). You can also target a specific host with ./setup --host <name>.
The 23 Specialist Roles
Here’s every slash command gstack adds, organized by development phase.
Planning & Strategy
| Command | Role | What It Does |
|---|---|---|
| /office-hours | Product interrogator | Asks 6 forcing questions before you write a line of code. Adapts between startup mode and builder mode |
| /plan-ceo-review | Founder/CEO | Rethinks from the user’s perspective. Four scope modes: expand, selective, hold, reduce |
| /plan-eng-review | Engineering manager | Architecture lock-in with diagrams and test plans. The only required gate in the workflow |
| /plan-design-review | Senior designer | 7-pass evaluation, rates 0-10, suggests specific fixes |
| /plan-devex-review | DevEx specialist | Developer experience optimization - API ergonomics, error messages, onboarding friction |
| /autoplan | All planning roles | Runs CEO, Design, and Eng review sequentially in one command |
Design
| Command | Role | What It Does |
|---|---|---|
| /design-consultation | Design director | Creates a complete design system from scratch: competitive research, tokens, component inventory, writes DESIGN.md |
| /design-shotgun | Visual designer | Generates 3-6 mockup variants using GPT Image, produces a comparison board |
| /design-review | Design auditor | 80-item visual audit with automatic CSS fixes and before/after screenshots |
| /design-html | Frontend engineer | Converts mockups to production HTML with framework detection |
Code Quality
| Command | Role | What It Does |
|---|---|---|
| /review | Staff engineer | Finds production bugs that pass CI. Auto-fixes obvious issues, flags non-obvious ones |
| /investigate | Debugger | Root-cause debugging with a hard rule: no fixes without investigation first. Stops after 3 failed attempts |
| /cso | Chief Security Officer | OWASP Top 10 scan plus STRIDE threat modeling |
Testing
| Command | Role | What It Does |
|---|---|---|
| /qa | QA lead | Real browser testing via the Playwright daemon, bug fixes, regression test generation |
| /qa-only | QA reporter | Same methodology as /qa, but report-only - no code changes |
| /benchmark | Performance engineer | Core Web Vitals, page load timing, resource sizes, before/after comparison |
Deployment
| Command | Role | What It Does |
|---|---|---|
| /ship | Release engineer | Syncs branch, runs tests, audits coverage, pushes, opens PR |
| /land-and-deploy | Deploy engineer | Merges PR, waits for CI, verifies production health |
| /canary | Monitoring | Post-deploy watch for console errors and regressions |
| /document-release | Doc engineer | Auto-updates all project documentation to match shipped changes |
Utilities
| Command | What It Does |
|---|---|
| /browse | Real Chromium browser with ~100ms response latency |
| /setup-browser-cookies | Import cookies from Chrome, Arc, Brave, or Edge via macOS Keychain |
| /codex | OpenAI Codex CLI second opinion (review, adversarial, or consultation mode) |
| /careful | Safety guardrails for destructive commands |
| /freeze / /unfreeze | Restrict file edits to specific directories |
| /learn | Persist learned patterns across sessions |
| /retro | Weekly engineering retrospective |
A Typical gstack Workflow
Here’s how a real feature development session looks:
1. /office-hours → "What are we building and why?"
2. /plan-ceo-review → "Does this scope make sense from a user perspective?"
3. /plan-eng-review → "Is the architecture sound?" (required gate)
4. [implement] → Standard Claude Code coding
5. /review → Staff engineer catches production bugs
6. /cso → Security audit
7. /qa → Real browser testing
8. /ship → PR opened, tests passing
9. /land-and-deploy → Merged and deployed
10. /canary → Post-deploy monitoring
You don’t have to run every step every time. But the explicit phases prevent the “I’ll just vibe code this real quick” trap that leads to shipping untested, unreviewed code.
Security Model
gstack’s browser daemon runs with sensible security defaults:
- Localhost-only binding - no network access from outside the machine
- Bearer token auth per session, stored in mode 0o600 files
- Cookie import from Chrome/Arc/Brave/Edge uses macOS Keychain (read-only, in-process decryption, never persisted in plaintext)
- Bun.spawn() with explicit argument arrays prevents shell injection
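The argument-array point generalizes beyond Bun. Here's the same safe pattern sketched with Node's child_process (gstack itself uses Bun.spawn): because the arguments never pass through a shell, metacharacters in untrusted input stay literal text.

```typescript
import { spawnSync } from "node:child_process";

// Argument arrays bypass the shell entirely, so shell metacharacters in
// untrusted input are passed as literal text, never interpreted as commands.
const untrusted = "hello; rm -rf ~";

// Safe pattern: command plus explicit argument array, no `shell: true`.
const result = spawnSync("echo", [untrusted], { encoding: "utf8" });
// result.stdout contains the literal string -- "; rm -rf ~" was never executed.
```

The dangerous alternative would be interpolating `untrusted` into a single shell string, where the `;` would terminate the echo and run whatever follows.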
Three circular log buffers (50,000 entries each) capture console messages, network requests, and dialogs. Async flush every second to .gstack/*.log.
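A circular (ring) buffer like those can be sketched in a few lines; capacity is lowered from 50,000 for the example, and the async flush wiring is omitted:

```typescript
// Sketch of a fixed-capacity circular log buffer like the ones described
// above (capacity shrunk for the example; async flush-to-disk omitted).
class CircularLog<T> {
  private buf: T[];
  private head = 0;  // index of the oldest entry
  private count = 0; // number of entries currently stored

  constructor(private capacity: number) {
    this.buf = new Array<T>(capacity);
  }

  push(entry: T): void {
    this.buf[(this.head + this.count) % this.capacity] = entry;
    if (this.count < this.capacity) this.count++;
    // Once full, advance head: the oldest entry is silently overwritten.
    else this.head = (this.head + 1) % this.capacity;
  }

  toArray(): T[] {
    return Array.from(
      { length: this.count },
      (_, i) => this.buf[(this.head + i) % this.capacity],
    );
  }
}

const log = new CircularLog<string>(3);
["a", "b", "c", "d"].forEach((m) => log.push(m));
// log.toArray() → ["b", "c", "d"]  (oldest entry "a" was overwritten)
```

The design choice matters for a long-lived daemon: memory stays bounded no matter how chatty the page is, at the cost of dropping the oldest entries.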
How gstack Compares to Other Frameworks
gstack exists alongside two other major Claude Code enhancement frameworks. They solve different problems:
| Dimension | gstack (~50K stars) | Superpowers (~94K stars) | GSD (~35K stars) |
|---|---|---|---|
| Constrains | Decision-making perspective | Development process | Execution environment |
| Philosophy | "What hat to wear" | "What steps to follow" | "Fresh context per task" |
| Strength | Forces clarity before coding | Cuts regression bugs via TDD | Quality on 50+ file projects |
| Weakness | No explicit Build phase skill | Slower builds (test-first overhead) | More complex setup |
| Best for | Founder-engineers wearing multiple hats | Solo devs needing process discipline | Complex projects exceeding context windows |
The key insight: these frameworks barely overlap. gstack governs perspective (which role are you in?), Superpowers governs process (what steps do you follow?), and GSD governs environment (how do you manage context?). You can run them together.
My own setup combines gstack’s planning phases with Agent-Skills for the build/test/review cycle. The two complement each other well - gstack asks the hard product questions before coding begins, Agent-Skills enforces engineering discipline during implementation.
What I Like
The role decomposition is the real insight. The idea that an AI coding assistant should switch between distinct specialist perspectives - not just be a generic “helpful coder” - is the pattern worth adopting regardless of whether you use gstack specifically. It forces you to think about what phase you’re in before you start typing.
/office-hours is genuinely useful. Having an AI push back on your product thinking before you write code saves more time than any code review tool. The six forcing questions surface assumptions you didn’t know you were making.
The browser daemon is well-engineered. Persistent Chromium with accessibility-tree refs is a better architecture than cold-starting browsers per command. The ~100ms latency makes iterative QA sessions feel responsive.
Portability matters. Because everything is built on SKILL.md files, the roles work across Claude Code, Codex, Cursor, and other hosts. You’re not locked into one tool.
What I’d Watch Out For
It’s structured role-play, not actual multi-agent orchestration. If you expect agents autonomously delegating work to each other, that’s not what this is. You’re the orchestrator. Each slash command activates a single specialist persona in one Claude Code session.
The 600K LOC claim needs context. Tan’s productivity numbers come from running gstack alongside Conductor - a separate Mac app that runs multiple Claude Code instances in isolated Git worktrees. gstack alone doesn’t give you parallelism.
Long agent loops can happen. One developer reported a 70-minute loop where /qa kept injecting staging URLs into production files. As with any agentic workflow, you need to stay in the loop and interrupt when things go sideways.
Some commands overlap with existing setups. If you already use Agent-Skills or Superpowers, you’ll find /review and /ship do similar things to skills you already have. Pick one or be deliberate about which framework handles which phase.
The Meta-Lesson
The most interesting thing about gstack isn’t the code. It’s the thesis: the bottleneck in AI-assisted development isn’t intelligence, it’s structure. Claude Code is already smart enough to write good code, find bugs, and suggest improvements. What it lacks - what all AI coding assistants lack - is a framework for deciding when to think about what.
gstack’s answer is role decomposition. Before you build, think like a CEO. Before you ship, think like a QA lead. Before you deploy, think like a security officer. The AI doesn’t need to be smarter. It needs to wear the right hat at the right time.
Whether you adopt gstack, build your own role system, or just internalize the principle - the pattern is worth learning. The developers shipping the best AI-assisted code in 2026 aren’t the ones with the most powerful models. They’re the ones with the most disciplined workflows.
Resources
- GitHub: garrytan/gstack (MIT license, ~50K stars)
- Official site: gstacks.org
- Architecture deep-dive: ARCHITECTURE.md
- TechCrunch coverage: Why Garry Tan’s Claude Code setup has gotten so much love, and hate
- Agents Codex analysis: Garry Tan’s gstack and the rise of AI agent teams
- Comparison with other frameworks: Superpowers, GSD, and gstack - what each framework constrains