GenAI · 12 min read

gstack - Garry Tan's Framework That Turns Claude Code Into a Virtual Engineering Team

How the Y Combinator CEO's open-source toolkit structures AI coding around 23 specialist roles - from product review to security auditing - and why the role decomposition pattern matters more than the tool itself.


When the CEO of Y Combinator publishes his personal Claude Code setup and it hits 50,000 GitHub stars in under a month, you pay attention. Not because celebrity guarantees quality - but because Garry Tan is one of the rare VC founders who still ships production code daily, and the framework he built reflects a specific, opinionated view of how AI-assisted development should work.

gstack is not a new AI model. It’s not a SaaS product. It’s a collection of 23 specialist roles and 8 power tools, all running as slash commands inside Claude Code. The core idea: stop treating your AI assistant as a single generalist, and start treating it as a team of specialists - a CEO who challenges your product thinking, a staff engineer who reviews architecture, a QA lead who tests in a real browser, a release engineer who ships your PR.

I’ve been running gstack alongside my own Agent-Skills setup for the past few weeks. Here’s what it actually does, how it works under the hood, and where it fits in the growing ecosystem of Claude Code frameworks.


Who Is Garry Tan (And Why Does That Matter)

Quick context, because the author’s background explains the framework’s philosophy:

  • President & CEO of Y Combinator since January 2023
  • Stanford CS graduate, early engineer at Palantir (credited with the company’s original logo and design system)
  • Co-founded Posterous (acquired by Twitter in 2012)
  • Founded Initialized Capital, a venture firm backing Coinbase, Instacart, and others
  • Claims to have shipped 600,000+ lines of production code in 60 days using gstack - part-time, while running YC

That last point matters. gstack isn’t a side project handed off to a product team. Tan built it because he’s a founder-engineer who codes daily and got frustrated with the gap between “AI can write code” and “AI can help me think clearly about what to build.”


The Problem - Vibe Coding Without Guardrails

If you’ve used Claude Code for any serious project, you’ve hit this wall: the AI is fast, capable, and completely undirected. Without structure, you fall into what Tan calls “vibe coding” - letting the AI generate code without disciplined planning, review, or testing.

The symptoms are familiar:

  • You start building before thinking through what you’re building
  • Nobody reviews the architecture before implementation begins
  • Testing happens after the fact (if at all)
  • Security auditing is “I’ll do it later”
  • Shipping is a manual, error-prone process

A real engineering team solves this with roles. The product manager challenges scope. The architect reviews the design. The QA engineer tests before shipping. The security engineer audits before deploy. Solo developers using AI assistants get none of this - unless they impose structure themselves.

That’s what gstack does. It imposes structure by giving Claude Code 23 distinct specialist personas, each with its own methodology, constraints, and output format.


How gstack Works

The Core Model - Structured Role Specialization

This is the most important thing to understand: gstack is not multi-agent orchestration. It’s a single Claude Code instance that switches between specialist roles on your command. You decide when to switch from “product review” to “engineering review” to “implementation” to “QA.” The AI doesn’t autonomously delegate between roles.

This is a deliberate choice. As Tan puts it: “Planning is not review. Review is not shipping… I want explicit gears.”

The workflow follows a sprint cycle: Think - Plan - Build - Review - Test - Ship - Reflect. Each phase maps to specific slash commands.

Architecture Under the Hood

gstack is built in TypeScript (80%) and Go (18%), running on Bun with a compiled ~58MB binary. Three technical decisions stand out:

1. SKILL.md Files

Each specialist role is defined in a SKILL.md file - Anthropic’s portable markdown standard for encoding agent behaviors. These files contain structured prompts with YAML frontmatter. They’re plain text, version-controllable, and portable across Claude Code, OpenAI Codex CLI, GitHub Copilot, Cursor, and other hosts.
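To make the format concrete, here is a minimal sketch of splitting such a file into its YAML frontmatter and markdown prompt body. The field names and role text below are illustrative, not gstack's actual schema:

```typescript
// Split a SKILL.md-style file into YAML frontmatter and markdown body.
// A file with no leading "---" block is treated as body-only.
function splitSkillFile(text: string): { frontmatter: string; body: string } {
  const match = text.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/);
  if (!match) return { frontmatter: "", body: text };
  return { frontmatter: match[1], body: match[2] };
}

// Hypothetical skill definition for demonstration purposes only.
const skill = `---
name: plan-eng-review
description: Engineering manager who locks in architecture before coding
---
You are a staff-level engineering manager. Review the proposed design...`;

const { frontmatter, body } = splitSkillFile(skill);
// frontmatter holds the YAML metadata; body holds the role prompt
```

Because the whole role definition lives in plain text like this, versioning it in Git and copying it between hosts is trivial.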

2. Persistent Browser Daemon

Instead of cold-starting a browser for every QA or design review command, gstack runs a long-lived Chromium instance via Playwright. First command takes ~3 seconds to start up; subsequent commands respond in ~100-200ms. The daemon auto-shuts down after 30 minutes idle.

State is tracked in .gstack/browse.json (PID, port, bearer token). Random ports between 10,000-60,000 prevent conflicts across multiple workspaces.
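A rough sketch of what managing that state file could look like, assuming the three fields described above (the exact field names and layout gstack uses may differ):

```typescript
import { mkdirSync, writeFileSync, readFileSync } from "node:fs";
import { randomBytes, randomInt } from "node:crypto";

// Illustrative shape of a .gstack/browse.json state file.
interface BrowseState {
  pid: number;
  port: number;
  token: string;
}

function writeBrowseState(dir: string, pid: number): BrowseState {
  const state: BrowseState = {
    pid,
    port: randomInt(10_000, 60_000), // random port avoids conflicts across workspaces
    token: randomBytes(32).toString("hex"), // per-session bearer token
  };
  mkdirSync(dir, { recursive: true });
  // mode 0o600: readable and writable by the owner only
  writeFileSync(`${dir}/browse.json`, JSON.stringify(state), { mode: 0o600 });
  return state;
}

function readBrowseState(dir: string): BrowseState {
  return JSON.parse(readFileSync(`${dir}/browse.json`, "utf8"));
}
```

Subsequent commands read this file, check the PID is still alive, and reuse the running daemon instead of cold-starting a new browser.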

3. Accessibility-First Reference System

When gstack’s browser daemon snapshots a page, it doesn’t use CSS selectors. It uses Playwright’s accessibility tree to generate sequential refs (@e1, @e2, @e3) resolved via getByRole() queries. This works through Shadow DOM, respects Content Security Policy, and is more robust than selector-based approaches.
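The sequential-ref idea can be sketched as a simple mapping: each interactive node from the accessibility tree gets a stable handle like @e1, which later resolves to a role-plus-accessible-name query (in Playwright terms, something like page.getByRole(role, { name })). The node shape below is a simplification for illustration:

```typescript
// Simplified accessibility-tree node: role + accessible name.
interface AxNode {
  role: string;
  name: string;
}

// Assign sequential refs (@e1, @e2, ...) to snapshot nodes so the model
// can say "click @e1" instead of guessing a CSS selector.
function assignRefs(nodes: AxNode[]): Map<string, AxNode> {
  const refs = new Map<string, AxNode>();
  nodes.forEach((node, i) => refs.set(`@e${i + 1}`, node));
  return refs;
}

const refs = assignRefs([
  { role: "button", name: "Sign in" },
  { role: "textbox", name: "Email" },
]);
// refs.get("@e1") → { role: "button", name: "Sign in" }
```

Because the handle is resolved by role and name rather than by DOM position, it keeps working when markup changes but the UI semantics don't.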


Installation - 30 Seconds

Requirements: Claude Code, Git, Bun v1.0+

```bash
# Clone and setup
git clone --single-branch --depth 1 \
  https://github.com/garrytan/gstack.git \
  ~/.claude/skills/gstack

cd ~/.claude/skills/gstack && ./setup
```

That’s it. All slash commands are immediately available in your next Claude Code session.

For team use (shared repos with auto-updates):

```bash
# Enable team mode
cd ~/.claude/skills/gstack && ./setup --team

# Initialize in your project
cd <your-repo>
~/.claude/skills/gstack/bin/gstack-team-init required

# Commit the configuration
git add .claude/ CLAUDE.md && git commit -m "require gstack for AI-assisted work"
```

Uninstall:

```bash
~/.claude/skills/gstack/bin/gstack-uninstall
```

The setup script auto-detects your host (Claude Code, Codex, OpenCode, Cursor, Factory Droid, Slate, Kiro). You can also target a specific host with ./setup --host <name>.


The 23 Specialist Roles

Here’s every slash command gstack adds, organized by development phase.

Planning & Strategy

| Command | Role | What It Does |
| --- | --- | --- |
| /office-hours | Product interrogator | Asks 6 forcing questions before you write a line of code. Adapts between startup mode and builder mode |
| /plan-ceo-review | Founder/CEO | Rethinks from the user’s perspective. Four scope modes: expand, selective, hold, reduce |
| /plan-eng-review | Engineering manager | Architecture lock-in with diagrams and test plans. The only required gate in the workflow |
| /plan-design-review | Senior designer | 7-pass evaluation, rates 0-10, suggests specific fixes |
| /plan-devex-review | DevEx specialist | Developer experience optimization: API ergonomics, error messages, onboarding friction |
| /autoplan | All planning roles | Runs CEO, Design, and Eng review sequentially in one command |

Design

| Command | Role | What It Does |
| --- | --- | --- |
| /design-consultation | Design director | Creates a complete design system from scratch: competitive research, tokens, component inventory, writes DESIGN.md |
| /design-shotgun | Visual designer | Generates 3-6 mockup variants using GPT Image, produces a comparison board |
| /design-review | Design auditor | 80-item visual audit with automatic CSS fixes and before/after screenshots |
| /design-html | Frontend engineer | Converts mockups to production HTML with framework detection |

Code Quality

| Command | Role | What It Does |
| --- | --- | --- |
| /review | Staff engineer | Finds production bugs that pass CI. Auto-fixes obvious issues, flags non-obvious ones |
| /investigate | Debugger | Root-cause debugging with a hard rule: no fixes without investigation first. Stops after 3 failed attempts |
| /cso | Chief Security Officer | OWASP Top 10 scan plus STRIDE threat modeling |

Testing

| Command | Role | What It Does |
| --- | --- | --- |
| /qa | QA lead | Real browser testing via the Playwright daemon, bug fixes, regression test generation |
| /qa-only | QA reporter | Same methodology as /qa, but report-only: no code changes |
| /benchmark | Performance engineer | Core Web Vitals, page load timing, resource sizes, before/after comparison |

Deployment

| Command | Role | What It Does |
| --- | --- | --- |
| /ship | Release engineer | Syncs branch, runs tests, audits coverage, pushes, opens PR |
| /land-and-deploy | Deploy engineer | Merges PR, waits for CI, verifies production health |
| /canary | Monitoring | Post-deploy watch for console errors and regressions |
| /document-release | Doc engineer | Auto-updates all project documentation to match shipped changes |

Utilities

| Command | What It Does |
| --- | --- |
| /browse | Real Chromium browser with ~100ms response latency |
| /setup-browser-cookies | Import cookies from Chrome, Arc, Brave, or Edge via macOS Keychain |
| /codex | OpenAI Codex CLI second opinion (review, adversarial, or consultation mode) |
| /careful | Safety guardrails for destructive commands |
| /freeze / /unfreeze | Restrict file edits to specific directories |
| /learn | Persist learned patterns across sessions |
| /retro | Weekly engineering retrospective |

A Typical gstack Workflow

Here’s how a real feature development session looks:

```
1. /office-hours        → "What are we building and why?"
2. /plan-ceo-review     → "Does this scope make sense from a user perspective?"
3. /plan-eng-review     → "Is the architecture sound?" (required gate)
4. [implement]          → Standard Claude Code coding
5. /review              → Staff engineer catches production bugs
6. /cso                 → Security audit
7. /qa                  → Real browser testing
8. /ship                → PR opened, tests passing
9. /land-and-deploy     → Merged and deployed
10. /canary             → Post-deploy monitoring
```

You don’t have to run every step every time. But the explicit phases prevent the “I’ll just vibe code this real quick” trap that leads to shipping untested, unreviewed code.


Security Model

gstack’s browser daemon runs with sensible security defaults:

  • Localhost-only binding - no network access from outside
  • Bearer token auth per session, stored in mode 0o600 files
  • Cookie import from Chrome/Arc/Brave/Edge uses macOS Keychain (read-only, in-process decryption, never persisted in plaintext)
  • Bun.spawn() with explicit argument arrays prevents shell injection

Three circular log buffers (50,000 entries each) capture console messages, network requests, and dialogs. Async flush every second to .gstack/*.log.
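A circular buffer like the ones described above is a fixed-capacity ring where the oldest entries are overwritten once the buffer fills. Here's a minimal sketch (gstack's actual implementation details may differ):

```typescript
// Fixed-capacity ring buffer: once full, new entries overwrite the oldest.
class RingBuffer<T> {
  private items: T[] = [];
  private next = 0; // index of the next slot to overwrite

  constructor(private capacity: number) {}

  push(item: T): void {
    if (this.items.length < this.capacity) {
      this.items.push(item); // still filling up
    } else {
      this.items[this.next] = item; // overwrite oldest slot
    }
    this.next = (this.next + 1) % this.capacity;
  }

  // Entries in oldest-to-newest order
  toArray(): T[] {
    if (this.items.length < this.capacity) return [...this.items];
    return [...this.items.slice(this.next), ...this.items.slice(0, this.next)];
  }
}

const logs = new RingBuffer<string>(3);
["a", "b", "c", "d"].forEach((m) => logs.push(m));
// logs.toArray() → ["b", "c", "d"]
```

The appeal for a long-lived daemon is bounded memory: no matter how chatty the page is, each buffer never holds more than its capacity.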


How gstack Compares to Other Frameworks

gstack exists alongside two other major Claude Code enhancement frameworks. They solve different problems:

| Dimension | gstack (~50K stars) | Superpowers (~94K stars) | GSD (~35K stars) |
| --- | --- | --- | --- |
| Constrains | Decision-making perspective | Development process | Execution environment |
| Philosophy | “What hat to wear” | “What steps to follow” | “Fresh context per task” |
| Strength | Forces clarity before coding | Cuts regression bugs via TDD | Quality on 50+ file projects |
| Weakness | No explicit Build phase skill | Slower builds (test-first overhead) | More complex setup |
| Best for | Founder-engineers wearing multiple hats | Solo devs needing process discipline | Complex projects exceeding context windows |

The key insight: these frameworks barely overlap. gstack governs perspective (which role are you in?), Superpowers governs process (what steps do you follow?), and GSD governs environment (how do you manage context?). You can run them together.

My own setup combines gstack’s planning phases with Agent-Skills for the build/test/review cycle. The two complement each other well - gstack asks the hard product questions before coding begins, Agent-Skills enforces engineering discipline during implementation.


What I Like

The role decomposition is the real insight. The idea that an AI coding assistant should switch between distinct specialist perspectives - not just be a generic “helpful coder” - is the pattern worth adopting regardless of whether you use gstack specifically. It forces you to think about what phase you’re in before you start typing.

/office-hours is genuinely useful. Having an AI push back on your product thinking before you write code saves more time than any code review tool. The six forcing questions surface assumptions you didn’t know you were making.

The browser daemon is well-engineered. Persistent Chromium with accessibility-tree refs is a better architecture than cold-starting browsers per command. The ~100ms latency makes iterative QA sessions feel responsive.

Portability matters. Because everything is built on SKILL.md files, the roles work across Claude Code, Codex, Cursor, and other hosts. You’re not locked into one tool.


What I’d Watch Out For

It’s structured role-play, not actual multi-agent orchestration. If you expect agents autonomously delegating work to each other, that’s not what this is. You’re the orchestrator. Each slash command activates a single specialist persona in one Claude Code session.

The 600K LOC claim needs context. Tan’s productivity numbers come from running gstack alongside Conductor - a separate Mac app that runs multiple Claude Code instances in isolated Git worktrees. gstack alone doesn’t give you parallelism.

Long agent loops can happen. One developer reported a 70-minute loop where /qa kept injecting staging URLs into production files. As with any agentic workflow, you need to stay in the loop and interrupt when things go sideways.

Some commands overlap with existing setups. If you already use Agent-Skills or Superpowers, you’ll find /review and /ship do similar things to skills you already have. Pick one or be deliberate about which framework handles which phase.


The Meta-Lesson

The most interesting thing about gstack isn’t the code. It’s the thesis: the bottleneck in AI-assisted development isn’t intelligence, it’s structure. Claude Code is already smart enough to write good code, find bugs, and suggest improvements. What it lacks - what all AI coding assistants lack - is a framework for deciding when to think about what.

gstack’s answer is role decomposition. Before you build, think like a CEO. Before you ship, think like a QA lead. Before you deploy, think like a security officer. The AI doesn’t need to be smarter. It needs to wear the right hat at the right time.

Whether you adopt gstack, build your own role system, or just internalize the principle - the pattern is worth learning. The developers shipping the best AI-assisted code in 2026 aren’t the ones with the most powerful models. They’re the ones with the most disciplined workflows.

