Vibe Coding in Prod Is a Management Problem, Not a Coding Problem
Eric Schluntz (Anthropic) argues the real challenge of AI-assisted engineering is the oldest one in management - how do you verify work you can't read yourself? My take on leaf nodes, the 22,000-line PR, and why being Claude's PM is the new engineering craft.
Eric Schluntz, a researcher at Anthropic who co-authored Building Effective Agents with Barry Zhang, gave a short talk at a recent event titled “How to Vibe Code in Prod Responsibly.” I watched it twice. It is the clearest statement I have heard yet of what the next two years of software engineering are actually going to feel like, and the argument is not the one most people expect.
The argument is not about prompt quality, model selection, or which IDE wins. It is that vibe coding in production is a management problem, and software engineers are the last functional group in the company to be confronted with it. Every other manager in the world already solves this problem. We are just late.
This is the piece I wish I had written myself. Since I didn’t, here is what Schluntz said, what I took from it, and how it slots into the agentic stack I have been writing about for the last six months.
What Vibe Coding Actually Means
Schluntz opens with a definitional tightening that matters. A lot of people, he notes, conflate vibe coding with “using a lot of AI to generate code.” Copilot users. Cursor users. Engineers who let Claude write most of the diff but still review every line. That isn’t vibe coding.
The reference point is Andrej Karpathy’s original definition: you fully give in to the vibes, embrace exponentials, and forget that the code even exists. The operative clause is forget that the code even exists. If you are still in a tight feedback loop with the model, reviewing each chunk it produces, you are doing supervised AI-assisted coding, which is a fine thing but a different thing.
True vibe coding is what happens when the unit of AI output grows past what you can reasonably review. And that is coming whether we like it or not.
The Exponential Is the Whole Argument
Schluntz’s framing of the stakes is the tightest I have seen:
The length of tasks that AI can do is doubling every seven months. Right now we’re at about an hour. You don’t need to vibe code. You can have Cursor work for you, have Claude Code write a feature that would take an hour, and you can review all that code. But what happens next year? The year after? When the AI can generate an entire day’s or week’s worth of work for you at a time, there is no way we’re going to keep up with that if we still need to move in lockstep.
I have made a version of this argument before in The Last Bottleneck, but Schluntz’s compiler analogy is better than anything I came up with. In the early days of compilers, developers used them but still read the emitted assembly to check the output. That doesn’t scale. At some point you trust the compiler and work at the higher abstraction, because the alternative is staying on small programs forever.
We are at the same transition with code. The question is not whether we will cross it, but whether we cross it responsibly.
The Old Management Problem
Here is the point I missed until Schluntz said it out loud: this is not a new problem. Every manager in the world has been solving it for hundreds of years.
How does a CTO manage an expert in a domain where the CTO is not an expert? How does a PM sign off on a feature whose implementation they cannot read? How does a CEO check the accountant’s work without being a CPA?
They use abstraction layers they can verify without understanding the implementation underneath. The CTO writes acceptance tests. The PM uses the product and confirms it matches the spec. The CEO spot-checks figures they do understand, in slices of the data small enough to reason about, until the broader model earns their trust. This pattern is as old as civilization, and it is how every large organization handles work that no single person can fully inspect.
Software engineers are the unusual case here. Until now, our craft assumed the individual contributor could go all the way down the stack and understand every line personally. That was a luxury most other professions never had. The exponential is going to take it away from us, the way it took assembly-level fluency away from most developers a generation ago.
The challenge Schluntz poses: “How will we vibe code in prod and do it safely? My answer is that we will forget that the code exists but not that the product exists.” That last clause is the whole game. You still have to know, verify, and own the product. You stop needing to personally audit the code that implements it.
The Caveat: Tech Debt
Schluntz is honest about where the analogy breaks. Most abstractions that managers verify, they can verify with external evidence. The accountant’s books reconcile. The product’s feature works. The acceptance test passes.
Tech debt is the exception. Right now there is no good way to measure or validate tech debt without reading the code yourself. You can pass every test, ship every feature, and still end up with a codebase that is quietly calcifying under the hood, and no automated signal will tell you.
This is the one thing, for now, that still demands expert inspection. Which brings us to the most useful actionable idea in the talk.
Leaf Nodes and Trunks
If you can’t verify tech debt from outside the code, then you have to be smart about where you let debt accumulate. Schluntz’s rule is simple:
- Leaf nodes are the parts of your system that nothing else depends on. End features, polish, the last mile. If debt lands here, it is contained. These are safe to vibe code.
- Trunks and branches are the core architecture - modules and interfaces that many other things are built on top of. Debt here compounds. These still deserve careful human engineering.
This is the most concrete piece of architectural advice I have taken from the talk. It is also a useful way to retroactively audit a codebase: if you draw the dependency graph and shade the leaves, you have just mapped where agents can move fastest with the least risk. Everything upstream of a well-used module is a place where you slow down, review carefully, and invest in extensibility.
For anyone trying to set a team-wide policy on “where can we lean on Claude fully,” this is the right shape of answer. Not “tests or no tests.” Not “critical or non-critical.” The question is where in the dependency graph this code sits.
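A leaf-node audit of this kind is mechanical enough to sketch in a few lines. This is a minimal illustration with a hypothetical module graph, not a tool from the talk; in a real audit you would derive the dependency map from the codebase itself, for example by parsing imports with Python's `ast` module.

```python
from collections import defaultdict

# Hypothetical dependency map: module -> modules it imports.
# In practice you would build this by walking the repo and parsing imports.
deps = {
    "core.db": [],
    "core.api": ["core.db"],
    "features.export": ["core.api"],
    "features.report_polish": ["features.export"],
    "features.onboarding_banner": ["core.api"],
}

# Invert the map: module -> set of modules that depend on it.
dependents = defaultdict(set)
for mod, imports in deps.items():
    for imported in imports:
        dependents[imported].add(mod)

# Leaf nodes: nothing else depends on them. These are the safe zones
# where agents can move fastest and debt stays contained.
leaves = sorted(m for m in deps if not dependents[m])

# Trunks: the more dependents a module has, the more human review it earns.
trunks = sorted(deps, key=lambda m: len(dependents[m]), reverse=True)

print("leaves:", leaves)
print("most load-bearing:", [(m, len(dependents[m])) for m in trunks[:2]])
```

The shaded-leaves map Schluntz implies falls straight out of `leaves`, and the review-priority ordering out of `trunks`.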
The 22,000-Line PR
The most striking concrete example in the talk came when Schluntz showed a screenshot of an actual GitHub PR: a 22,000-line change to Anthropic’s production reinforcement learning codebase, written largely by Claude. They merged it.
How does that happen responsibly? Four ingredients, each a direct application of the principles above:
- Days of human requirements and guidance work before a line was written. Schluntz and his team acted as Claude’s product manager, not its typist.
- Concentration on leaf nodes. The change lived in parts of the system where future tech debt was acceptable because nothing was going to be built on top of it.
- Heavy human review on the extensible parts. The small slice that was structural got the scrutiny of a normal engineering review.
- Stress tests and designed verifiability. They built the system to have human-verifiable inputs and outputs. Stability, their biggest concern, was measured by stress tests running for long durations, which required zero code reading to interpret.
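The fourth ingredient is worth making concrete. This is my own minimal sketch of what "human-verifiable outputs, zero code reading" can look like, not Anthropic's actual harness: drive the system through its public interface against a trivially-correct reference model, and emit a one-line verdict a reviewer can interpret without opening the implementation. `KVStore` is a stand-in for whatever system is under test.

```python
import random

class KVStore:
    """Stand-in for the system under test; only its public interface matters."""
    def __init__(self):
        self._data = {}
    def put(self, k, v):
        self._data[k] = v
    def get(self, k):
        return self._data.get(k)
    def delete(self, k):
        self._data.pop(k, None)

def stress(iterations=10_000, seed=0):
    rng = random.Random(seed)
    store, model = KVStore(), {}   # `model` is the trivially-correct reference
    violations = 0
    for _ in range(iterations):
        k = rng.randrange(50)
        op = rng.choice(["put", "get", "delete"])
        if op == "put":
            v = rng.random()
            store.put(k, v)
            model[k] = v
        elif op == "delete":
            store.delete(k)
            model.pop(k, None)
        # Invariant: the system agrees with the reference on every read.
        if store.get(k) != model.get(k):
            violations += 1
    return {"iterations": iterations, "violations": violations}

# A summary a human can judge at a glance, with zero code reading.
print(stress())
```

The verdict is the abstraction layer: a reviewer trusts `violations: 0` over a long run the way a CEO trusts books that reconcile.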
Schluntz’s summary of the outcome is the part worth quoting:
Ultimately by combining those things we were able to become just as confident in this change as any other change that we made to our codebase but deliver it in a tiny fraction of the time and effort.
Read that sentence again. The bar they held themselves to was equal confidence to a human-written change. Not lower, not “good enough for an AI-written PR.” The same confidence, via different verification mechanisms. That is what responsible vibe coding in prod looks like, and it is what I want every team I work with to aspire to.
The Second-Order Effect
The effect Schluntz mentions almost as an aside is the one that will reshape how teams plan work:
Now suddenly when something costs one day of time instead of two weeks, you realize that you can go and make much bigger features and much bigger changes. The marginal cost of software is lower and it lets you consume and build more software.
This is exactly the dynamic I keep seeing play out in my own work and in the companies I advise. Once a team internalizes that a change that would have been a two-week project is now a one-day project, they stop filtering their roadmap through the old cost model. Things that were never on the list because they weren’t worth it are suddenly on the list.
This is also why the companies that learn to vibe code in prod responsibly will not just ship faster than the ones that don’t. They will ship different things. Bigger refactors. Internal tools no one would have built. Experiments that used to be blocked by engineering cost.
The London startup I am about to join has been on my mind a lot while writing this. The delta between a team that has this muscle and one that doesn’t will show up in the product surface area within a year.
Be Claude’s PM
The talk’s practical advice for individual engineers is captured in one line I am going to quote back to people for a long time:
Ask not what Claude can do for you, but what you can do for Claude.
The mindset shift is from prompter to product manager. You are Claude’s PM. What guidance, requirements, context, and constraints would a capable new engineer on your team need to succeed at this task on day one? Collect that, hand it over, and let Claude cook.
Schluntz’s own working pattern is concrete and worth imitating:
- 15 to 20 minutes of context gathering before letting the agent execute. Often this is itself a conversation with Claude - exploring the codebase, finding files, building a plan together, capturing the result as a single artifact.
- Hand the artifact to Claude in a new context or as an explicit execute-this-plan prompt.
- Compact the context at natural stopping points, the same moments a human would stop for lunch.
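The "single artifact" in the first step is the load-bearing piece, so here is one possible shape for it. The field names and structure are my own invention, not Schluntz's; the point being illustrated is that context gathering ends in one reviewable handoff document rather than a scattered chat history.

```python
from dataclasses import dataclass, field

@dataclass
class PlanArtifact:
    """Hypothetical handoff document produced by the context-gathering phase."""
    goal: str
    relevant_files: list = field(default_factory=list)
    constraints: list = field(default_factory=list)
    verification: str = ""

    def to_prompt(self) -> str:
        # Render the artifact as a single execute-this-plan prompt.
        lines = [f"# Goal\n{self.goal}", "# Relevant files"]
        lines += [f"- {f}" for f in self.relevant_files]
        lines.append("# Constraints")
        lines += [f"- {c}" for c in self.constraints]
        lines.append(f"# How we verify success\n{self.verification}")
        return "\n".join(lines)

plan = PlanArtifact(
    goal="Add CSV export to the report page (leaf node, no downstream deps).",
    relevant_files=["features/report.py", "features/export.py"],
    constraints=["No changes to core.api", "Match the existing export interface"],
    verification="Round-trip test: export then re-import yields identical rows.",
)
print(plan.to_prompt())
```

Note that the artifact carries its own verification plan: the PM decides how success will be checked before the agent writes a line.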
I recognize this because it is essentially the workflow I described in Agentic Development Patterns and The Perfect CLAUDE.md: treat the agent as a colleague who needs to be onboarded, not a search engine. What Schluntz adds is the framing - you are the product manager, not the coder - and the vocabulary of verifiability that goes with it.
Where This Slots Into My Stack
If you have followed the agentic development pieces on this site, you can see how Schluntz’s framing fits in as the missing meta-layer:
| Layer | What It Governs | Articles |
|---|---|---|
| Why we do this at all | The management problem, leaf nodes, verifiability | This piece |
| The architectural shape | When to use deterministic code vs ML vs LLM agents | Three-Tier AI Architecture |
| The discipline layer | Skills, progressive disclosure, anti-rationalization | The Skills Framework |
| The daily workflow | Context gathering, plan artifacts, worktrees | Agentic Development Patterns |
| The compounding mechanism | Overnight runs, parallel agents, review gates | The Overnight Agent Factory |
Schluntz’s talk is the executive summary of the whole stack. It is the one I would hand to a CTO or a head of engineering who hasn’t followed this space closely and asks me why any of this matters for their team.
What I Am Taking Away
Three things I am going to operationalize immediately:
- Leaf-node audits. On every codebase I touch, I want a rough dependency graph with leaves shaded. That map becomes the policy document for where my agents get maximum freedom and where they get a chaperone.
- Verifiability-first design. When I scope a new feature I now explicitly ask “how will we verify this change without reading its code?” as a design constraint, not an afterthought. Stress tests, clean interfaces, invariants, and typed contracts all become load-bearing.
- Claude’s PM is my job title. Not metaphorically. The 15 to 20 minutes of context work at the start of every agent session is the highest-leverage activity in my day. It is also the part I am most likely to short-change when I am in a hurry. I am going to stop doing that.
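The second takeaway, verifiability-first design, can be made concrete with a small example. This is a sketch under assumed names, not a prescribed pattern: an externally-checkable round-trip invariant for a hypothetical export/import pair, where the reviewer never inspects the implementations, only that encode-then-decode is the identity over many random records.

```python
import json
import random
import string

def export_record(rec: dict) -> str:
    """Stand-in implementation; could be AI-written and unread."""
    return json.dumps(rec, sort_keys=True)

def import_record(blob: str) -> dict:
    """Stand-in implementation; could be AI-written and unread."""
    return json.loads(blob)

def random_record(rng):
    # Random dicts of short string keys to small ints.
    return {
        "".join(rng.choices(string.ascii_lowercase, k=5)): rng.randrange(1000)
        for _ in range(rng.randrange(1, 6))
    }

# The invariant, stated once at design time, checked without reading code:
# import(export(rec)) == rec for every record.
rng = random.Random(42)
failures = sum(
    1 for rec in (random_record(rng) for _ in range(1000))
    if import_record(export_record(rec)) != rec
)
print(f"round-trip failures: {failures}")
```

The design choice is that the invariant is decided before implementation and survives any rewrite of the code under it, which is exactly what makes it a verification layer rather than a test of one particular diff.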
The Takeaway
The industry spent the first eighteen months of the agent era arguing about whether vibe coding was real engineering. It was the wrong debate. Karpathy’s original framing is a working practice with real limits. Schluntz’s extension - forget the code, not the product - is the bridge to production-grade work.
The hard part is not giving up control of the implementation. The hard part is building the abstractions and verification layers that let you hand control over responsibly. That is engineering management applied to your own craft. Every CTO, PM, and CEO in the world has been doing it forever. Now it is our turn.
If the length of tasks AI can do really does double every seven months, the engineers who still insist on reading every line are going to become the bottleneck in their own teams. The ones who learn to verify without reading, who vibe code the leaves and guard the trunks, who act as Claude’s PM rather than its typist, will build things the first group can’t imagine. That is the exponential in Karpathy’s embrace-exponentials, and it is the actual subject of Schluntz’s talk.
Watch it if you haven’t. It is sixteen minutes and it reframes the next two years.
References
- Vibe Coding - Eric Schluntz, Anthropic Team - The original talk
- Building Effective Agents - Schluntz and Zhang’s earlier primer on the agent patterns that underlie all of this
- Andrej Karpathy on vibe coding - The original framing
- The Last Bottleneck - My earlier piece on the exponential and the human bottleneck
- Three-Tier AI Architecture - When to use deterministic code, ML, or LLM agents
- The Skills Framework - Progressive disclosure and anti-rationalization as the discipline layer
- Agentic Development Patterns - The daily workflow
- The Perfect CLAUDE.md - Onboarding Claude like a new teammate
- The Overnight Agent Factory - The compounding mechanism on top