GenAI · 13 min read

The Skills Framework - From Vibe Coding to Production-Grade Agentic Engineering

Why Anthropic's Agent Skills and Addy Osmani's skills framework are the missing discipline layer for serious AI-assisted software engineering - and how they compare to GitHub's Spec Kit.

[Figure: Agent-Skills Development Lifecycle - six phases from Define to Ship, each with its slash command]

For the past year I’ve been refining my agentic development stack: Claude Code with a meticulously crafted CLAUDE.md, parallel worktrees, overnight agent runs, Cursor for visual work. It’s been enormously productive - but it has also exposed a recurring failure mode.

As CLAUDE.md grows to encode everything the agent should know, it starts to bloat. Testing conventions, security checklists, shipping procedures, review standards - stuffing all of it into the system prompt means every single session pays the token cost, whether the agent is refactoring a CSS variable or designing a new billing pipeline. And once the file crosses a certain size, the agent’s adherence to it starts to degrade.

The skills framework is the fix. It’s the discipline layer I didn’t know I was missing, and after spending the last few weeks integrating it into my workflow, I’m convinced it’s the next important standard for anyone doing serious agentic work. In this article I want to walk through what it is, why it matters, how it compares to GitHub’s Spec Kit, and how I’ve slotted it into my own setup.

What Are Agent Skills?

Anthropic introduced Agent Skills in late 2025 as a structured way to package procedural knowledge for Claude. The core idea is disarmingly simple: instead of cramming everything into one giant prompt, you create small, self-contained skill directories that the agent loads only when relevant.

A skill is just a folder containing a SKILL.md file with YAML frontmatter and a body, plus optional bundled resources:

my-skill/
├── SKILL.md           # The core instructions + frontmatter
├── scripts/           # Executable helpers the agent can run
├── references/        # Deeper docs loaded on demand
└── assets/            # Templates, example files, images

The SKILL.md frontmatter is what makes the whole system work:

---
name: spec-driven-development
description: Creates specs before coding. Use when starting a new project, feature, or significant change and no specification exists yet.
---

Claude sees the name and description of every installed skill at session start - but nothing else. Only when the agent decides a skill is relevant does the full body load into context. That single design choice - progressive disclosure - is what makes the framework scale.

Progressive Disclosure: The Quiet Breakthrough

The progressive disclosure model works in three layers:

  1. Metadata (always loaded) - Just the name and description fields, a few tokens per skill. This is Claude’s “table of contents.”
  2. Instructions (loaded on trigger) - The full body of SKILL.md, typically kept under 500 lines. Loads when the agent matches the task to the skill’s description.
  3. Resources (loaded on demand) - Files under scripts/, references/, and assets/. Can be arbitrarily large because they only enter context when the agent explicitly reads them.

This solves the oldest tension in agent engineering: you want the agent to have access to everything, but you don’t want to pay for everything on every turn. Progressive disclosure says: have the manual in the room, but don’t read it cover-to-cover for every question.
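The three layers can be sketched in a few lines of Python. This is a toy model, not Anthropic's implementation - in particular, the keyword match in `trigger` stands in for the model's own judgment about whether a task fits a skill's description - but it makes the cost structure concrete:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    description: str  # Layer 1: always in context (a few tokens)
    body: str         # Layer 2: the SKILL.md body, loaded on trigger
    # Layer 3: scripts/, references/, assets/ - read only when the
    # agent explicitly asks for a file, so they can be arbitrarily large.
    resources: dict = field(default_factory=dict)

def session_context(skills):
    """Layer 1: only name + description of every skill enters the prompt."""
    return [f"{s.name}: {s.description}" for s in skills]

def trigger(skill, task):
    """Layer 2: load the full body only when the task looks relevant.

    A crude keyword overlap stands in for the model's judgment here.
    """
    words = [w for w in task.lower().split() if len(w) > 3]
    if any(w in skill.description.lower() for w in words):
        return skill.body
    return None

skills = [
    Skill(
        name="spec-driven-development",
        description="Creates specs before coding",
        body="Full SKILL.md instructions...",
        resources={"references/template.md": "..."},
    ),
]

print(session_context(skills))  # tiny, fixed startup cost per skill
print(trigger(skills[0], "Create a spec for the billing feature"))
```

The point of the sketch: adding a tenth or fiftieth skill only grows the `session_context` list, never the per-turn cost of bodies and resources you don't use.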

It’s also why skills are quietly nibbling at MCP’s lunch. In practice, connecting more than two or three MCP servers to an agent degrades tool-use accuracy noticeably - every tool description sits in context permanently. Skills, by contrast, let you install dozens of them with near-zero startup cost. MCP and skills aren’t competitors exactly - MCP is the plumbing for tool access, skills are the procedural brain on top of that plumbing - but for knowledge encapsulation, skills win on ergonomics.

Addy Osmani’s Agent-Skills: Production-Grade Engineering in a Box

This is where it gets really interesting for me. Addy Osmani - a Senior Engineering Director at Google who led Chrome’s Developer Experience team for over a decade, authored Learning JavaScript Design Patterns, and has become one of the most thoughtful voices on AI-assisted development - has packaged his own opinionated, production-grade engineering workflows into an open-source skills library at github.com/addyosmani/agent-skills.

The tagline tells you everything: “Production-grade engineering skills for AI coding agents.” These aren’t reference docs - they’re workflows agents follow, encoding the kind of discipline a senior engineer brings to production code.

The repo currently ships around twenty skills mapped to six lifecycle phases:

Phase | Representative Skills
--- | ---
Define | spec-driven-development, idea-refine
Plan | planning-and-task-breakdown
Build | incremental-implementation, test-driven-development, frontend-ui-engineering
Verify | debugging-and-error-recovery, test execution gates
Review | code-review-and-quality, security-and-hardening, api-and-interface-design
Ship | shipping-and-launch, ci-cd-and-automation, performance-optimization

There are corresponding slash commands - /agent-skills:spec, /agent-skills:plan, /agent-skills:build, /agent-skills:test, /agent-skills:review, /agent-skills:ship - that give you direct access to the relevant skill at each stage.

The SKILL.md Anatomy That Makes It Work

What I find most valuable is the consistent internal structure of each skill. Using spec-driven-development as the reference model:

  • Overview - One paragraph: what this skill achieves and why the agent should care.
  • When to Use - Explicit triggers and, just as importantly, explicit anti-triggers (“Don’t use this for single-line fixes or unambiguous changes”).
  • Process - A gated, phased workflow. Spec-driven development walks through Specify → Plan → Tasks → Implement, with a human review checkpoint between each phase.
  • Rationalizations - My favorite section. This is a list of the excuses an agent (or a tired engineer) will invent to skip the process, each paired with a firm rebuttal. The spec-driven skill’s canonical line is “A 15-minute spec prevents hours of rework. Waterfall in 15 minutes beats debugging in 15 hours.”
  • Red Flags - Warning signs that the skill is being applied wrong or skipped entirely.
  • Verification - Concrete evidence the skill was actually followed. Not “seems right” - a checklist.
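
Put together, a skill following this anatomy looks roughly like the skeleton below. This is a hypothetical example I wrote to illustrate the shape, not a verbatim copy of any skill in the repo:

```markdown
---
name: my-team-conventions
description: Enforces our review and testing conventions. Use when modifying production code.
---

## Overview
One paragraph: what this skill achieves and why the agent should care.

## When to Use
- Trigger: any change to production code paths
- Anti-trigger: single-line fixes, docs-only changes

## Process
1. Write or update the failing test first
2. Implement the minimal change that makes it pass
3. Run the full suite; stop at a human checkpoint before merging

## Rationalizations
- "This change is too small to test." → Small changes break production too. Write the test.

## Red Flags
- Implementation started before any test exists

## Verification
- [ ] A new or updated test exists and went from red to green
- [ ] Human checkpoint recorded before merge
```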

That Rationalizations section is the key innovation. Every senior engineer has seen a junior developer (or themselves on a bad day) rationalize away testing, skip a review, or handwave past a security concern. Skills encode not just the right process but the anti-rationalization defense that keeps the agent honest. That’s the difference between a doc and a workflow.

How It Compares to GitHub’s Spec Kit

If you’ve been watching this space you’ve probably heard of Spec Kit, GitHub’s toolkit for Spec-Driven Development. On the surface there’s a lot of overlap - both are about replacing “vibe coding” with structured, specification-first engineering. But they’re actually complementary rather than competing.

Dimension | Spec Kit (GitHub) | Agent-Skills (Osmani)
--- | --- | ---
Primary focus | Spec-Driven Development lifecycle scaffolding | Full engineering workflow library (spec, test, review, ship)
Delivery format | Slash-command templates + project scaffolding | Progressive-disclosure SKILL.md directories
Process coverage | Specify → Plan → Tasks → Implement | Six phases across the full SDLC
Context loading | Prompt files injected upfront | Loaded on demand via metadata matching
Tooling surface | CLI (specify) that sets up a repo | Plain directories, portable across agents
Anti-rationalization | Implicit in the gated flow | Explicit section in every skill
Extensibility | Template-based | Write more SKILL.md files, drop them in

Interestingly, recent versions of Spec Kit can now install agent skills instead of slash-command prompt files via an --ai-skills parameter - so the two are converging. Spec Kit gives you a batteries-included on-ramp for the spec-plan-tasks-implement flow; agent-skills gives you a broader and deeper library for the rest of the engineering lifecycle, loaded progressively. I use Spec Kit to bootstrap new projects and the agent-skills library as the ongoing discipline layer that sits on top.

Why This Matters: The Vibe-Coding-to-Engineering Gap

Andrej Karpathy coined vibe coding to describe a specific, honest practice: prompt the model, accept what it spits out, don’t read the diff, iterate by pasting errors back in. For weekend hacks and throwaway prototypes, it’s a legitimate superpower. I’ve built things in two hours that would have taken two days of careful engineering.

The problem is that “vibe coding” became a suitcase word. People started using it to describe everything from true YOLO prompting to disciplined agentic workflows with tests, code review, and human architectural oversight. Simon Willison tried to reclaim the territory with “vibe engineering” - same enthusiasm, but with engineering rigor bolted on. It didn’t stick, mostly because the word vibe carries too much casual baggage. As Karpathy himself later suggested, agentic engineering is the cleaner frame: the human owns the architecture, quality, and correctness; the agent owns the implementation.

Osmani’s own framing is the one I find most useful: the single biggest differentiator between agentic engineering and vibe coding is testing. With a solid test suite, an agent can iterate in a loop until green and you get high confidence in the result. Without one, you’re just flying blind at 10x speed.

The skills framework is what makes agentic engineering enforceable. It’s the structural layer that turns intent (“we should write specs first”) into process (“the /spec skill literally blocks implementation until Phase 1 is approved”). It also makes the process portable: the same SKILL.md files work in Claude Code, Cursor, Gemini CLI, Windsurf, and Copilot because they’re just markdown with a convention.

The Concrete Advantages I’ve Measured

After integrating the skills framework into my stack over the last few weeks, here’s what’s genuinely changed:

1. My CLAUDE.md Got Smaller, Not Bigger

This was the surprise. I’d been fighting the bloat problem by being more disciplined about what went into CLAUDE.md. Skills let me move whole chapters out: the security checklist, the release procedure, the performance-profiling playbook, the API design rubric. They’re all still there, still enforced - but they only enter context when a skill is triggered. My CLAUDE.md is now mostly about the project’s architecture and tenancy model, not generic engineering discipline. The agent is noticeably more focused as a result.

2. Anti-Rationalization Actually Works

This was the biggest qualitative shift. Before skills, when an agent wanted to skip writing tests, it would rationalize: “This is a trivial change, tests would be overkill.” Sometimes I’d catch it, sometimes I wouldn’t. With the test-driven-development skill loaded, those rationalizations get met head-on by the skill’s own rebuttal section. The agent argues with itself and loses. I’ve seen overnight runs where the agent considered skipping a test, the skill intercepted the thought, and the run produced tests anyway.

3. Overnight Runs Are More Trustworthy

My overnight agent factory workflow is only as good as the implicit quality gates the agent respects while I’m asleep. Skills make those gates explicit. /ship won’t terminate unless security-and-hardening and code-review-and-quality have verifiably run. I wake up to PRs that are closer to mergeable, not closer to “needs another hour of cleanup.”

4. Portable Across Agents

When I’m iterating on a frontend in Cursor and then switch to Claude Code for a backend task, the same skills library applies. Same conventions, same verification gates, same anti-rationalizations. No more agent-specific prompt tuning.

5. Composable With Spec Kit

For greenfield projects I bootstrap with Spec Kit’s specify CLI, then let the agent-skills library take over once the first spec is approved. The two frameworks slot together cleanly.

How I’m Using It Day-to-Day

Important: Skills are part of Claude Code’s plugin system - you can’t just clone a repo into a random directory and expect Claude to find it. The plugin system handles discovery, registration, and the slash-command wiring.

Installation

Inside a Claude Code session, run these two commands:

# 1. Register Osmani's repo as a plugin marketplace
/plugin marketplace add addyosmani/agent-skills

# 2. Install the plugin from that marketplace
/plugin install agent-skills@addy-agent-skills

Restart Claude Code after installing - skills and commands are discovered at session start, so they won’t appear until you open a fresh session.

Troubleshooting: If the marketplace clone fails silently (the commands never appear), the most common cause is SSH permissions. Fix it with:

git config --global url."https://github.com/".insteadOf "git@github.com:"

Then retry the two commands above in a new session.

For the full plugin and skill specification, see Anthropic’s skills documentation.

Using the Commands

Once installed, the plugin registers seven namespaced slash commands that map to the development lifecycle. The agent-skills: prefix distinguishes them from built-in Claude Code commands (like the built-in /review which reviews pull requests):

/agent-skills:spec           → write a structured specification (saves to SPEC.md)
/agent-skills:plan           → break the spec into small, verifiable tasks
/agent-skills:build          → implement the next task incrementally (thin vertical slices)
/agent-skills:test           → TDD workflow; for bugs, uses the "Prove-It" pattern (failing test first)
/agent-skills:review         → five-axis code review (correctness, readability, architecture, security, performance)
/agent-skills:code-simplify  → simplify code for clarity without changing behavior
/agent-skills:ship           → pre-launch checklist (quality, security, performance, accessibility, infra, docs)

Behind these commands sit 21 skills covering the full SDLC - from idea-refine and spec-driven-development through debugging-and-error-recovery to shipping-and-launch. Skills also trigger automatically: Claude sees their descriptions at session start and loads the full instructions when a task matches. You don’t always need the slash command; starting work on a new feature will naturally invoke the spec and planning skills.

After installation, slim down your CLAUDE.md - move the generic engineering discipline sections out, keeping only project-specific architecture. The skills now carry that weight.

For internal skills specific to my projects - multi-tenancy enforcement, NutriSpan data model conventions, SortFlex CadQuery patterns - I’m writing my own SKILL.md files using Osmani’s structure as a template. That six-section anatomy (Overview / When to Use / Process / Rationalizations / Red Flags / Verification) is the right shape even for domain-specific skills.

The Takeaway

If you’re still running agentic workflows with a single bloated CLAUDE.md, you’re leaving a lot of leverage on the table. The skills framework isn’t a fad - it’s an honest engineering answer to a real scaling problem: how do you give an agent more discipline without paying the context cost of loading all of it on every turn?

Anthropic’s progressive-disclosure design solved the loading problem. Addy Osmani’s open-source skills library gives you twenty battle-tested workflows to get started with. GitHub’s Spec Kit gives you the on-ramp. Put the three together and you have something that finally feels like engineering rather than prompting.

For anyone building serious software with agents - not prototypes, not weekend hacks, but the things that need to work in production a year from now - this is the framework I’d adopt today.
