Agentic Engineering: The Post-Vibe-Coding Paradigm (2026)

Karpathy called vibe coding 'passé' in April 2026 at Sequoia. The SERPs are still 6+ weeks behind. Here is what agentic engineering means, what tools it implies, and how to upgrade your workflow.

# Agentic Engineering: The Post-Vibe-Coding Paradigm

In February 2025, Andrej Karpathy tweeted about "vibe coding" — "you fully give in to the vibes" and let the LLM produce. The tweet became the cultural framing for an entire year of AI-assisted development. Bolt, Lovable, Replit Agent, v0 — the whole "describe-it-and-ship-it" wave built on the vibe-coding premise.

In April 2026 at Sequoia AI Ascent, Karpathy called vibe coding "passé." The phrase the audience carried home was "vibe coding raised the floor; agentic engineering raises the ceiling." Simon Willison wrote up the transition on his weblog. The New Stack ran a "vibe coding is passé" piece. The discourse moved. The SERPs haven't.

As of May 2026, most "what is vibe coding 2026" results still describe vibe coding as the dominant paradigm. The gap between the discourse and the search index is roughly six weeks. This page exists because that gap is a window for ranking, but more importantly because the shift is real and most teams have not adjusted their workflow to match.

What changed

Three things happened between February 2025 and April 2026 that made vibe coding insufficient:
  1. Agent quality crossed a threshold. Claude Sonnet 4.5 in October 2025 and Claude Opus 4.7 in January 2026 were the first models that could hold multi-hour coding tasks coherently. Codex CLI matched it. Once agents could work independently for hours, the "just describe what you want" interface stopped scaling — too many decisions happen during that time without human input.
  2. The disasters arrived. The widely-cited "1.5M API keys leaked from vibe-coded apps" story was the load-bearing example. Veracode's 2025 report: 45% of AI-generated code introduces security vulnerabilities. When the agents were dumb, vibe coding produced toys. When the agents got smart, vibe coding produced shipping-but-broken systems.
  3. The methodology frameworks shipped. Superpowers (Jesse Vincent, September 2025) formalized "brainstorm → plan → implement" and "subagent-driven development." Spec Kit (GitHub, September 2025) formalized "specify → plan → tasks → implement." Karpathy's autoresearch (March 2026) formalized autonomous metric-driven iteration. The vocabulary stopped being "I described, the agent built" and started being "I directed, the agent executed, I verified."
Agentic engineering is the umbrella term for the discipline that replaced vibe coding.

The four-layer stack

A modern agentic engineering setup has four layers. Each one is a place where the team encodes discipline.

| Layer | Purpose | Tools |
|---|---|---|
| Direction | What we are building and why | CLAUDE.md, AGENTS.md, spec.md, brainstorm sessions |
| Planning | How we will build it | Plan mode (Claude Code shift+tab), Spec Kit phases, Superpowers brainstorm-plan-implement |
| Execution | The agent writing code | Claude Code, Codex CLI, Aider, subagents via the Task tool |
| Verification | Did we build the right thing | Tests, type checks, autoresearch loops, autonomous critique passes |

The vibe-coding version of this stack is a single layer — execution. You type a prompt, the agent writes code. The agentic-engineering version is all four layers, with explicit transitions between them. The fastest improvement most teams can make is adding a verification layer. Veracode's number — 45% of AI-generated code introduces vulnerabilities — is a verification problem, not a model problem. Tests, type-checking, security review, autoresearch's autonomous critique pass — any of these dramatically reduce the ship-rate of bad code.
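The verification layer is cheap to start. A minimal sketch of a gate script: run each check in order and stop on the first failure. The `echo` commands are placeholders standing in for a project's real checks (e.g. `npm test`, `npx tsc --noEmit`):

```shell
#!/bin/sh
# Minimal verification gate (a sketch). Each entry in the list is a
# shell command; substitute your project's real checks.
result="pass"
for check in "echo typecheck-ok" "echo tests-ok"; do
  echo "running: $check"
  if ! sh -c "$check"; then
    result="fail: $check"
    break
  fi
done
echo "verification result: $result"
```

Wire this into CI or a pre-push hook and the agent's output never merges unchecked.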

The Karpathy framing

The most-quoted phrase out of Sequoia 2026:
> "Vibe coding raised the floor. Agentic engineering raises the ceiling." — Andrej Karpathy, Sequoia AI Ascent, April 2026 (paraphrased by moderator Stephanie Zhan)
The framing matters. Karpathy is not arguing vibe coding was bad. He is arguing it was a phase — the phase where non-engineers could ship a working app and existing engineers could move 2-3x faster on trivial work. The floor went up. Agentic engineering is the next phase — where the agent does the keyboard work and the engineer does the judgment work. The ceiling goes up. The skills that compound are direction (knowing what to build), context engineering (loading the agent with the right material), and verification (knowing when the agent went off-track). Typing speed compounds nothing.

Uncle Bob's response from the same period: "Vibe coding is not the same as disciplined agentic development." That is the same point, said by an older voice. The methodology side of the industry was waiting for the agent quality to justify it.

Where this differs from prompt engineering

Prompt engineering optimized a single inference. Agentic engineering optimizes a multi-turn working environment. The two are related — better prompts still help — but the scope is different.

| Discipline | Optimizes | Time horizon | Typical artifact |
|---|---|---|---|
| Prompt engineering | A single LLM call | One turn | A 500-token prompt |
| Context engineering | The context window | One conversation | CLAUDE.md, retrieval pipelines |
| Agentic engineering | The agent's full working environment | Hours to weeks | CLAUDE.md + skills + subagents + tests + verification loops |

Anthropic's engineering blog frames context engineering as "the art and science of curating what will go into the limited context window." That is one piece of agentic engineering, not the whole. The piece prompt engineering tutorials underweight: what happens when the agent is wrong. A vibe-coded session terminates when the user notices a bug; an agentic engineering setup has explicit checkpoints (tests, type checks, autoresearch verification passes) that catch the bug before the user has to.

The tooling map

The vocabulary moved fast. Here is what each piece does in the May 2026 agentic engineering stack:

Direction layer:
  • CLAUDE.md — Claude Code's project-level rule file. Always-loaded into context. HumanLayer's standard: under 60 lines.
  • AGENTS.md — The tool-agnostic equivalent. Read by Codex, Cursor, Copilot, Gemini, and increasingly Claude Code (via symlink). Adopted by 32+ tools as of early 2026. Whether any foundation formally stewards the spec is unconfirmed at the time of writing.
  • SKILL.md — Anthropic's on-demand context format. Skills load only when triggered by their description, which keeps the always-on context small.
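As an illustration, a hypothetical skill file — the frontmatter fields follow the published SKILL.md shape (a `name` and a `description` that triggers loading), but the skill itself is invented for this example:

```markdown
---
name: release-checklist
description: Use when the user asks to prepare or cut a release.
---

# Release checklist
1. Confirm the changelog covers every merged PR since the last tag.
2. Run the full test suite and the type check.
3. Tag the release only after both pass.
```

Because the body loads only when the description matches, a project can carry dozens of skills without bloating the always-on context.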
Planning layer:
  • Plan mode (Claude Code shift+tab) — Forces the agent to plan before editing. The single highest-EV setting most users miss.
  • Spec Kit (GitHub, September 2025) — Four-phase "specify → plan → tasks → implement" discipline. Heavier than plan mode; right for projects that need formal specs.
  • Superpowers (Jesse Vincent, September 2025) — Methodology framework: brainstorm → plan → implement, subagent-driven development, TDD integration.
Execution layer:
  • Claude Code, Codex CLI, Aider, OpenCode, Cline — the terminal agents. Each has its own trade-offs.
  • Subagents via the Task tool — for parallel independent work. See /topic/subagents.
Verification layer:
  • autoresearch (Karpathy March 2026, 4 mature Claude Code ports) — autonomous metric-driven iteration with a self-critique pass. See /topic/autoresearch.
  • TDD harnesses — turning the test suite into the verification target.
The pieces compose. A typical agentic engineering workflow in May 2026 looks like: CLAUDE.md sets the rules, plan mode produces a plan, the main agent executes with skill-triggered tools, subagents handle parallel branches, autoresearch runs the verification loop overnight. Five years ago that workflow needed a team of engineers; today one engineer steers it.
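The overnight verification loop at the end of that workflow reduces to a plain retry loop. A sketch, in which `run_agent` and `run_tests` are invented placeholders for a real agent invocation (e.g. `claude -p "fix the failing tests"`) and a real test command (e.g. `npm test`); here the tests "pass" on the second iteration so the loop terminates:

```shell
#!/bin/sh
# Sketch of an unattended iterate-and-verify loop. Both functions are
# placeholders: run_agent would call the coding agent, run_tests would
# run the real verification suite.
run_agent() { echo "agent iteration $1"; }
run_tests() { [ "$1" -ge 2 ]; }  # placeholder: succeeds from iteration 2

i=1
verified=""
while [ "$i" -le 5 ]; do
  run_agent "$i"
  if run_tests "$i"; then
    verified="$i"
    break
  fi
  i=$((i + 1))
done
echo "verified on iteration $verified"
```

The iteration cap matters: without it, a stuck agent burns tokens all night with nothing to show.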

Where this fails

  1. Small projects do not need this much process. A 50-line script is a vibe-coding task. Forcing CLAUDE.md + plan mode + subagents + verification on a one-evening hack is overhead theater. Match the methodology to the project.
  2. Models without long-horizon coherence break the stack. Claude Opus 4.7 and Sonnet 4.5 hold multi-hour tasks. Older models (and most non-Anthropic competitors as of May 2026) lose coherence past 30-60 minutes. If your agent cannot work alone for an hour, the verification layer becomes the bottleneck — too many checkpoints to be productive.
  3. The "Karpathy said it so it must be right" trap. Karpathy's framing is influential. It is not gospel. Some serious engineers (Uncle Bob, Simon Willison) still recommend more conservative workflows than the full agentic stack implies. Read multiple voices.
  4. Verification gaps. The 45% Veracode number is for current-state code. A naive "agentic engineering" setup that skips actual security review still produces vulnerable code. The methodology is necessary but not sufficient.
  5. The 1.5M API keys story is a warning, not a feature. Vibe-coded apps leaked keys because they shipped with hardcoded credentials. The agentic engineering equivalent — a CLAUDE.md that says "never commit secrets" and a CI hook that runs truffleHog — exists and is not optional.
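The secret-scanning hook can be approximated in a few lines. This is a sketch only — a real setup should run a dedicated scanner such as truffleHog or gitleaks rather than the two toy key patterns here — but it shows where the hook sits in the commit path:

```shell
#!/bin/sh
# Sketch of a pre-commit secret check. The patterns are deliberately
# crude (an AWS-style key and an sk-prefixed token); use a real scanner
# like truffleHog or gitleaks in production.
pattern='AKIA[0-9A-Z]{16}|sk-[A-Za-z0-9]{20,}'
staged=$(git diff --cached --unified=0 2>/dev/null || true)
if printf '%s\n' "$staged" | grep -Eq "$pattern"; then
  echo "possible secret in staged changes; commit blocked" >&2
  blocked=1
else
  blocked=0
fi
echo "blocked=$blocked"
```

Saved as `.git/hooks/pre-commit` (exiting nonzero when `blocked=1`), it stops the hardcoded-credential failure mode before it reaches the remote.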

How to upgrade your workflow

If you are currently vibe coding and want to move to agentic engineering in 30 days:
  1. Week 1: Write a CLAUDE.md. Aim for under 60 lines. Include: the project goal, the architecture in one paragraph, the rules the agent must follow, the verification commands (`npm test`, `npx tsc --noEmit`). HumanLayer publishes templates worth copying.
  2. Week 2: Adopt plan mode. Use shift+tab on every non-trivial task. Read the plan before approving. The agent will catch bad ideas you would have shipped.
  3. Week 3: Add one verification step. A pre-commit hook, a CI test job, an autoresearch verification pass — pick one. The first one cuts shipped-bug rate by ~30% in our experience.
  4. Week 4: Try subagents for parallel work. Pick a task with 2-3 independent pieces. Dispatch them via the Task tool. Compare wall-clock to sequential.
That is the minimum. The fancier moves (skills, subagent-driven development, autoresearch loops) come later.
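For Week 1, a minimal CLAUDE.md might look like the following. Everything here is a hypothetical example — the project name, stack, and paths are invented, and the section headings follow the structure described above rather than any official template:

```markdown
# Project: acme-dashboard (hypothetical example)

## Goal
Internal analytics dashboard. Ship small, reviewed changes; no big-bang
rewrites.

## Architecture
Next.js frontend, Postgres via Prisma. API routes live in app/api/;
shared types in lib/types.ts.

## Rules
- Never commit secrets; all credentials come from environment variables.
- Run the verification commands below before declaring any task done.
- Ask before adding a new dependency.

## Verification
- npm test
- npx tsc --noEmit
```

Short is the point: every line here is loaded into context on every turn, so each one has to earn its place.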

What to read next

  • /topic/autoresearch — the autonomous iteration loop that is the verification-layer power tool

Sources

  • Karpathy, Andrej. Sequoia AI Ascent fireside chat, April 2026. The "vibe coding is passé" moment; paraphrased "vibe coding raised the floor, agentic engineering raises the ceiling" by moderator Stephanie Zhan.
  • Veracode. 2025 State of Software Security report. The 45%-of-AI-generated-code-has-vulnerabilities finding.

Frequently asked

What is agentic engineering?
Agentic engineering is the discipline of building software with autonomous coding agents (Claude Code, Codex CLI, Aider, Cursor's background agent) as the primary writer, with a human providing direction, constraints, and review. It replaces vibe coding — where the human types fast and lets the agent fill in — with a structured process: brainstorm, plan, implement, verify, iterate. Karpathy framed the shift at Sequoia 2026 as 'vibe coding raised the floor; agentic engineering raises the ceiling.'
How is agentic engineering different from vibe coding?
Vibe coding is informal — describe what you want, let the agent produce, accept what looks right. Agentic engineering is structured — write a CLAUDE.md or AGENTS.md, run autoresearch or Spec Kit for planning, dispatch subagents for parallel work, require tests before merge. The difference is repeatability and quality control. Vibe coding works for prototypes; agentic engineering ships systems.
Did Karpathy actually say vibe coding is dead?
He used the word 'passé' at Sequoia AI Ascent in April 2026. The phrase carried by the audience was 'vibe coding raised the floor; agentic engineering raises the ceiling' (paraphrased by Stephanie Zhan moderating). Karpathy did not call vibe coding bad — he called it superseded for serious work. The sourcing for the paraphrase is in the Sources section above.
What tools does agentic engineering require?
At minimum: a coding agent with a planning loop (Claude Code, Codex CLI, or Aider), a structured config file (CLAUDE.md or AGENTS.md), and a verification step (tests, type-checking, or manual review). Most teams add: a skills system (Claude Code skills or AGENTS.md folders), subagent orchestration for parallel tasks, and autoresearch for long-horizon investigation. The methodology frameworks (Superpowers, Spec Kit) are optional but help.
Is agentic engineering just rebranded prompt engineering?
No. Prompt engineering is about crafting a single prompt to get a single good output. Agentic engineering is about designing the agent's working environment — its rules, its tools, its memory, its checkpoints — so the agent produces good outputs across hundreds of turns without supervision. Context engineering (Anthropic's term) is closer, but agentic engineering is broader: it includes process and verification, not just context curation.
