
Agentic coding patterns that actually work

RuleSell Team

Three focused agents beat one generalist working three times as long. But only if you get the patterns right. Here's what works and what produces 10x bugs.

The promise of agentic coding is intoxicating: describe what you want, let AI agents build it in parallel, come back to a pull request. The reality is messier. Some patterns produce genuinely transformative output. Others produce code at impressive speed while making architectural errors that take days to untangle.

We've spent months evaluating AI coding assets for RuleSell — skills, agents, orchestration configs, AGENTS.md files. We've seen what works in real codebases and what generates impressive-looking diffs that fail on first review. Here's what we've learned, backed by research from people who've studied this systematically.

The single-agent ceiling

Before we talk about multi-agent patterns, let's be honest about single-agent limits. As Addy Osmani documented in his research on multi-agent coding, one agent hits three walls:

  1. Context overload. Large codebases exceed what fits in a context window. The agent starts hallucinating file paths, inventing APIs, or ignoring relevant code it can't see.
  2. Lack of specialization. A generalist agent writes worse code than a focused one: an agent told to "build the API, database layer, and frontend" makes poorer decisions than three agents each told to do one thing.
  3. No coordination. Even when you spawn helpers, without shared state or dependency tracking, they duplicate work, create conflicts, and produce inconsistent output.

The single-agent model works fine for small, well-scoped tasks: "add a validation function here," "write a test for this endpoint," "fix this TypeScript error." It breaks down for anything that requires understanding multiple files, making architectural decisions, or coordinating changes across a codebase.

Pattern 1: Subagents for decomposition (this is the baseline)

The simplest multi-agent pattern uses Claude Code's Task tool to spawn specialized child agents. You give each one a focused job, and they report back.

Osmani documented a concrete example: building "Link Shelf," an Express/SQLite app. The parent orchestrator decomposed work into three subagents — Data Layer, Business Logic, and API Routes. The first two ran in parallel while the third waited for dependency reports. Total cost: approximately 220k tokens, with no extra setup required.

This is the pattern you should start with. It requires no tooling, no configuration files, no orchestration framework. You're just asking Claude Code to delegate.

When it works: Tasks with clear boundaries and minimal interdependency. Each subagent owns a slice of the problem and produces a self-contained artifact.

When it breaks: Tasks where agents need to coordinate on shared interfaces, or where one agent's architectural decision constrains another's implementation. Without explicit coordination, you get three components that each work in isolation and don't compose.

Pattern 2: Agent teams with shared task lists

The step up from raw subagents is Agent Teams — Claude Code's parallel execution model with coordination primitives. The architecture involves:

  • A Team Lead that decomposes work into tasks with explicit dependencies
  • A Shared Task List with dependency tracking and file locking
  • Multiple Teammates in isolated tmux panes (or worktrees), each claiming tasks

The key design decision: teammates self-claim tasks and message each other peer-to-peer. The lead doesn't become a bottleneck routing messages. When a task's dependencies complete, it auto-unlocks for the next available teammate.

The practical sweet spot is 3-5 teammates per team. Token costs scale linearly, but coordination overhead scales super-linearly. Beyond 5 agents, the task list management itself becomes a bottleneck.

When it works: Feature work with multiple independently testable components. Building a new module, implementing a multi-endpoint API, setting up a testing infrastructure.

When it breaks: Work that requires real-time agreement on a shared abstraction. If the data model isn't locked before agents start building, they'll each invent their own version.
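To make the coordination model concrete, here is a minimal sketch of a shared task list with dependency tracking and file locking. The names (`Task`, `TaskList`, `claimNext`) are illustrative, not Claude Code's actual internals:

```typescript
// Sketch of a shared task list: teammates self-claim tasks whose
// dependencies are done and whose files aren't locked by in-flight work.
type TaskStatus = "pending" | "claimed" | "done";

interface Task {
  id: string;
  dependsOn: string[];  // task ids that must be done before this unlocks
  ownedFiles: string[]; // files this task may touch (one file, one agent)
  status: TaskStatus;
}

class TaskList {
  private tasks = new Map<string, Task>();

  add(task: Task): void {
    this.tasks.set(task.id, task);
  }

  // A teammate self-claims the next unlocked task: all dependencies done,
  // and none of its files owned by another claimed task.
  claimNext(): Task | undefined {
    const all = Array.from(this.tasks.values());
    const lockedFiles = new Set<string>();
    for (const t of all) {
      if (t.status === "claimed") {
        for (const f of t.ownedFiles) lockedFiles.add(f);
      }
    }
    for (const t of all) {
      if (t.status !== "pending") continue;
      const unlocked = t.dependsOn.every(
        (dep) => this.tasks.get(dep)?.status === "done",
      );
      if (!unlocked) continue;
      if (t.ownedFiles.some((f) => lockedFiles.has(f))) continue;
      t.status = "claimed"; // self-claim: no lead routing messages
      return t;
    }
    return undefined; // nothing unlocked right now
  }

  complete(id: string): void {
    const t = this.tasks.get(id);
    if (t) t.status = "done"; // auto-unlocks dependent tasks
  }
}
```

The point of the sketch: unlocking is a property of the data, not of the lead, which is why teammates can claim work peer-to-peer without a bottleneck.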

Pattern 3: Hierarchical subagents (teams of teams)

Rather than spawning six subagents from one orchestrator (fragmenting context), spawn two feature leads, each of which spawns their own specialists. This mirrors real organizational structure.

Orchestrator
├── Feature Lead: Auth System
│   ├── Specialist: Database schema + migrations
│   ├── Specialist: Auth middleware
│   └── Specialist: Auth tests
└── Feature Lead: Dashboard
    ├── Specialist: API endpoints
    ├── Specialist: React components
    └── Specialist: Integration tests

The hierarchy prevents context overload at the orchestrator level. Each feature lead knows its domain deeply. Specialists know their slice. The orchestrator only needs to understand the interface between features.

When it works: Large features that decompose into sub-features with their own internal complexity. Anything where "build this" naturally breaks into two or more independent sub-systems.

When it breaks: When the features aren't actually independent. If Auth and Dashboard share data models, the feature leads need to coordinate — and they don't have a built-in mechanism for that.

Pattern 4: Git worktree isolation

This is the infrastructure that makes all the above patterns safe. Built-in worktree support in Claude Code gives each agent its own working directory and branch. No file conflicts during parallel work. Results merge when tasks complete.

The practical workflow:

  1. Main branch is the source of truth
  2. Each agent gets a worktree: git worktree add ../feature-auth -b feature/auth
  3. Agents work in parallel without touching each other's files
  4. When a task completes, its branch gets reviewed and merged

Steve Kinney's research confirms the practical limit: 3-5 parallel worktrees before context-switching overhead between terminals becomes its own problem. For larger parallelism, you need CI/CD integration or an orchestration tool like Worktrunk (a Rust CLI that wraps worktrees with a cleaner interface).

When it works: Always. If you're running multiple agents on the same repo, use worktrees. Full stop.

When it breaks: It doesn't break. The question is whether the overhead of managing worktrees is worth it for 2-agent scenarios. (Usually yes, because the cost of a merge conflict mid-session is high.)

What fails: the anti-patterns

Vague specifications multiplied by parallelism

Osmani puts this bluntly: "Vague thinking doesn't just slow you down — it multiplies" across parallel agents. If your spec says "build a user management system" without defining the data model, 3 agents will build 3 different user models. You've now tripled the cleanup work instead of tripling output.

The fix is boring: write a spec before spawning agents. Define the data model. Define the API contract. Define the file structure. Then parallelize implementation.
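One lightweight way to lock the data model before parallelizing is to commit a shared contract file that every agent imports and none of them edits. A sketch, where the `User` shape and route payloads are purely hypothetical:

```typescript
// contracts/user.ts — committed before any agent starts. Agents import this;
// no agent owns it. The User shape here is illustrative, not prescriptive.
export interface User {
  id: string;
  email: string;
  createdAt: string; // ISO-8601 timestamp
}

// Pin the API contract the same way: one source of truth for payloads,
// so three agents can't invent three different user models.
export interface CreateUserRequest {
  email: string;
}

export interface CreateUserResponse {
  user: User;
}

// A runtime guard agents (and their tests) can share, so any divergence
// from the agreed model fails loudly instead of silently.
export function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "string" &&
    typeof v.email === "string" &&
    typeof v.createdAt === "string"
  );
}
```

The design choice: interfaces pin the shape at compile time for every agent that imports them, and the runtime guard catches agents that wire things up without the types.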

No quality gates

Without plan approval, hooks, and verification, small mistakes compound. Agents generate code at impressive speed while making architectural errors that a human reviewer would catch immediately.

Osmani's research quantified this: human-curated AGENTS.md files are worth more than machine-generated ones at any length. LLM-generated context files actually reduced task success by approximately 3% while increasing inference costs by 20%+. Let agents write code. Don't let agents write the specs that govern other agents.

The minimum viable quality setup:

  1. Plan approval: Agents write plans before coding. Leads review and approve/reject. This is cheaper than fixing bad implementations.
  2. Test hooks: A "TaskCompleted" hook that verifies tests pass before allowing an agent to mark a task done. Failed hooks trigger continued work.
  3. Dedicated reviewer: A permanent read-only reviewer (preferably Opus-class) on every task completion. Only leads see reviewer-approved code.
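The test hook can be as simple as a script that refuses to let the task be marked done unless the suite passes. A sketch, where the function name, the wiring, and the choice of test command are assumptions rather than a fixed Claude Code API:

```typescript
// taskCompletedGate.ts — run as a completion hook. A non-zero exit signals
// the agent to keep working instead of marking the task done.
import { execSync } from "node:child_process";

export function taskCompletedGate(testCommand: string): boolean {
  try {
    execSync(testCommand, { stdio: "pipe" }); // throws on non-zero exit
    return true; // tests passed: task may be marked done
  } catch {
    return false; // tests failed: hook blocks completion
  }
}

// Hypothetical wiring as a CLI hook:
// process.exit(taskCompletedGate("npm test") ? 0 : 2);
```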

No file ownership rules

The rule is simple: one file, one agent. When two agents edit the same file, you get merge conflicts at best and silent corruption at worst. Worktrees don't save you here — they prevent git conflicts but not logical conflicts.

Define file ownership in the task decomposition. If two tasks need to modify the same file, they should be sequential, not parallel.
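This is cheap to check mechanically before spawning anything. A sketch of an ownership-overlap check (the task shape is illustrative):

```typescript
// Detect file-ownership conflicts in a planned parallel batch: any file
// claimed by two tasks means those tasks must run sequentially, not in parallel.
interface PlannedTask {
  id: string;
  files: string[]; // files this task will modify
}

export function findOwnershipConflicts(tasks: PlannedTask[]): string[] {
  const owner = new Map<string, string>();
  const conflicts: string[] = [];
  for (const task of tasks) {
    for (const file of task.files) {
      const existing = owner.get(file);
      if (existing && existing !== task.id) {
        conflicts.push(`${file}: ${existing} and ${task.id}`);
      } else {
        owner.set(file, task.id);
      }
    }
  }
  return conflicts; // empty array means the batch is safe to parallelize
}
```

Run it on the decomposition before launching agents; any conflict it reports is a pair of tasks to reorder into a sequence.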

Stuck agent loops

Agents can get stuck: hitting the same error, retrying the same approach, burning tokens without progress. The recommended kill criterion: reassign after 3 failed iterations. If an agent can't solve a problem in 3 tries, a different agent with a fresh context often can.
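The kill criterion is easy to enforce in an orchestration loop. A sketch, assuming agents can be modeled as attempt functions that report success or failure:

```typescript
// Enforce "reassign after 3 failed iterations": each agent gets at most
// maxAttempts tries, then the task moves to the next agent (fresh context).
type Agent = (taskId: string) => boolean;

export function runWithKillCriterion(
  taskId: string,
  agents: Agent[],
  maxAttempts = 3,
): { solved: boolean; attemptsUsed: number } {
  let attemptsUsed = 0;
  for (const agent of agents) {
    for (let i = 0; i < maxAttempts; i++) {
      attemptsUsed++;
      if (agent(taskId)) return { solved: true, attemptsUsed };
    }
    // maxAttempts failures in a row: stop burning tokens, reassign
  }
  return { solved: false, attemptsUsed };
}
```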

The bottleneck has shifted

Here's the insight that changed how we think about agentic coding:

"The bottleneck is no longer generation. It's verification."

Agents produce output fast. Knowing whether that output is correct is the hard problem. Human review isn't overhead — it's the safety system. Before AI agents, human writing speed created natural pain feedback loops. You'd notice a bad pattern because it hurt to type it out. Agent armies remove that friction, allowing small mistakes to compound silently until catastrophic failure.

This is why discipline matters more in agentic workflows than in traditional coding:

  • Specs must be precise (your spec is the leverage when coordinating parallel agents)
  • File ownership must be strict
  • Kill criteria must prevent stuck agents from burning budget
  • Quality gates must exist at every merge point

The developers who get the most leverage from these tools aren't the ones who spawn the most agents. They're the ones who write the best specs — because strong engineers write better specs, and the spec quality determines agent output quality.

The tool landscape

Osmani organized tools into three tiers:

  • Tier 1: In-process (Claude Code subagents, Agent Teams). Use case: 2-5 agents, single terminal, no extra tooling.
  • Tier 2: Local orchestrators (Conductor on Mac, Vibe Kanban cross-platform). Use case: 3-10 agents, visual dashboards, known codebases.
  • Tier 3: Cloud async (Claude Code Web, GitHub Copilot Agent, Jules, Codex Web). Use case: assign and close your laptop, return to PRs.

Start at Tier 1. Move up only when you hit its limits. Most tasks don't need more than 3 subagents in a single Claude Code session.

What RuleSell has to do with this

Every pattern described above depends on configuration: AGENTS.md files, skill definitions, hook configurations, orchestration rules. These are the artifacts that determine whether multi-agent coding produces 10x output or 10x bugs.

On RuleSell, these artifacts are published, quality-scored, and verified. Our Quality Score measures the things that matter for agentic workflows: trigger reliability (does the skill fire when needed?), token efficiency (does the agent waste context?), install success (does it work first try?), and security (does it do what it claims and nothing more?).

If you're building agentic workflows, don't start from scratch. Browse Claude Code skills and MCP servers that are already built and tested, or explore the full catalog. If you've built orchestration patterns that work, publish them — there's a community that needs them, and we'll make sure the quality bar stays high.

Want to build your own? Start with our complete skill-building guide, or read about the anti-patterns we reject so you don't waste a submission.