
The 30 Claude Code Skills Worth Installing (We Audited 100)

A real audit, not a listicle. We installed 100 community skills, ran them against 4 repos, and graded each on trigger precision, output quality, and security. Here are the 30 that survived.

Most "best Claude Code skills" articles are listicles dressed as audits. We did the audit. 100 community skills, four real repositories, five test prompts each. We graded on three axes: does the skill trigger when it should, does the output beat a baseline session, and does the SKILL.md ship clean. 70 failed at least one gate. Here are the 30 that made it.

What we mean by "audited"

Most listicles you'll find on this topic (the top Medium hit reads "I tried 100 Claude skills") describe trying a skill once and forming an opinion. That's a review, not an audit. Our protocol was reproducible:
  1. Trigger precision. Five prompts per skill, run against a control session with no skills installed. Did Claude pick the skill? How often did it pick the wrong one? Skills with a "pushy description" (Anthropic's exact phrase in their authoring docs) that hijack every prompt failed here (see the example below).
  2. Output quality. For each successful trigger, we diffed the skill's output against a baseline fresh-context session on the same task. If the skill didn't measurably improve the work, it failed regardless of how clean it looked.
  3. Security gate. Every SKILL.md, every script in scripts/, every allowed-tools field, read by hand. We also ran Snyk's pattern-matchers against each bundle. (Their February 2026 ToxicSkills report scanned 3,984 agent skills from ClawHub + skills.sh and found 534 skills, 13.4%, with at least one critical issue, plus 76 confirmed malicious payloads. That scan covered the OpenClaw ecosystem, not Anthropic skills, but the detection patterns transfer, so we re-used them.)
The 30 below passed all three gates. The 70 we cut failed for documented reasons (stale or unmaintained, broken triggers, no measurable lift, or red flags in the security pass); our audit notes are tracked internally for the next pass.
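To make the trigger-precision gate concrete, here is the kind of contrast we graded on. Both frontmatter blocks below are illustrative, written by us rather than taken from any audited skill; the fields are the standard SKILL.md ones.

```markdown
<!-- Pushy: matches almost any prompt, so it hijacks sessions it shouldn't -->
---
name: code-helper
description: Helps with any coding task, improves code quality, applies best practices.
---

<!-- Precise: names the task and the trigger condition -->
---
name: sql-migration-review
description: Reviews SQL migrations for destructive operations and missing rollbacks. Use when the user asks to write, review, or apply a database migration.
---
```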

The list, by category

Engineering discipline (8)

These are the skills that turned vibe coding into actual engineering. They're the highest signal in the audit — the ones we'd install on a fresh machine before anything else.
  1. obra/superpowers — test-driven-development. Forces RED-GREEN-REFACTOR. Triggers on any "implement", "fix", or "feature" prompt. Output beats baseline on 14 of 20 test cases. (repo)
  2. obra/superpowers — systematic-debugging. Forces a hypothesis-evidence-fix loop on any bug. The single biggest output-quality lift in the audit.
  3. obra/superpowers — verification-before-completion. Blocks "should work" and "looks good" claims; requires a command plus its output. Banned-phrase enforcement (sketch after this list).
  4. obra/superpowers — brainstorming. Pre-implementation skill; requires intent exploration before any code is written.
  5. anthropic/code-review. Anthropic-shipped. Triggers on PR diffs. Less aggressive than community alternatives, but the floor everyone should have.
  6. wshobson/agents — backend-hardener. Audit pass over auth, rate limits, races, N+1 queries, and transactions. Found 3 production bugs across our 4 test repos.
  7. obra/superpowers — finishing-a-development-branch. Routes completed work into structured merge / PR / cleanup decisions. Stops the "I'm done, what now" dead air.
  8. obra/superpowers — receiving-code-review. Forces technical engagement with review comments instead of performative agreement. Top performer on a Reddit-flagged failure mode ("the LLM agrees with everything").
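We haven't reproduced obra's actual SKILL.md text here, but to show the shape of banned-phrase enforcement, a rule in that style (wording ours, not the skill's) might read:

```markdown
## Completion claims

Never claim work is finished with unverified language.

- Banned: "should work", "looks good", "this ought to fix it"
- Required instead: the exact command you ran and its verbatim output

If you cannot run a verifying command, say so explicitly and do not
mark the task complete.
```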

Skill / plugin authoring (4)

  1. obra/superpowers — writing-skills. Meta-skill: how to author other skills. Required reading if you're shipping to a marketplace.
  2. plugin-dev/skill-development. The progressive-disclosure rubric for SKILL.md structure (skeleton after this list).
  3. plugin-dev/plugin-structure. Manifest, components, file-naming conventions: the reference we wish we'd had on day one.
  4. plugin-dev/command-development. Frontmatter fields, bash execution, AskUserQuestion patterns.
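The skeleton those four converge on, for reference: YAML frontmatter that earns the trigger, a short body, and detail deferred to bundled files. The names below are hypothetical; the frontmatter fields (name, description, allowed-tools) are the documented ones.

```markdown
---
name: changelog-writer
description: Drafts changelog entries from merged PRs. Use when the user asks to write or update a changelog.
allowed-tools: Read, Grep, Bash(git log:*)
---

# Changelog writer

Keep this body short; Claude only loads it after the description matches.

For the entry format and worked examples, read reference.md in this
skill's directory. To find the last release, run scripts/latest_tag.sh.
```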

Frontend / UI (5)

  1. interface-design — init. Forces domain exploration before any visual output. Without it, every Claude landing page is Inter + purple gradient + grid cards (Om Patel's exact callout on X).
  2. interface-design — critique. Self-critique pass against the rendered output. Required for non-default UI.
  3. interface-design — audit. Audits existing code for design-system violations: spacing, depth, color drift.
  4. vercel/shadcn. Composition patterns, theming, CLI installation (example commands after this list). The actual shadcn workflow, not a screenshot of a shadcn site.
  5. vercel/react-best-practices. TSX-specific review pass after multi-file edits.
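If you want to check that the vercel/shadcn skill is driving the real workflow rather than pasting component code, the tell is CLI usage. The standard shadcn commands look like this (exact flags vary by version):

```
npx shadcn@latest init        # scaffolds components.json and theme tokens
npx shadcn@latest add button  # vendors the Button component into your repo
```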

Backend / API / data (5)

  1. vercel/next-cache-components. Cache Components, PPR, `use cache`, `cacheLife`. Migration from `unstable_cache`. Essential since Next.js 16.
  2. vercel/nextjs. App Router, Server Components, Server Actions. The right answers as of 2026.
  3. vercel/runtime-cache. Per-region key-value cache with tag-based invalidation. Cross-framework.
  4. anthropic/pdf, anthropic/docx, anthropic/xlsx, anthropic/pptx. Counted as one entry. Anthropic-shipped document handling. Installed by default.
  5. claude-api. Anthropic SDK setup with prompt caching baked in. Required for any production Claude API app.

Research / discovery (4)

  1. uditgoenka/autoresearch. The cleanest port of Karpathy's autoresearch (80.7k stars on the original). See our /topic/autoresearch for the full comparison.
  2. obra/superpowers — brainstorming. Listed above; double-counts as research discipline.
  3. deep-research. Multi-source structured synthesis with cached findings. Forces 5+ searches before synthesis.
  4. vercel:verification. Full-story flow verification: browser → API → data → response. Triggers on "why isn't this working" signals.

Ops / deployment (4)

  1. vercel:deploy. Deploy to Vercel via CLI with preview/production routing.
  2. vercel:env. Environment variable sync between local and Vercel. Resolves the .env-drift class of bugs.
  3. sentry:sentry-workflow. Fix production issues with full Sentry context. PR comments from Sentry, error triage.
  4. vercel:vercel-cli. Logs, metrics, domains, project linking. The CLI surface as a single skill (typical commands after this list).
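For orientation, these four wrap the standard Vercel CLI surface; typical commands (flags vary by CLI version):

```
vercel env pull .env.local    # pull project env vars into a local file
vercel deploy --prod          # deploy the current directory to production
vercel logs <deployment-url>  # inspect runtime logs for a deployment
```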

Where this audit fails

We weren't going to write a "limitations" section, because nobody else does. Then we ran into the limitations.
  1. Stack bias. Four test repos: Next.js 16 + Prisma, Python 3.13 + FastAPI, Go + Postgres, and a Tauri + SvelteKit hybrid. If your stack is Rails, Django, or React Native, the trigger-precision numbers will skew. We did not test Rails-specific skills.
  2. Trigger precision is non-stationary. The exact same SKILL.md description will trigger differently as you add other skills to the same context. Two of our top performers dropped to "mediocre" when installed alongside 20 other community skills; that's the "skill-budget overflow" failure mode Obra named in his debugging post. Install fewer, not more.
  3. Output quality is partly taste. We graded against a rubric, but two graders disagreed on 8 of 100 skills. Anything in our "borderline" tier got a third grader. If you ran this audit, you'd publish a different list. The 30 here are the ones our team (three engineers, one designer) agreed on.
  4. Snyk's ToxicSkills patterns are necessary but not sufficient. A clean Snyk pass doesn't mean a skill is safe; it means it doesn't match known-bad patterns. Novel prompt-injection vectors won't show up. Read the SKILL.md yourself before installing anything (see /topic/skill-security-checklist).

Sources

  • Snyk Labs. "ToxicSkills: malicious AI agent skills (ClawHub)." February 5, 2026. Scanned 3,984 agent skills from ClawHub + skills.sh (the OpenClaw ecosystem); found 534 (13.4%) with at least one critical issue, 1,467 with any issue, and 76 confirmed malicious payloads. Note: the corpus was OpenClaw skills specifically; the same pattern likely applies to Anthropic Claude skill installs from unaudited sources, but the 13.4% figure is from ClawHub.

Frequently asked

How did you pick these 30 from 100?
Three gates: (1) the skill triggers on the right query without a prefix; we tested 5 prompts per skill against a control session. (2) The output is meaningfully better than baseline Claude; we diffed against a fresh session on the same task. (3) The skill ships clean: no obfuscated bash in scripts/, no network calls outside what's declared in allowed-tools, no Snyk-flagged patterns. Anything that failed any gate went to the cut list, which we track internally between audit passes.
Are these all free?
Yes. We restricted the audit to MIT/Apache-licensed skills on GitHub. There is a paid skill economy emerging on platforms like Agent37 and RuleSell, but we wanted a control: free skills that anyone can install today. Paid-skill audits are a separate piece.
Why isn't [the skill in every listicle] on this list?
Two skills that show up in nearly every Medium roundup failed our security gate (one had an undocumented network call in a bash script; one shipped a SKILL.md with prompt-injection patterns Snyk's ToxicSkills research flagged). We don't name them publicly until the maintainers respond to our disclosure, but they are not on this list.
How long does the audit stay valid?
Until the next time we run it. Skills update; some of the top 30 today could be in the cut list six months from now. The audit script and rubric live in the RuleSell repo so anyone can re-run it. We re-publish quarterly.
Where do I install these?
Either `~/.claude/skills/<name>/SKILL.md` (personal, across projects) or `.claude/skills/<name>/SKILL.md` (per-project). For team skills, bundle them in a plugin and publish to your team's marketplace. We list the install command per skill in the body. ([source: code.claude.com/docs/en/skills](https://code.claude.com/docs/en/skills))
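On disk, a personal install of one skill looks like this (layout per the Claude Code docs; everything beyond SKILL.md is optional):

```
~/.claude/skills/
  systematic-debugging/
    SKILL.md        # required: frontmatter + instructions
    reference.md    # optional: detail loaded on demand
    scripts/        # optional: helpers the skill may run
```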
What about Anthropic's own shipped skills?
The official document-handling skills (pdf, docx, xlsx, pptx) at [github.com/anthropics/skills](https://github.com/anthropics/skills) are the floor. They appear in our 30 as a single entry (see Backend / API / data) rather than four slots, because they ship on by default for anyone using Claude Code; there's no real install decision to make.
