
Autoresearch in Claude Code: The Complete Guide (2026)

Autoresearch is autonomous deep-research where Claude Code investigates a question for hours on its own. Here are the 4 production ports, how each works, and which to install.

Autoresearch is the highest-leverage pattern most Claude Code users have never set up. The premise is simple: give Claude Code a research question, walk away, and come back hours later to a structured report with sources. Under the hood, Claude plans subtasks, dispatches them (often via parallel subagents), keeps notes on disk, and revises its own work as it learns. The pattern originated with Andrej Karpathy's autoresearch repo: 80.7k stars for 630 lines of code, gospel status for "what an AI dev should do with their evenings." This guide compares the 4 mature Claude Code ports, explains the pattern, and gives you copy-paste installs for each. We've installed and tested all of them.

Why autoresearch exists

Modern coding agents are good at single tasks: read this file, write this function, fix this test. They are bad at long-horizon work — investigating a question that has no obvious decomposition, exploring branches that turn out to be dead ends, and synthesizing across dozens of sources. Autoresearch wraps the agent in a planning loop. Roughly:
  1. Plan — given a question, generate a tree of sub-questions.
  2. Dispatch — for each sub-question, either answer directly or spawn a subagent.
  3. Note — write intermediate findings to disk (markdown files in a working directory).
  4. Revise — re-read your own notes, identify gaps, generate new sub-questions.
  5. Report — when the tree is exhausted or the budget is hit, synthesize a final answer.
Each step is an LLM call. The "magic" is the persistence (Claude can pick up where it left off) and the parallelism (subagents work the sub-questions concurrently).
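To make the loop concrete, here is a minimal sketch in Python. This is illustrative only, not the code of Karpathy's repo or any port: llm() and search() are hypothetical stand-ins for a model call and a web-search tool, the flat queue stands in for a real question tree, and a production port would dispatch subagents concurrently rather than in a sequential for-loop.

```python
from pathlib import Path

def llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call."""
    raise NotImplementedError

def search(query: str) -> str:
    """Hypothetical stand-in for a web-search tool returning source snippets."""
    raise NotImplementedError

def autoresearch(question: str, workdir: str = "notes", max_rounds: int = 5) -> str:
    notes = Path(workdir)
    notes.mkdir(exist_ok=True)

    # 1. Plan: decompose the question into sub-questions.
    queue = llm(f"Break into sub-questions, one per line:\n{question}").splitlines()

    corpus = ""
    for round_no in range(max_rounds):
        # 2. Dispatch: answer each sub-question. A real port spawns
        #    parallel subagents here instead of looping sequentially.
        for i, sub in enumerate(queue):
            finding = llm(f"Answer from these sources:\n{search(sub)}\n\nQ: {sub}")
            # 3. Note: persist intermediate findings to disk.
            (notes / f"r{round_no}_q{i}.md").write_text(f"# {sub}\n\n{finding}\n")

        # 4. Revise: re-read every note, look for gaps, queue new sub-questions.
        corpus = "\n\n".join(p.read_text() for p in sorted(notes.glob("*.md")))
        gaps = llm(f"List unanswered sub-questions, one per line, or NONE:\n{corpus}")
        if gaps.strip() == "NONE":
            break
        queue = gaps.splitlines()

    # 5. Report: synthesize everything on disk into a final answer.
    return llm(f"Write a structured report with sources from these notes:\n{corpus}")
```

The disk notes are what make the pattern resumable: kill the process mid-run and a later session can re-read the working directory and continue from the Revise step.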

The 4 production ports compared

| Port | Stars | Best for | Subagent parallel? | Notes |
|---|---:|---|---|---|
| uditgoenka/autoresearch | 4.4k | General research, fast onboarding | Yes (via Task tool) | Cleanest port; closest to Karpathy's original. CLI-first. |
| wanshuiyin/ARIS | 9.0k | Iterative research with self-critique | Yes | Adds an explicit "review" pass before each report. Slower but higher-quality. |
| drivelineresearch/... | 2.1k | Technical/scientific topics | Yes | Tighter source-grading; better at academic content. |
| Maleick/autoresearch-claude | 1.6k | Long-form reports (>10K words) | No | Linear pipeline. Simplest mental model. Best for write-ups. |
If you're picking one for the first time and don't have a specific use case in mind: start with uditgoenka/autoresearch.

Installing uditgoenka/autoresearch

```bash
# In an existing Claude Code project
cd ~/.claude/skills
git clone https://github.com/uditgoenka/autoresearch.git autoresearch
# SKILL.md now sits at ~/.claude/skills/autoresearch/SKILL.md, where Claude Code discovers it
```

In your Claude Code session, autoresearch will auto-trigger when you ask for a deep investigation. To force it explicitly:

```
Use the autoresearch skill to investigate the question:
"What are the production RAG patterns in 2026 and how do they compare?"
```

Claude will create a working directory, plan subtasks, dispatch subagents, write notes to disk, and return a structured report in roughly 20-60 minutes depending on depth.

Installing wanshuiyin/ARIS

ARIS adds an explicit "review and revise" loop. Each subagent's output is passed through a critic before being merged. The tradeoff: 2-3× slower, but significantly more reliable on contested questions.
```bash
cd ~/.claude/skills
git clone https://github.com/wanshuiyin/ARIS.git aris
```

Configure the critic threshold in `aris/config.yaml`: the default is `quality_threshold: 0.7`; raise it to 0.85 for academic-grade output, or drop it to 0.5 for casual research.
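To see what that threshold buys you, here is a sketch of how a review-and-revise gate like ARIS's can work. This is not ARIS's actual code: llm() is the same hypothetical stand-in as above and the scoring rubric is invented; only the quality_threshold semantics come from the prose.

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call."""
    raise NotImplementedError

def reviewed(draft: str, question: str,
             quality_threshold: float = 0.7, max_revisions: int = 3) -> str:
    """Gate a subagent draft behind a critic pass before it is merged."""
    for _ in range(max_revisions):
        critique = llm(
            "Score this answer from 0 to 1, then list its flaws.\n"
            f"Reply 'SCORE: <float>' first.\n\nQ: {question}\n\n{draft}"
        )
        score = float(critique.split("SCORE:")[1].split()[0])
        if score >= quality_threshold:
            return draft  # good enough to merge into the report
        # Below threshold: revise against the critic's flaws and re-score.
        draft = llm(f"Revise this draft to fix the flaws below.\n{critique}\n\n{draft}")
    return draft  # revision budget exhausted; return best effort
```

A higher threshold buys reliability at the cost of extra critic and revision calls, which is where the 2-3× slowdown comes from.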

Where autoresearch fails

We were going to write a section about strengths. Here are the failure modes instead; they're what nobody else in the SERP tells you:
  1. Closed-source data. Autoresearch lives on the web. If your question requires data behind a paywall, an internal docs system, or a private GitHub repo, report quality collapses. Workarounds: use the `bash` tool to clone private repos first, or pre-load context in `CLAUDE.md`.
  2. Recency. Claude's training cutoff plus stale web snippets means autoresearch runs 3-9 months behind on fast-moving topics. The fix is augmenting with a live-search MCP (see /topic/mcp-servers).
  3. Tokens. A 4-hour autoresearch run on `claude-opus-4-7` can cost $40-80 depending on subagent fanout; set `--budget 5.00` to cap it (a sketch of what that cap amounts to follows this list). The `claude-haiku-4-5` model is 1/10 the price and usable for most non-technical questions.
  4. The "found a great source" rabbit hole. Claude sometimes anchors on one early source and ignores contradicting evidence. ARIS's review pass mitigates this; uditgoenka's port does not.
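For the budget point, a cost cap reduces to a running tally checked before each dispatch. The sketch below is illustrative, not the flag's actual implementation; per-token prices and the charge bookkeeping are assumptions you should replace with current pricing.

```python
# What a cost cap like --budget amounts to: stop dispatching new subagents
# once estimated spend crosses the limit, then jump straight to Report.
class BudgetGuard:
    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def charge(self, input_tokens: int, output_tokens: int,
               usd_per_mtok_in: float, usd_per_mtok_out: float) -> None:
        # usd_per_mtok_* are hypothetical; look up current per-token pricing.
        self.spent_usd += (input_tokens * usd_per_mtok_in
                           + output_tokens * usd_per_mtok_out) / 1_000_000

    def exhausted(self) -> bool:
        return self.spent_usd >= self.budget_usd

guard = BudgetGuard(5.00)  # mirrors --budget 5.00
```

The practical effect is graceful degradation: a capped run produces a thinner report from whatever notes are already on disk instead of a surprise bill.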

Sources

  • wanshuiyin. ARIS. GitHub repository, 9.0k stars. https://github.com/wanshuiyin/ARIS

Frequently asked

What is autoresearch in Claude Code?
Autoresearch is an autonomous research loop where Claude Code investigates a topic for hours — searching the web, reading sources, taking notes, and producing a synthesized report — without supervision. The pattern originated from Andrej Karpathy's autoresearch repo (80.7k stars) and has been ported to Claude Code as skills, plugins, and standalone CLIs.
How is autoresearch different from a regular Claude Code conversation?
A regular Claude Code conversation is single-task and synchronous: you ask, it answers, you iterate. Autoresearch is multi-step and asynchronous: you give it a research question, it plans subtasks, dispatches them (often via subagents), writes intermediate notes to disk, and returns hours later with a structured report. The key is a planning loop plus persistence — Claude reasons about what to do next based on what it has already found.
Which autoresearch port should I install?
It depends on your use case: uditgoenka/autoresearch (4.4k stars) is the cleanest CLI port for general research; wanshuiyin/ARIS (9k stars) adds review-and-revise loops; drivelineresearch focuses on technical/scientific topics; Maleick targets long-form report generation. The comparison table above covers the differences in detail.
Does autoresearch work without Claude Code?
The original repo by Karpathy is provider-agnostic and works with the OpenAI API directly. The Claude Code ports add tighter integration with Claude's skills system, hooks, and subagent parallelization. If you're not on Claude Code, the original or one of the OpenAI-targeted forks is your path.