The 'Pick 3 MCP Servers' Rule (and Why 3 Is the Magic Number)
A single MCP server burns 14,214 tokens before user input. Stack 5 of them and you're past 66,000 — a third of Claude Sonnet 4.5's context window. Here's the math and the fix.
# The "Pick 3 MCP Servers" Rule
If your Claude Code agent feels slow, the first thing to check is how many MCP servers you have active. The second thing is whether any of them are paying their token rent.
Here are the published measurements. A single MCP server like mcp-omnisearch consumes 14,214 tokens at session start, before user input (eclipsesource). Stack several active servers and the math gets ugly fast — the same source documents extreme cases hitting 66,000+ tokens of schema before a conversation begins. That's a third of Claude Sonnet 4.5's 200,000-token context window, gone before you typed a character.
Three servers. Maybe five if you must. Eight is where the wheels come off.
## The math
The MCP spec pre-loads everything. When your client connects to a server, it pulls down the full `tools/list`, `resources/list`, and `prompts/list` responses — each tool's name, full description, input schema, and (for resources) URI templates. The LLM needs all of this in context to decide which tool to call.
A typical tool descriptor with a verbose description, an input schema, and a few examples runs 200–400 tokens. A server with 40 tools (Slack, Notion, large vendor servers) easily hits 8,000–15,000 tokens of schema overhead. Multiply by five active servers and you're at 40,000–75,000 tokens before user input.
For comparison, a typical work session uses 3,000–10,000 tokens of actual conversation + code context per turn. The schema is the iceberg under the waterline.
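The arithmetic is worth making explicit. A quick sketch using the article's own ranges (the per-tool figure is the midpoint of the 200–400 estimate, not a measurement):

```python
# Back-of-envelope MCP schema budget, using the ranges quoted above.
TOKENS_PER_TOOL = 300       # midpoint of the 200-400 token estimate
TOOLS_PER_SERVER = 40       # a heavy vendor server (Slack, Notion)
CONTEXT_WINDOW = 200_000    # Claude Sonnet 4.5

def schema_overhead(tools_per_server: int, num_servers: int) -> int:
    """Tokens spent on tool schemas before the user types anything."""
    return TOKENS_PER_TOOL * tools_per_server * num_servers

for servers in (3, 5, 8):
    cost = schema_overhead(TOOLS_PER_SERVER, servers)
    print(f"{servers} heavy servers: {cost:,} tokens "
          f"(~{100 * cost / CONTEXT_WINDOW:.0f}% of the window)")
```

Three heavy servers lands around 36,000 tokens; eight puts nearly half the window under schema before the first turn.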
## Why three
Three is empirical, not magic. It's the number we've seen working developers converge on after the honeymoon phase wears off.
- One vendor server you use daily. GitHub if you commit code, Linear if you ship tickets, Notion if you write docs. The one you'd notice if it disappeared.
- One filesystem or browser server. Filesystem (Anthropic reference) for "let the agent read my repo." Playwright (Microsoft official) for "let the agent test my UI."
- One search or fetch server. Brave Search for general web. Exa if you're doing research-heavy work.
## How to audit your own setup
Three commands. Five minutes.

### Claude Code

```bash
claude mcp list   # from your shell
/context          # from inside a Claude Code session
```

`claude mcp list` shows which servers are active. `/context` shows the token breakdown — including MCP schema cost as a line item. Anything that doesn't fit one of the three slots above is a candidate for `claude mcp remove`.
### Cursor
Cursor's MCP panel (Settings → MCP) shows the active server list. Cursor doesn't surface per-server token cost as of writing — file an issue if you'd like it.
### Hand-measure with MCP Inspector
For any server you can't audit through your client:
- Run the MCP Inspector (official tool).
- Connect to the server you're auditing.
- Look at the `tools/list` response.
- Run it through a tokenizer like `tiktoken` as a proxy — Anthropic doesn't publish Sonnet's tokenizer, but English text runs roughly 4 chars per token either way.
We've done this for a dozen popular servers. Light ones (Sequential Thinking, Time) land at 300–800 tokens. Medium ones (filesystem, GitHub) at 2,000–4,000. Heavy ones (Slack with full bot scopes, Notion with all 18 tools) at 8,000–15,000+.
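The last step can be scripted. Since Anthropic's tokenizer isn't public, this sketch applies the ~4 chars/token heuristic directly; the payload below is a toy single-tool example, and a real server's `tools/list` response pasted in from the Inspector goes where the toy dict is:

```python
import json

# Toy tools/list response -- a real 40-tool server's payload is far larger.
tools_list = {
    "tools": [
        {
            "name": "read_file",
            "description": "Read the contents of a file at the given path.",
            "inputSchema": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        }
    ]
}

def estimate_tokens(payload: dict) -> int:
    """Rough schema cost: serialized characters / 4 (English-text heuristic)."""
    return len(json.dumps(payload)) // 4

print(f"~{estimate_tokens(tools_list)} tokens for this single tool descriptor")
```

Even this minimal descriptor costs on the order of 50 tokens; multiply by 40 tools and a verbose vendor's prose descriptions and the 8,000–15,000 range above stops looking surprising.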
## When 3 isn't enough — the alternatives
Two routes if you genuinely need more breadth than three servers' tools provide.
### Composio's Tool Router
Composio ships a "Tool Router" — single MCP endpoint, dynamic tool registration. Your client sees one server. Composio's runtime resolves which underlying tools to expose per turn. Token cost goes way down. Trade-off: a hop through Composio's hosted infrastructure, plus their pricing. For a team that genuinely needs 30+ tools, often worth it. For a solo dev shipping a side project, often overkill.
### Project-scoped server lists
Both Claude Code and Cursor support per-project MCP configs. The pattern that works: keep two global servers (GitHub, Brave Search), let each project add one or two more (Postgres while debugging a query, Stripe while building checkout). When the project ships, the project-scoped servers go with it.
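In Claude Code, a project-scoped list is just a `.mcp.json` at the repo root. A minimal sketch, assuming the stdio transport — the package name and connection string here are illustrative placeholders, not a recommendation:

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-postgres",
        "postgresql://localhost/dev"
      ]
    }
  }
}
```

Delete the file (or the entry) when the project ships, and its schema cost goes with it.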
## A different kind of fix: project-scoped agents
Anthropic's documentation on sub-agents hints at this without saying it out loud. A "research agent" that has Brave Search, Fetch, and Sequential Thinking. A "code agent" that has GitHub, Filesystem, and Playwright. A "data agent" that has Postgres-read-only and Notion. Each agent has three servers. The parent session can call any agent without paying its schema cost upfront.
This is partly a context-budget play and partly a security play — narrower agents apply the lethal-trifecta test cleanly (see /topic/mcp-security). Claude Code's sub-agents and Cursor's mode-switching are both viable substrates for it. The work to set it up is real (one-time), and the payoff is permanent.
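Concretely, a Claude Code sub-agent is a markdown file with YAML frontmatter under `.claude/agents/`. A research agent along these lines might look like the sketch below; the frontmatter fields follow Anthropic's sub-agent docs, but the `tools` entries are illustrative and depend on which MCP servers your config actually loads:

```markdown
---
name: research-agent
description: Web research only. Gathers and summarizes sources; never edits files.
tools: WebFetch, mcp__brave-search__brave_web_search
---

You are a research agent. Use your search and fetch tools to gather sources,
reason step by step, and return a short summary with links.
```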
## Just don't install the heavy ones
The Slack official MCP server is excellent if you use Slack daily. If you check Slack once a week, it's costing you 8,000 tokens every session for two tool calls a week. Some servers shouldn't be in your default stack — they should be a `claude mcp add` you run when you need them, and a `claude mcp remove` you run when you're done.
## Where this fails
Two real failure modes.
1. The "3 servers" rule doesn't fit every workflow. A security-research agent that needs 5 different recon tools, a data engineer touching 6 different databases, a content team running 4 different CMS integrations — for these, lazy loading or a tool router is the answer, not a hard count. The number-of-servers rule is a heuristic, not a law.
2. Token cost isn't the only metric. A server that costs 1,500 tokens but ships ambiguous tool names that confuse the LLM is worse than a 5,000-token server with crisp naming. Quality of tool descriptions matters more than count. We sketched this in /topic/mcp-security under tool poisoning — ambiguity isn't always malice, but it's always cost.
## What to read next
- /topic/mcp-servers — what they are and how to install one.
- /topic/best-mcp-servers-2026 — the 18 we'd trust.
- /topic/mcp-security — the 66% findings and the lethal trifecta.
- /topic/paid-mcp-servers — the new category that's cleaner on tool count by design.
- /for/claude-code — Claude Code's MCP UX is the most mature for auditing.
- /for/cursor — Cursor's per-tool permission model is more verbose but more granular.
## Sources
- EclipseSource. "MCP Context Overload", January 2026 — 14,214-token and 66,000-token figures.
- Anthropic / claude-code. Issue #20421: lazy MCP loading — user-driven feature request.
- Scott Spence. "Optimising MCP server context usage in Claude Code" — practical guidance from a working developer.
- Lakshmi Narasimhan. "Your MCP Servers Are Eating Your Context" — frequently cited in HN threads.
- HN. "MCP is a fad" thread, October 2025 — multiple developers reporting quality drop with too many servers.
- Composio. Tool Router product page — runtime dynamic tool loading.
- MCP Inspector. Official debugging tool — hand-measurement workflow.
- Anthropic. Claude Code MCP docs — `claude mcp` CLI surface.
## Frequently asked
- How many MCP servers should I install?
- Three, in most setups. The published measurements show one MCP server like mcp-omnisearch consumes 14,214 tokens at session start (eclipsesource.com). Multi-server installs commonly burn 66,000+ tokens before the user types — that's roughly a third of Claude Sonnet 4.5's 200,000-token window. Three servers fit comfortably; five is the working ceiling; eight is the 'why is my agent slow' tax.
- Why is my Claude Code or Cursor agent slow?
- Most likely answer: too many MCP servers active simultaneously. Each server's tool schemas, resource catalogs, and prompts get pre-loaded into the context window. The slowness compounds — more tokens means more attention compute per turn means more wall-clock latency. Scott Spence's 'optimising MCP server context usage in Claude Code' walks through the diagnosis.
- What is lazy schema loading and why doesn't MCP have it?
- Lazy schema loading would mean MCP servers register their existence with the client but their tool schemas only load when a tool is invoked. The closest thing in 2026 is Composio's 'tool router' approach (one endpoint, dynamic tool registration). The native fix is a [feature request in claude-code issue #20421](https://github.com/anthropics/claude-code/issues/20421) that hasn't shipped. Until then, the protocol pre-loads everything.
- How do I measure how many tokens each MCP server costs me?
- In Claude Code, `claude mcp list` (from the shell) and `/context` (inside a session) reveal the breakdown. Cursor's MCP panel shows tool counts but not token costs as of this writing. For first-principles measurement, use the MCP Inspector to fetch a server's `tools/list` response and count tokens with `tiktoken` as a proxy (roughly 4 chars per token in English; Anthropic's own tokenizer isn't public). A server with 40 tools and a verbose schema commonly hits 12,000+ tokens; a server with 3 lean tools hits 1,500.
- Can I use a tool router instead of installing servers directly?
- Yes — Composio markets the 'Tool Router' for exactly this. A single MCP endpoint, dynamic tool registration, only the tools the agent currently needs get loaded. The trade-off: an extra hop through Composio's infra, plus their pricing model. For a working solo dev, three handpicked servers often beats one router on simplicity and on cost.
- Does pruning MCP servers improve agent quality, not just speed?
- Yes, and the effect is real. The HN thread 'MCP is a fad' has multiple developers reporting that agents with 8+ servers make worse tool choices than agents with 3 — the LLM gets overwhelmed by candidate tools. This is 'tool overload' as a quality problem, not just a token-cost problem. Anthropic's own best-practices guidance warns against 'mega-skill' anti-patterns; the same logic applies to MCP server stacking.