Topic · A8
LLM Gateway Decision Tree: OpenRouter, Vercel AI Gateway, Portkey, LiteLLM
Four LLM gateways dominate production usage in 2026. Each optimizes for a different problem and has a different pricing posture. This is the decision tree — when each one wins, what BYOK actually costs, and what to migrate when you outgrow your first choice.
By May 2026, every production AI application sits behind some kind of gateway. Direct calls from your app to api.anthropic.com work for prototypes; they don't work for production. You need failover when a provider's region goes down, you need cost tracking, you need rate-limit shaping, you need observability. The question is which gateway.
Four dominate the production market: OpenRouter, Vercel AI Gateway, Portkey, LiteLLM. Cloudflare AI Gateway and Helicone exist but slot in differently — Cloudflare is more "edge AI cache" than gateway, Helicone is more "observability that also routes." This page is the decision tree across the main four, plus when to migrate as you scale.
The contenders
OpenRouter
openrouter.ai — the OG model marketplace. One billing relationship; 300+ models including OpenAI, Anthropic, Google, plus open-source models hosted by various inference providers. Pioneered the "send your request to one endpoint, choose any model by name" pattern.
Pricing: 5.5% markup on provider cost. Free tier available with rate limits (the documented 50 RPD for free models, more with $10+ credit on the account).
Best for: experimentation across many providers, including open-source models that aren't easy to access directly. The single billing relationship is convenient for solo developers and small teams.
Where it loses: the 5.5% markup is meaningful at scale. The free-tier rate limits are tight, and the documentation page literally shows Jinja template placeholders ({FREE_MODEL_RATE_LIMIT_RPM}) where the numbers should be — verified by direct fetch in May 2026. Their own docs are broken for the free-tier limits.
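A minimal sketch of the single-endpoint pattern, assuming OpenRouter's OpenAI-compatible REST API; the model ID and env var name are illustrative:

```ts
// Any of the 300+ models, one endpoint, one key. OpenRouter's API is
// OpenAI-compatible, so a plain fetch is enough to try it out.
const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`, // illustrative env var
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "anthropic/claude-3.5-sonnet", // swap for any model ID in the catalog
    messages: [{ role: "user", content: "Hello" }],
  }),
});
const data = await res.json();
console.log(data.choices[0].message.content);
```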
Vercel AI Gateway
vercel.com/docs/ai-gateway — Vercel's gateway, deeply integrated with their hosting platform.
Pricing: $0 markup with BYOK (bring your own provider key). Vercel-managed pricing without BYOK is the provider rate plus Vercel's margin. The BYOK path is what most teams use.
Best for: teams on Vercel hosting. The integration with Vercel functions, edge runtime, and streaming is best-in-class. Provider failover is configured per-project. The gateway can serve from the edge for low latency on the routing layer.
Where it loses: if you're not on Vercel, the gateway's main benefits (Fluid Compute, edge routing) don't apply. The 504 timeout issue on long streaming requests is real — search "vercel ai gateway 504" and you'll see the pain. Mitigations exist (Fluid Compute, increased timeouts) but they're plumbing.
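A sketch of the happy path, assuming AI SDK 5+ where a bare model string routes through the Vercel AI Gateway and your BYOK provider keys are configured in the project dashboard; the model ID is illustrative:

```ts
import { streamText } from "ai";

// With the gateway configured for the project, a plain "provider/model"
// string is routed (and failed over) by Vercel, not resolved locally.
const result = streamText({
  model: "anthropic/claude-sonnet-4", // illustrative model ID
  prompt: "Summarize this changelog in three bullets.",
});

// Stream to stdout; in an app this would feed a UI stream instead.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
```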
Portkey
portkey.ai — gateway with strong governance and observability features: per-team budgets, audit logs, guardrails (PII redaction, output filtering, prompt-injection detection), virtual API keys with scoped permissions.
Pricing: per-seat enterprise pricing. A free tier exists but is constrained.
Best for: mid-to-large companies where the governance features (per-team budgets, audit, compliance) are required; companies in regulated industries; teams with internal-platform engineering that wants centralized LLM access for multiple product teams.
Where it loses: the price floor is higher than the alternatives. For solo developers or small teams that don't need the governance features, you're paying for capability you don't use.
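A sketch of how a product team calls Portkey with a scoped virtual key instead of a raw provider key; the header names follow Portkey's OpenAI-compatible scheme at time of writing, so verify against their current docs:

```ts
// Portkey's gateway is OpenAI-compatible; auth and scoping travel in
// x-portkey-* headers. A virtual key gives a team a budgeted, scoped
// credential rather than the raw provider key.
const res = await fetch("https://api.portkey.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "x-portkey-api-key": process.env.PORTKEY_API_KEY!,
    "x-portkey-virtual-key": process.env.PORTKEY_VIRTUAL_KEY!, // scoped, budgeted
  },
  body: JSON.stringify({
    model: "gpt-4o", // illustrative
    messages: [{ role: "user", content: "Ping" }],
  }),
});
console.log((await res.json()).choices[0].message.content);
```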
LiteLLM
BerriAI/litellm — open-source gateway. Self-hosted (Docker, K8s) or available as a managed cloud service. 100+ models supported. Cost tracking, budget alerts, virtual keys, fallback routing — all the gateway primitives.
Pricing: the software is free. Self-host cost is the infrastructure you run it on (~$50-200/month for a small team). A managed cloud option exists, but most teams self-host.
Best for: EU companies with data-residency requirements (run LiteLLM on Hetzner / OVH / your own VPC); high-volume teams where SaaS gateway per-seat fees add up; teams with platform engineering willing to operate the proxy.
Where it loses: ops burden. You're running another service. SaaS gateways pay for themselves below ~500k requests/month for most teams; self-hosted LiteLLM pays off above that.
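Because the LiteLLM proxy speaks the OpenAI API, the stock openai client works against a self-hosted deployment; the URL, model alias, and virtual key below are illustrative:

```ts
import OpenAI from "openai";

// Point the stock client at your proxy. The model alias maps to a real
// provider model in your LiteLLM config; the proxy holds the provider keys.
const client = new OpenAI({
  baseURL: "http://localhost:4000", // wherever you run the proxy
  apiKey: process.env.LITELLM_VIRTUAL_KEY, // proxy-issued virtual key
});

const completion = await client.chat.completions.create({
  model: "claude-3-5-sonnet", // alias defined in your proxy's model list
  messages: [{ role: "user", content: "Ping" }],
});
console.log(completion.choices[0].message.content);
```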
The decision tree
Start at the top, follow the first match (the same tree is condensed as code after the list):
Are you on Vercel hosting?
- Yes → Vercel AI Gateway (BYOK). The integration is too convenient to skip. Watch for 504s on streaming.
- No → continue.
Do you want breadth (many models, including open-source) behind one billing relationship?
- Yes → OpenRouter. The 5.5% is the convenience tax for single-billing.
- No → continue.
Do you need governance (per-team budgets, audit logs, guardrails)?
- Yes → Portkey. The price floor is justified by the governance.
- No → continue.
Do you have EU data-residency requirements, high volume (~500k+ requests/month), or platform engineering willing to run a proxy?
- Yes → LiteLLM self-hosted.
- No → OpenRouter for simplicity, or stay direct with a single provider's SDK if you only need one provider.
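The tree condensed into a function, purely illustrative; the field names are ours, not any gateway's API:

```ts
// The decision tree above as code.
type Needs = {
  onVercel: boolean;
  manyModelsOneBill: boolean; // breadth incl. open-source, single billing
  governance: boolean;        // per-team budgets, audit logs, guardrails
  selfHostReady: boolean;     // EU residency / high volume / platform eng
};

function pickGateway(n: Needs): string {
  if (n.onVercel) return "Vercel AI Gateway (BYOK)";
  if (n.manyModelsOneBill) return "OpenRouter";
  if (n.governance) return "Portkey";
  if (n.selfHostReady) return "LiteLLM self-hosted";
  return "OpenRouter, or a single provider's SDK directly";
}

// Example: a regulated mid-size company, not on Vercel.
console.log(pickGateway({ onVercel: false, manyModelsOneBill: false, governance: true, selfHostReady: false })); // "Portkey"
```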
The BYOK story
"Bring your own key" — meaning the gateway uses your direct provider API key instead of its own — is the cost difference that matters most.
- OpenRouter: does NOT support pure BYOK. You pay 5.5% on provider cost regardless.
- Vercel AI Gateway: supports BYOK at $0 markup. Your Vercel project holds your Anthropic / OpenAI / Google keys.
- Portkey: supports BYOK at $0 model markup (you pay Portkey's per-seat / per-request fees separately).
- LiteLLM: BYOK is the default model. You bring keys; LiteLLM proxies.
The exception: open-source models. If you want to use Llama 3.3 70B or Qwen 2.5 in production through a gateway, OpenRouter's hosted inference providers are how you get them. There's no BYOK equivalent for "the open-source model hosted somewhere managed."
Migration triggers
When to switch from one gateway to another (the mechanics are sketched after this list):
- From OpenRouter to Vercel AIGW / LiteLLM (BYOK): when your monthly OpenRouter bill makes the 5.5% markup material. Roughly $100/month is the threshold where the migration effort pays back inside three months.
- From any SaaS gateway to LiteLLM self-hosted: when EU data residency becomes a requirement, when your monthly gateway SaaS bill exceeds ~$500, or when you need custom routing logic the SaaS gateway doesn't expose.
- From any gateway to Portkey: when compliance or governance becomes a board-level concern. Usually triggered by an audit request or a procurement requirement.
- From Vercel AIGW to LiteLLM: when you leave Vercel hosting, or when the 504 timeouts on streaming become unmanageable.
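Mechanically, these migrations tend to be small because all four gateways can be reached through an OpenAI-compatible endpoint. A sketch of a client factory; the URLs and env var names are illustrative and should be verified against each gateway's docs:

```ts
import OpenAI from "openai";

// Switching gateways is mostly a baseURL + credential swap, plus a
// model-ID audit. (Portkey additionally wants its x-portkey-* headers,
// omitted here for brevity.)
const gateways = {
  openrouter: { baseURL: "https://openrouter.ai/api/v1",    keyVar: "OPENROUTER_API_KEY" },
  vercel:     { baseURL: "https://ai-gateway.vercel.sh/v1", keyVar: "AI_GATEWAY_API_KEY" },
  portkey:    { baseURL: "https://api.portkey.ai/v1",       keyVar: "PORTKEY_API_KEY" },
  litellm:    { baseURL: "http://localhost:4000",           keyVar: "LITELLM_VIRTUAL_KEY" },
} as const;

function makeClient(name: keyof typeof gateways): OpenAI {
  const { baseURL, keyVar } = gateways[name];
  return new OpenAI({ baseURL, apiKey: process.env[keyVar] });
}

// The rest of the migration is renaming model IDs and re-testing
// provider-specific pass-through (prompt caching, structured outputs).
const client = makeClient("litellm");
```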
The pricing math, worked
Three workload sizes (the same math is reproduced as code after the tables):
Small workload: 100k requests/month, $500 provider spend

| Gateway | Markup | Per-seat | Monthly cost | Notes |
|---|---|---|---|---|
| OpenRouter | 5.5% | $0 | $27.50 | Cheapest setup |
| Vercel AIGW (BYOK) | 0% | $0 | $0 | Only on Vercel |
| Portkey | 0% (BYOK) | ~$50/seat | $50-100 | Overkill |
| LiteLLM self-host | 0% | n/a | ~$50 ops | Ops-heavy for this size |
Medium workload: 1M requests/month, $5,000 provider spend

| Gateway | Markup | Per-seat | Monthly cost | Notes |
|---|---|---|---|---|
| OpenRouter | 5.5% | $0 | $275 | Markup adds up |
| Vercel AIGW (BYOK) | 0% | $0 | $0 | On Vercel only |
| Portkey | 0% | ~$50/seat × 5 | $250 | Reasonable |
| LiteLLM self-host | 0% | n/a | ~$100 ops | Best raw economics |
Large workload: 10M requests/month, $50,000 provider spend

| Gateway | Markup | Per-seat | Monthly cost | Notes |
|---|---|---|---|---|
| OpenRouter | 5.5% | $0 | $2,750 | Painful |
| Vercel AIGW (BYOK) | 0% | $0 | $0 | On Vercel only |
| Portkey | 0% | $50/seat × 10 | $500 | Cheap relative to spend |
| LiteLLM self-host | 0% | n/a | ~$200 ops | Cheapest economically |
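The tables as arithmetic, in a sketch; the seat counts and ops costs are the assumptions from the tables above, not quoted prices:

```ts
// Returns gateway overhead per month, excluding the provider spend itself.
function gatewayOverhead(opts: {
  providerSpend: number; // $/month paid for inference
  markup?: number;       // gateway markup as a fraction of provider spend
  seats?: number;
  seatFee?: number;      // $/seat/month
  opsCost?: number;      // $/month to run it yourself
}): number {
  const { providerSpend, markup = 0, seats = 0, seatFee = 0, opsCost = 0 } = opts;
  return providerSpend * markup + seats * seatFee + opsCost;
}

// Large workload: $50,000/month provider spend.
gatewayOverhead({ providerSpend: 50_000, markup: 0.055 });          // OpenRouter: $2,750
gatewayOverhead({ providerSpend: 50_000 });                         // Vercel AIGW BYOK: $0
gatewayOverhead({ providerSpend: 50_000, seats: 10, seatFee: 50 }); // Portkey: $500
gatewayOverhead({ providerSpend: 50_000, opsCost: 200 });           // LiteLLM: ~$200
```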
Beyond the four
Two more worth knowing:
Cloudflare AI Gateway — a cache-heavy AI proxy that sits at Cloudflare's edge. Good for caching repeated prompts (Q&A bots over docs) and for analytics. Not a full gateway in the OpenRouter sense: not every provider is supported, and streaming agentic work isn't first-class.
Helicone — primarily an observability tool with proxy routing. Best if cost tracking and tracing are your top need; less feature-rich on routing and failover. See /topic/llm-cost-tracking.
Where this fails
- Gateway != silver bullet. A gateway routes; it doesn't make slow providers fast or expensive providers cheap. Your underlying provider mix is the cost driver; the gateway is the visibility and resilience layer.
- Streaming latency adds up. Every gateway adds 20-100ms of latency, which is noticeable in interactive UI streaming (a crude probe is sketched after this list). Direct provider calls are faster; the gateway's benefits (failover, tracking) are post-hoc — most users won't notice the gateway when it's working.
- Provider-specific features may not pass through. Anthropic prompt caching, OpenAI structured outputs, Gemini safety settings: some gateways pass these through faithfully, some don't. Test before assuming.
- Pricing is volatile. OpenRouter's 5.5% has been stable since 2024 but could change. Vercel AIGW's BYOK terms are explicit, but Vercel could change them. Re-check pricing pages quarterly.
- LiteLLM's update cadence is fast. Self-hosted LiteLLM adds models and providers frequently; if you pin a version and don't update, you'll fall behind on new model releases. Allocate maintenance time.
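A crude way to check the latency claim against your own stack: time the same tiny completion direct vs through the gateway. Total request time conflates model and routing latency, so run each several times and compare medians; the endpoints, headers, and model below are placeholders:

```ts
// Single-sample probe; repeat and take medians before trusting the number.
async function timeOnce(url: string, headers: Record<string, string>): Promise<number> {
  const t0 = performance.now();
  await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", ...headers },
    body: JSON.stringify({
      model: "some-small-model", // placeholder
      max_tokens: 5,
      messages: [{ role: "user", content: "Reply with OK." }],
    }),
  });
  return performance.now() - t0;
}

const direct = await timeOnce("https://provider.example/v1/chat/completions", {
  Authorization: `Bearer ${process.env.PROVIDER_KEY}`,
});
const viaGateway = await timeOnce("https://gateway.example/v1/chat/completions", {
  Authorization: `Bearer ${process.env.GATEWAY_KEY}`,
});
console.log(`gateway overhead ~${(viaGateway - direct).toFixed(0)}ms (single sample, noisy)`);
```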
What to read next
- /topic/llm-cost-tracking — the cost dashboard the gateway feeds
- /topic/llm-evals — the eval pipeline that complements the gateway
- /topic/anthropic-prompt-caching — provider-specific feature that needs gateway pass-through
- /topic/openai-responses-api-migration — the parallel SDK-level migration
- /topic/ai-sdk-v5-migration — Vercel AI SDK migration
- /blog/opus-4-7-tokenizer-tax — why gateway-level visibility catches issues invoice-level doesn't
- /for/finops-ai-cost-tracker — the team that owns gateway decisions
Sources
- OpenRouter. Pricing. 5.5% markup.
- OpenRouter. Rate limits documentation. Note: page contains literal Jinja placeholders for the free-tier limits as of May 2026 — verified by direct fetch.
- Vercel. AI Gateway docs. BYOK pricing.
- Portkey. portkey.ai. Governance features.
- BerriAI. LiteLLM repository. 100+ models, self-host model.
- TrueFoundry. "LLM Gateway Comparison". Companion analysis.
- Helicone. Buyer guide. Observability angle.
- HN 47301395. "Ask HN: How are you monitoring AI agents in production?". Tools cited.
- Anthropic. Pricing page. Provider-side rates.
- Pinggy. "LLM Gateway Comparison 2026". Third-party comparison (vendor-tinted).
Frequently asked
- What is an LLM gateway?
- A proxy layer between your application and LLM providers. Handles routing across providers (fallback to GPT if Claude is down), cost tracking, rate limiting, caching, and observability. Examples: OpenRouter, Vercel AI Gateway, Portkey, LiteLLM, Cloudflare AI Gateway. Different from an SDK (the Anthropic / OpenAI / Vercel AI SDK is what you call from your code; the gateway is what your SDK talks to).
- Vercel AI Gateway vs OpenRouter — which is cheaper?
- Vercel AI Gateway with BYOK (bring your own key) charges $0 markup. OpenRouter charges 5.5% on top of provider cost regardless of BYOK. For high-volume teams that already pay provider invoices, Vercel AIGW BYOK is strictly cheaper. For teams that want OpenRouter's model breadth (300+ models including open-source through one billing relationship), the 5.5% is the cost of consolidation.
- Should I self-host LiteLLM?
- Yes if any of: (1) you're EU and want data residency, (2) you process more than 500k requests/month and a self-hosted proxy starts saving money on the SaaS gateway's per-seat fees, (3) you need custom routing logic the SaaS gateways don't expose. Otherwise SaaS (Vercel AIGW, Portkey, OpenRouter, Helicone) gets you running faster.
- Does Portkey replace the Vercel AI SDK?
- No. Portkey is a gateway; the Vercel AI SDK is an SDK. They compose — the AI SDK can call Portkey as its base URL. Portkey adds governance features (per-team budgets, audit logs, guardrails) on top of whatever SDK you use.
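A sketch of that composition, assuming the AI SDK's OpenAI-compatible provider and Portkey's header scheme (the header names and model ID are illustrative; verify against Portkey's docs):

```ts
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

// The AI SDK's OpenAI-compatible provider, pointed at Portkey's base URL.
const portkey = createOpenAI({
  baseURL: "https://api.portkey.ai/v1",
  apiKey: "placeholder", // auth travels in the Portkey headers below
  headers: {
    "x-portkey-api-key": process.env.PORTKEY_API_KEY!,
    "x-portkey-virtual-key": process.env.PORTKEY_VIRTUAL_KEY!,
  },
});

const { text } = await generateText({
  model: portkey("gpt-4o"), // illustrative model ID
  prompt: "Ping",
});
console.log(text);
```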
- Why does Vercel AI Gateway sometimes 504?
- Vercel AI Gateway sits between your app and the provider; long-running streaming requests can hit Vercel's function timeout limits before the provider finishes. Mitigations: (1) increase the function timeout on your Vercel project, (2) use Fluid Compute (no timeout on streaming), (3) for very long requests, call the provider directly and lose the gateway features. TrueFoundry's blog comments and Vercel forum posts both surface this regularly.
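A sketch of mitigation (1) in a Next.js route handler on Vercel; the maximum allowed value depends on your plan and on whether Fluid Compute is enabled, so the 300 here is illustrative:

```ts
// app/api/chat/route.ts: raise the route's max duration so long streams
// aren't cut off by the default function timeout.
export const maxDuration = 300; // seconds before the function is cut off

export async function POST(_req: Request) {
  // ...call the gateway and return a streaming Response as usual.
  return new Response("ok");
}
```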
- Can I use multiple gateways at once?
- Yes but rarely worth it. The pattern: Vercel AIGW for streaming UI traffic where its Vercel-native features help, LiteLLM self-hosted for batch/agent workloads where you want full control. Most teams pick one and stick. Double-gateway adds latency and a debugging surface that's not worth the marginal benefit.