Portkey vs LiteLLM vs OpenRouter vs Helicone: LLM Gateway Comparison 2026

Portkey vs LiteLLM vs OpenRouter vs Helicone: Which LLM Gateway Should You Actually Use in 2026?

🇨🇳
阅读中文版：LLM 网关怎么选：Portkey vs LiteLLM vs OpenRouter vs Helicone，2026 谁更适合你的 AI 应用？

TL;DR

Portkey is the best all-in-one solution for production teams that need routing, caching, and observability under one roof. LiteLLM wins if you want full control and zero vendor lock-in. OpenRouter is the fastest path to testing 400+ models with a single API key. Helicone is a specialist — great at cost tracking and observability, but it won’t route or cache your requests.

My quick decision framework:

Building an MVP or testing models? → OpenRouter
Developer on a budget who doesn’t mind ops work? → LiteLLM (self-hosted)
Production app with real users and real money at stake? → Portkey
Already have routing but need cost visibility? → Helicone

Now let me break down why.

Why You Need an LLM Gateway (and Why Direct API Calls Won’t Cut It)

If you’re still calling OpenAI or Anthropic directly from your application code, you’re going to hit a wall. I’ve been there. Here’s what happens:

Costs spiral. Without centralized monitoring, you have no idea which features are burning through tokens. I once discovered 30% of our API spend was duplicate requests that could have been cached.

Reliability tanks. OpenAI goes down, Claude rate-limits you, and your users see a 500 error. No fallback, no retry logic, just silence.

Switching models is painful. Want to test whether Claude Opus handles your use case better than GPT-4o? That means code changes, config changes, redeployment. For what should be a five-minute experiment.

You’re flying blind. What’s your p95 latency? Which user segment costs the most? What’s your error rate by provider? Without data, you can’t optimize anything.

An LLM gateway solves all four problems: unified API interface, intelligent routing with fallbacks, caching, observability, and budget controls. The question isn’t whether you need one — it’s which one fits your situation.

The Four Contenders at a Glance

Dimension	Portkey	LiteLLM	OpenRouter	Helicone
Core identity	Enterprise AI Gateway	Open-source SDK + Proxy	Model aggregation marketplace	Observability platform
Deployment	SaaS + self-hosted	Self-hosted + cloud	SaaS only	SaaS + open-source
Models supported	1,600+ (via gateway)	100+	400+	Works with any provider
Routing	Advanced (load balancing, fallback, A/B, conditional)	Basic (fallback, retry)	None	None
Caching	Semantic + exact match	Exact match only	None	None
Observability	Full (logs, traces, cost analytics)	Basic (logs, Prometheus)	Basic (usage stats)	Deep (cost analysis, user tracking, sessions)
Pricing	Free tier → $49/mo Production → Custom Enterprise	Free (open-source) / ~$250/mo Enterprise	No platform fee; 5.5% credit purchase fee	Free tier → $20/seat/mo Pro → Custom Enterprise
Best for	Production teams, mid-to-large orgs	Budget-conscious devs, self-hosters	Rapid prototyping, model shoppers	Cost optimization, analytics teams

That table should already eliminate some options. But the real differences show up when you dig into how each tool handles specific scenarios.

Portkey: The Production-Grade Powerhouse

What it is: Portkey positions itself as the “control panel for production AI.” It’s a full-featured gateway that handles routing, caching, guardrails, observability, and compliance — all in one platform.

Who it’s for:

Teams whose AI app generates revenue (downtime = lost money)
Organizations that need SOC 2, GDPR, or HIPAA compliance
Anyone managing multiple LLM providers with complex routing logic
Teams that want one tool instead of stitching three together

Where it shines:

Routing is best-in-class. You can route based on latency, cost, success rate, or custom conditions. Set up “try GPT-4o first, fall back to Claude 3.5 Sonnet if latency exceeds 3s, then try Gemini as last resort” — all through config, no code changes. A/B testing between models is built in.

Semantic caching saves serious money. Beyond exact-match caching, Portkey recognizes semantically similar requests. “Summarize this article” and “Give me a brief overview of this content” can hit the same cache entry. Teams report 30-50% higher cache hit rates compared to exact-match-only solutions.

Observability that actually helps. Full request tracing, per-user cost breakdowns, per-feature analytics, integration with Datadog and Grafana. You can answer “which feature costs the most” and “which users are power-consumers” in seconds.

Enterprise security is real. SSO, RBAC, audit logs, SOC 2 Type II, GDPR compliance. If you’re in healthcare or fintech, this isn’t optional — it’s table stakes.

Where it falls short:

The free tier is limited. 10,000 logs/month with 3-day retention. Good for evaluation, not for running anything real.
Learning curve is steep. More features means more complexity. Budget time for configuration and onboarding.
Vendor dependency. Some advanced features (semantic caching, guardrails) rely on Portkey’s infrastructure even in self-hosted mode.

Pricing (as of June 2026):

Developer (Free): 10K logs/month, 3-day retention
Production: $49/month — 100K logs/month, 30-day retention, $9 per additional 100K requests
Enterprise: Custom pricing — SSO, VPC hosting, dedicated support, custom retention

My take: If your AI app has paying customers and your monthly LLM spend exceeds $1,000, Portkey pays for itself quickly. The combination of semantic caching + intelligent routing can cut your actual model costs by 20-40%. But if you’re pre-revenue or a solo developer, it’s overkill. Start cheaper and migrate when scale demands it.

LiteLLM: Maximum Control, Minimum Cost

What it is: LiteLLM is an open-source Python library and proxy server that lets you call any LLM using the OpenAI SDK format. It’s not trying to be an enterprise platform — it’s trying to be the best compatibility layer and routing proxy you can self-host for free.

Who it’s for:

Developers who want full control over their infrastructure
Teams that need to support niche or self-hosted models (Llama, Mistral, Cohere)
Python shops that want direct library integration without an extra HTTP hop
Budget-constrained startups that can handle their own ops

Where it shines:

Compatibility is unmatched. 100+ model providers, including every major API and local deployments. Change one line of config to switch providers. If you’re running a local Llama instance alongside Claude and GPT-4o, LiteLLM handles all three with the same interface.

Truly free and open. MIT license, full source on GitHub (15K+ stars), no feature gating, no usage limits. Fork it, modify it, deploy it however you want.

Python-native integration. If your backend is Python, you can use LiteLLM as a library — no proxy layer needed. That eliminates the extra network hop and keeps latency minimal.

Active community. Issues get responses fast, PRs get merged regularly, docs stay current. When something breaks, you’re not alone.

Where it falls short:

Observability is DIY. You get Prometheus metrics and log output, but no dashboard. You’ll need to set up Grafana or connect to an external tool (like Helicone) yourself.
Routing is basic. Fallback and retry? Yes. Load balancing, A/B testing, latency-based routing? No. If you need sophisticated traffic management, you’ll outgrow LiteLLM’s built-in capabilities.
No semantic caching. Exact-match only. For applications with lots of paraphrased-but-similar queries, you’ll miss 30-50% of potential cache hits.
Ops burden is real. Self-hosting means you own uptime, monitoring, backups, and security patching. If your team doesn’t have DevOps capacity, this becomes a hidden cost.

Pricing:

Open-source (self-hosted): Free forever, all features
Enterprise Basic: ~$250/month (adds SSO, premium support)
Enterprise Premium: ~$30,000/year (adds dedicated infrastructure, SLA)

My take: LiteLLM is the right choice for technically capable teams that value control over convenience. The open-source version is genuinely full-featured — you’re not getting a crippled free tier. But be honest about the ops cost. If you’d spend 10 hours/month maintaining your LiteLLM deployment, that’s worth more than a $49/month SaaS subscription. Pair it with Helicone for observability and you get 80% of Portkey’s functionality for a fraction of the cost.

OpenRouter: The Model Marketplace

What it is: OpenRouter isn’t a gateway in the traditional sense. It’s a model aggregation service — one API key, 400+ models, unified billing. Think of it as a supermarket for LLMs rather than infrastructure for managing them.

Who it’s for:

Developers who want to test multiple models without managing accounts everywhere
Apps that let users choose their own model (ChatGPT alternatives, AI playgrounds)
Anyone who wants zero-config access to the latest models on day one
Rapid prototyping where speed-to-first-call matters more than optimization

Where it shines:

Model breadth is unbeatable. 400+ models including OpenAI, Anthropic, Google, Mistral, Qwen, DeepSeek, open-source fine-tunes. When a new model drops, OpenRouter usually has it within hours.

Transparent pricing model. OpenRouter passes through provider pricing at cost — no per-token markup. They charge a flat 5.5% fee when you purchase credits ($0.80 minimum). You know exactly what you’re paying.

Zero setup friction. Sign up, get an API key, start calling models. No deployment, no configuration, no infrastructure. For prototyping, nothing is faster.

Unified billing simplifies accounting. One invoice instead of five. One credit card on file instead of juggling OpenAI, Anthropic, and Google billing separately.

Where it falls short:

No routing intelligence. OpenRouter won’t fallback, load-balance, or A/B test for you. If GPT-4o is down, your request fails unless you build retry logic yourself.
No caching whatsoever. Every request hits the model provider. High-repetition workloads (chatbots with common questions, customer support) will cost significantly more than with a caching gateway.
Observability is minimal. Basic usage stats only. No per-request latency tracking, no error analysis, no user-level cost breakdown.
Single point of failure. No self-hosted option. If OpenRouter has an outage, you’re down. For mission-critical applications, that’s a real risk.
The 5.5% adds up at scale. On $10,000/month in model costs, you’re paying $550/month just for the aggregation convenience. Over a year, that’s $6,600 — enough to fund a proper gateway.

Pricing:

No subscription fee
5.5% credit purchase fee ($0.80 minimum per transaction; 5.0% for crypto)
Model pricing passed through at provider rates

My take: OpenRouter is perfect for the “try everything” phase. When you’re evaluating whether Claude or GPT-4o or Gemini handles your use case best, OpenRouter lets you test all three in an afternoon. But once you’ve picked your models and your monthly spend passes $1,000, migrate to something with caching and routing. The convenience tax becomes significant at scale.

Helicone: The Observability Specialist

What it is: Helicone is an LLM observability platform. Its job is to help you see what’s happening — cost breakdowns, user behavior, request traces, performance metrics. It doesn’t route requests or cache responses. It watches and reports.

Who it’s for:

Teams whose LLM costs are growing faster than revenue
Product managers who need user-level analytics on AI features
Anyone who wants granular cost attribution (by user, feature, model, time period)
Teams that already have routing (via LiteLLM or custom code) but lack visibility

Where it shines:

Cost analysis is the best in class. Break down spend by user, feature, model, session, time period. Set budget alerts. Answer “why did our costs spike 40% this week?” in minutes instead of hours.

User tracking is powerful. Tag requests with user_id, session_id, feature name — then slice and dice. “80% of costs come from 15% of users” is the kind of insight that changes your pricing strategy.

Integration is dead simple. Add a few headers to your existing API calls. No SDK swap, no code restructuring, no proxy deployment. If you’re already calling OpenAI, you can add Helicone monitoring in under 10 minutes.

Open-source + SaaS flexibility. Self-host for free if you want full data control, or use the managed service starting at $20/seat/month. Lower barrier than Portkey, more polished than DIY Grafana dashboards.

Where it falls short:

No routing at all. Helicone observes — it doesn’t control. No fallback, no load balancing, no model switching. You need another tool for that.
No caching. Helicone can tell you that 30% of your requests are duplicates, but it won’t help you avoid paying for them. You still need a caching layer elsewhere.
Model coverage is narrower. Works well with OpenAI, Anthropic, Google, and other major providers. Less proven with niche or self-hosted models.
Open-source version lacks some features. Team collaboration, SSO, and some advanced analytics are SaaS-only.

Pricing:

Free tier: 10K requests/month
Pro: $20/seat/month (unlimited requests, 3-month retention, team features)
Enterprise: Custom pricing (SSO, dedicated support, custom retention)

My take: Helicone isn’t competing with Portkey or LiteLLM — it’s complementing them. The strongest setup I’ve seen for cost-conscious teams is LiteLLM (routing + fallback) paired with Helicone (observability + cost tracking). You get routing for free and world-class analytics for $20/month. If you already use Portkey, you probably don’t need Helicone — Portkey’s built-in observability covers most of the same ground.

Feature Comparison: The Details That Matter

Feature	Portkey	LiteLLM	OpenRouter	Helicone
Automatic fallback	✅ Multi-level	✅ Basic	❌	❌
Load balancing	✅ Weighted, latency-based	❌	❌	❌
A/B testing	✅ Built-in	❌	❌	❌
Semantic caching	✅	❌	❌	❌
Exact-match caching	✅	✅	❌	❌
Cost tracking	✅ Detailed	✅ Basic	✅ Basic	✅ Best-in-class
User-level analytics	✅	❌	❌	✅
Guardrails / content filters	✅ 50+ built-in	❌	❌	❌
Self-hosted option	✅ (gateway OSS)	✅ Full	❌	✅
OpenAI SDK compatible	✅	✅	✅	N/A (proxy headers)
SOC 2 / GDPR	✅	❌	❌	✅ (SaaS)
Rate limiting / budgets	✅	✅	❌	❌
Added latency	~20-50ms	~5-15ms (proxy) / ~0 (SDK)	~30-80ms	~5ms (header-based)

Pricing Comparison (Realistic Monthly Scenarios)

Solo developer, 50K requests/month:

Tool	Monthly cost
LiteLLM (self-hosted)	$0 (+ your server costs)
Helicone (free tier)	$0 (up to 10K, then $20/mo)
OpenRouter	$0 platform fee (5.5% on credit purchases)
Portkey (Production)	$49/mo

Startup team of 5, 500K requests/month:

Tool	Monthly cost
LiteLLM (self-hosted)	$0 + ~$50-100 server costs
LiteLLM + Helicone combo	~$100/mo (Helicone Pro for 5 seats)
Portkey (Production)	$49 + ~$36 overage = ~$85/mo
OpenRouter	5.5% of spend (on $5K spend = $275)

Mid-size company, 5M requests/month, $20K model spend:

Tool	Monthly cost
Portkey (Enterprise)	Custom (likely $500-2,000/mo)
LiteLLM Enterprise	~$250/mo + server costs
OpenRouter	~$1,100/mo (5.5% of $20K)
Helicone (as complement)	$100-200/mo

The Decision Tree

Stop overthinking this. Here’s how to pick:

Step 1 — What’s your budget?

$0/month → LiteLLM open-source (or Helicone free tier for monitoring only)
Under $50/month → LiteLLM self-hosted + Helicone free tier
$50-500/month → Portkey Production or LiteLLM + Helicone Pro
$500+/month → Portkey Enterprise

Step 2 — What’s your primary pain point?

“I need models to never go down” → Portkey (routing + fallback)
“I need to cut costs” → Portkey (semantic caching) or LiteLLM + Helicone (visibility + exact caching)
“I need to try lots of models quickly” → OpenRouter
“I need to understand where money goes” → Helicone
“I need to run local models alongside cloud APIs” → LiteLLM

Step 3 — What’s your team’s ops capacity?

“We have DevOps engineers” → LiteLLM self-hosted is fine
“We’d rather not manage infrastructure” → Portkey SaaS or OpenRouter
“We want something in between” → Helicone SaaS + LiteLLM on a managed host

Recommended Combinations

Here’s what actually works in practice:

Solo dev / early MVP: OpenRouter (test models fast) → migrate to LiteLLM once you’ve settled on 1-2 providers.

Startup, cost-sensitive: LiteLLM (self-hosted) + Helicone (Pro at $20/seat). You get routing, fallback, and excellent cost visibility for under $100/month total.

Growing product team: Portkey Production ($49/mo). One tool, no integration headaches, semantic caching starts saving money immediately.

Enterprise with compliance needs: Portkey Enterprise. SSO, RBAC, SOC 2, VPC deployment. Nothing else covers all boxes without DIY assembly.

FAQ

Can I use multiple tools together?

Yes, and it’s common. LiteLLM + Helicone is the most popular combo I see. LiteLLM handles routing; Helicone handles monitoring. Don’t stack more than two tools though — complexity has a cost.

How hard is it to migrate between gateways?

If you’re using the OpenAI SDK format (which all four support), migration is mostly changing base_url and your API key. The catch: if you rely heavily on tool-specific features (Portkey’s semantic caching, LiteLLM’s custom routing configs), those don’t transfer. Build an abstraction layer early if you think you’ll switch.

Self-hosted or SaaS?

SaaS unless you have a specific reason not to. The hidden costs of self-hosting — server provisioning, monitoring, security patches, backup — almost always exceed the SaaS fee. Exception: strict data residency requirements or regulated industries where data can’t leave your VPC.

Do these tools add meaningful latency?

Barely. Portkey adds 20-50ms, LiteLLM proxy adds 5-15ms, Helicone (header-based) adds under 5ms. For most applications where LLM responses take 1-10 seconds, this is noise. OpenRouter can add 30-80ms depending on routing to the underlying provider.

What about Cloudflare AI Gateway?

Worth considering if you’re already deep in the Cloudflare ecosystem. It’s simple, adds caching and rate limiting, and integrates with Workers. But it lacks the routing sophistication of Portkey and the model breadth of LiteLLM. Think of it as a lightweight option for Cloudflare-native teams, not a full replacement for any tool in this comparison.

The Verdict

Here’s the thing: there’s no universally “best” LLM gateway. But there is a best one for your situation right now.

Portkey wins for production teams who want one tool that does everything. The semantic caching alone can pay for the subscription. Since being acquired by Palo Alto Networks, expect even stronger enterprise security features going forward.

LiteLLM wins for developers who value control, flexibility, and cost above all else. It’s genuinely free, genuinely open, and genuinely capable. Just budget for the ops time.

OpenRouter wins for speed-to-first-call and model exploration. It’s not infrastructure — it’s a convenience layer. Use it to explore, then graduate to something more capable.

Helicone wins as a complement, not a standalone solution. Pair it with LiteLLM or your own routing code and you get enterprise-grade observability without enterprise pricing.

Bottom line: Start with the simplest tool that solves your current problem. Don’t pay for features you’ll “probably need someday.” LLM gateway migration isn’t painful — wasted months on the wrong tool is.

Stay updated with our latest AI insights