If you've searched for "cheap Claude API" you've probably landed on a dozen sites promising 70–85% off official rates. Some of those are legitimate optimisation tools. Some are reselling stolen Anthropic accounts. A lot are just LiteLLM wrappers with a markup that disappears once you actually audit the math.
This post separates the four legitimate paths from the noise, with real cost numbers and the operational trade-offs each one imposes.
The list-price baseline
Direct Anthropic pricing as of mid-2026, per million tokens:
| Model | Input | Output | Cache read |
|---|---|---|---|
| Claude Opus 4.8 | $15.00 | $75.00 | $1.50 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 |
Every "cheap Claude" claim should be measurable against these numbers. If a provider quotes a number, ask them to show you their input/output rate per million tokens for the model you'll actually use. If they can't — walk.
The four legitimate paths
1. Prompt caching (free, 90% off repeated context)
If your prompts have a stable prefix — system instructions, tool definitions,
repository skeleton fed into an agent — wrap the stable parts in
cache_control blocks. The first call pays 1.25× input price to warm the
cache; every subsequent call within 5 minutes pays 10% of input price for
those tokens.
const response = await anthropic.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 1024,
system: [
{
type: "text",
text: largeSystemPrompt,
cache_control: { type: "ephemeral" },
},
],
messages: [{ role: "user", content: question }],
});In production agent workloads (Claude Code, Cursor, custom agents) cache hit rates land between 60% and 85%. That alone cuts effective input cost by 60-80% without touching list pricing.
Effort: one afternoon. Risk: zero. Savings: 60-80% of input cost.
2. Batch API (50% off, ≤ 24h latency)
For any async work — overnight evals, document classification, embedding
backfills, log analysis, content moderation — submit batches via
/v1/messages/batches and pay half list price on both input and
output. Anthropic guarantees < 24h; in practice batches return in 1-4
hours.
curl https://api.anthropic.com/v1/messages/batches \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d @batch.jsonlIf you're running offline jobs on > 10K prompts per day, this single change roughly halves your monthly Anthropic invoice.
Effort: a day of integration work. Risk: low (Anthropic-supported). Savings: 50% of batch traffic.
3. Model routing (40-60% off via smart downgrades)
About 60-70% of requests sent to Opus could be handled by Sonnet — and 30-50% of Sonnet requests could be handled by Haiku — at identical user-facing quality. If you can classify request complexity cheaply, route accordingly.
A simple heuristic that holds up in production:
| Task type | Default model | Downgrade if… |
|---|---|---|
| Intent classification | Haiku 4.5 | n/a — already cheapest |
| Data extraction | Haiku 4.5 | Use Sonnet only if schema is novel |
| Code completion | Sonnet 4.6 | Use Haiku for short snippets |
| Refactor planning | Sonnet 4.6 | Escalate to Opus on multi-module impact |
| Complex review | Opus 4.8 | Stay on Opus — quality matters |
A router that splits traffic 20% Opus / 60% Sonnet / 20% Haiku typically hits 40-60% of pure-Opus cost for the same task mix.
Effort: real engineering investment (router + evals). Risk: medium (quality regressions if your classifier is bad). Savings: 40-60%.
4. Discounted gateway (30% off list, stacks with everything)
A legitimate discounted gateway acquires bulk capacity at provider-direct rates and resells at a margin smaller than its discount — making money on volume + spread between batch / cache / passthrough mixes.
What "legitimate" means:
- The pricing math is transparent — they show you input/output rates per million tokens, in dollars, per model.
- They pay Anthropic / OpenAI / Google directly — verify by asking how they handle data retention and BAA / DPA.
- No subscription required to test — pay-as-you-go from $1 or free trial credit.
- Identical wire format to direct provider APIs — your existing SDK code keeps working.
- Public status page, documented uptime, real support address (not a Telegram bot DM).
Anvat fits this pattern: 30% off Anthropic and OpenAI metered rates, 2× credit match on prepaid packs (deposit $50, get $100 of API credit), OpenAI- and Anthropic-compatible at the same time so one key works in Claude Code, Cursor, OpenAI SDK, LangChain, anywhere.
Combined with prompt caching, real production sessions land at roughly 40-50% of provider-direct cost for the same workload.
Effort: changing one env var. Risk: low. Savings: 30% on metered rates, stacks with caching and batch.
The stack: how much can you actually save?
Plug all four legitimate strategies in for a typical Claude Code-heavy team:
| Optimisation | Multiplier on baseline |
|---|---|
| Prompt caching (70% hit rate) | × 0.40 |
| Batch API on async 30% of traffic | × 0.85 (50% off × 30% of jobs) |
| Model routing (Sonnet-default vs Opus-default) | × 0.50 |
| Discounted gateway (-30%) | × 0.70 |
Stacked: 0.40 × 0.85 × 0.50 × 0.70 = 0.119 → ~88% off list.
In practice teams don't apply all four perfectly — realistic full-stack optimisation lands at 70-80% off baseline Anthropic-direct pricing for the same task mix and quality. A $5,000/mo Anthropic invoice becomes a $1,000-$1,500/mo Anvat invoice without changing what your agents do.
Three "cheap Claude" paths to avoid
1. Telegram / Discord "70% off Claude" sellers
Almost always selling access to compromised Anthropic accounts (carded billing, stolen API keys, organisation accounts with default seats). When the account gets banned — and it will — your service breaks mid-traffic and your data is in the seller's logs forever.
The tell: payment in crypto-only, no company website, "no questions asked" support. Walk.
2. "Free Claude Pro" desktop clients
Several open-source CLIs market themselves as a "free Claude Pro alternative" by signing you in with a stolen Anthropic web session cookie. Same risk profile as the Telegram sellers, packaged with worse opsec.
3. Resellers with no verifiable downstream
Some "gateways" don't actually have an Anthropic contract — they're reselling another reseller's reseller. When the chain breaks (and it breaks), you're 4 hops away from anyone who can help. Ask any prospective gateway: "What's your direct contract or partnership with Anthropic, in one sentence?" If they dodge, that's your answer.
Sanity check: what should I actually pay?
A working back-of-the-envelope for "reasonable Claude pricing" in 2026:
- Provider-direct (Anthropic): $3 input / $15 output per MTok Sonnet, $15 / $75 Opus. Baseline reality.
- Reasonable discount gateway: -20% to -35% of provider-direct. Below -40% without a clear business model = suspicious.
- Cached repeated context: ~10% of input price after first call.
- Batched async work: 50% of input + output.
- Suspiciously cheap (-70%+): probably either reselling stolen accounts or running an unsustainable promotional period that will evaporate when funding does.
Bottom line
You can legitimately cut your Claude bill by 60-80% in 2026 without losing quality, without touching dodgy resellers, and without rewriting your agents. The order to do it in:
- Add prompt caching today — biggest single-digit-day win.
- Move async work to the Batch API — half-off, no quality hit.
- Switch to a discounted gateway — 30% off on top.
- Build a model router — biggest long-term win, biggest investment.
Full Anthropic price breakdown → Setting up Claude Code with a custom base URL →
Get 30% off Claude — legitimately
Anvat is Anthropic-compatible, pays providers direct, and shows you the math per request. $2 free credit on signup, no card required.
Start free → →