If you're building anything serious with Claude — Claude Code sessions, coding agents, RAG pipelines, evals — token cost dominates your variable spend. This is the complete 2026 Claude API price list, plus three legitimate ways to cut it in half without changing a line of SDK code.
The TL;DR: list price has held mostly steady year over year, prompt caching delivers a 90% discount on repeated context, batch API is half-price for async work, and using a discounted compatible gateway stacks on top.
The 2026 list-price table
All prices below are per million tokens, as billed by api.anthropic.com
directly.
| Model | Context | Input | Output | Cache write (5min) | Cache read |
|---|---|---|---|---|---|
| Claude Opus 4.8 | 200K | $15.00 | $75.00 | $18.75 | $1.50 |
| Claude Sonnet 4.6 | 200K | $3.00 | $15.00 | $3.75 | $0.30 |
| Claude Sonnet 4.6 (>200K window) | 1M | $6.00 | $22.50 | $7.50 | $0.60 |
| Claude Haiku 4.5 | 200K | $1.00 | $5.00 | $1.25 | $0.10 |
| Claude Opus 4.5 (legacy) | 200K | $15.00 | $75.00 | $18.75 | $1.50 |
| Claude Sonnet 3.7 (legacy) | 200K | $3.00 | $15.00 | $3.75 | $0.30 |
Cache write is 1.25× input cost; cache read is 0.10× input cost. Once a chunk is cached, you pay 10% of the input price every time you reuse it within the 5-minute TTL. A 1-hour cache beta exists at 2× input write cost.
What that actually costs in practice
List price is easy to look up. The number you actually care about is dollars per agent turn in your real workload. Some grounded examples from production Claude Code traffic flowing through Anvat in May 2026:
| Agent turn type | Avg tokens (in / out) | Cost (list) | Cost (cached, 80% hit) |
|---|---|---|---|
| Cursor inline edit (Sonnet 4.6) | 4K / 200 | $0.015 | $0.005 |
| Claude Code small task (Sonnet 4.6) | 25K / 1.5K | $0.098 | $0.030 |
| Claude Code refactor (Opus 4.8) | 60K / 4K | $1.20 | $0.34 |
| Long-form RAG answer (Sonnet 4.6) | 80K / 2K | $0.27 | $0.080 |
| Codebase Q&A with caching (Opus 4.8) | 150K / 1.5K | $2.36 | $0.57 |
The big takeaway: cache hits matter more than picking a cheaper model once your prompts contain a stable system prompt + tool definitions. Claude Code in particular caches aggressively — typical hit rates we observe across production tenants land between 60% and 85%.
Three legitimate ways to cut the bill
1. Prompt caching (free, 90% off repeated context)
If your prompt has a stable prefix — system instructions, tool definitions, the
repository skeleton you're feeding into an agent — wrap it with
cache_control: { type: "ephemeral" }. The first call pays 1.25× input price
to write the cache; every subsequent call within 5 minutes pays 10% of input
price for that segment.
const response = await anthropic.messages.create({
model: "claude-sonnet-4-6",
max_tokens: 1024,
system: [
{
type: "text",
text: largeSystemPrompt,
cache_control: { type: "ephemeral" },
},
],
messages: [{ role: "user", content: question }],
});In a long-running Claude Code session this alone cuts effective input cost by 60–80%. Don't skip it.
2. Batch API (50% off, ≤ 24h latency)
For anything async — overnight evals, document classification, embedding
backfills — submit via /v1/messages/batches and pay half list price for
both input and output. Anthropic guarantees < 24h turnaround; in practice
batches return in 1–4 hours.
curl https://api.anthropic.com/v1/messages/batches \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-d @batch.jsonlIf you're running offline analytics on >10K prompts/day, this single change roughly halves your monthly Anthropic invoice.
3. Use a discounted, compatible gateway
This is what Anvat exists for. The wire protocol is identical to
api.anthropic.com — set ANTHROPIC_BASE_URL=https://api.anvat.app/v1 in
your environment and the same Claude Code / SDK / curl request just works.
The gateway adds:
- 30% off every metered token rate vs Anthropic list.
- 2× credit match on prepaid packages (deposit $50, get $100 of API credit).
- Streaming, tool use, prompt caching, vision inputs — all passthrough, no quality loss.
Combined with prompt caching, real production sessions land at roughly 40–50% of what you'd pay Anthropic directly for the same workload.
Anvat is OpenAI-compatible too — the same key works against
/v1/chat/completions for GPT, Gemini, Llama 3.1/3.3 traffic. One key,
one bill, every frontier model.
How to verify pricing claims yourself
Two reliable sources, in order:
- Anthropic's official pricing page — updated within hours of any change. Bookmark it.
- Your own
usagefield in every API response. Every Claude response includes exactinput_tokens,output_tokens,cache_creation_input_tokens,cache_read_input_tokens. Multiply by the table above and you get the true cost — no estimates, no rounding.
If you're going through Anvat, the same numbers surface in /app/usage and
/app/logs with the discounted rate already applied.
When to pick which Claude model
A quick decision tree from a year of running every model in production:
- Pure quality / hardest tasks → Opus 4.8. Worth 5× the Sonnet price when the answer matters (security review, architecture critique, complex refactors).
- Default workhorse → Sonnet 4.6. Best price-performance for >90% of agent traffic. Use the 1M-context tier only when you really need it — the 200K tier is half the price and rarely the bottleneck.
- High-volume, cost-sensitive → Haiku 4.5. Classification, summarisation, tool-calling glue, autocomplete. Surprisingly capable.
- Legacy Sonnet 3.7 → upgrade. No reason to stay on 3.7 in 2026 — 4.6 is the same price and meaningfully better on every benchmark.
Bottom line
For most teams the bill-cutting checklist is:
- Add
cache_controlto every stable system prompt (one afternoon's work). - Send async workloads through
/v1/messages/batches(half-price, no accuracy hit). - Route through a discounted gateway like Anvat for another 30% on top.
Stack all three and a $5,000/mo Anthropic invoice typically lands around $1,400–$2,000 — for the same workload, same model quality, same SDK code.
Try the discounted Claude gateway
Set ANTHROPIC_BASE_URL and your existing Claude Code / SDK code keeps working — 30% off metered tokens + $2 free credit on signup.
Start free → →