Claude API pricing in 2026: a complete breakdown (Opus 4.8, Sonnet 4.6, Haiku 4.5)

If you're building anything serious with Claude — Claude Code sessions, coding agents, RAG pipelines, evals — token cost dominates your variable spend. This is the complete 2026 Claude API price list, plus three legitimate ways to cut it in half without changing a line of SDK code.

The TL;DR: list price has held mostly steady year over year, prompt caching delivers a 90% discount on repeated context, batch API is half-price for async work, and using a discounted compatible gateway stacks on top.

The 2026 list-price table

All prices below are per million tokens, as billed by api.anthropic.com directly. Verified against the public Anthropic pricing table on 2026-06.

Model	Context	Input	Output	5m cache write	1h cache write	Cache hits & refreshes
Claude Opus 4.8	200K	$5.00	$25.00	$6.25	$10.00	$0.50
Claude Opus 4.7	200K	$5.00	$25.00	$6.25	$10.00	$0.50
Claude Opus 4.6	200K	$5.00	$25.00	$6.25	$10.00	$0.50
Claude Opus 4.5	200K	$5.00	$25.00	$6.25	$10.00	$0.50
Claude Opus 4.1 (deprecated)	200K	$15.00	$75.00	$18.75	$30.00	$1.50
Claude Opus 4 (deprecated)	200K	$15.00	$75.00	$18.75	$30.00	$1.50
Claude Sonnet 4.6	200K	$3.00	$15.00	$3.75	$6.00	$0.30
Claude Sonnet 4.6 (1M tier)	1M	$6.00	$22.50	$7.50	$12.00	$0.60
Claude Sonnet 4.5	200K	$3.00	$15.00	$3.75	$6.00	$0.30
Claude Sonnet 4 (deprecated)	200K	$3.00	$15.00	$3.75	$6.00	$0.30
Claude Haiku 4.5	200K	$1.00	$5.00	$1.25	$2.00	$0.10
Claude Haiku 3.5 (retired, Bedrock/Vertex only)	200K	$0.80	$4.00	$1.00	$1.60	$0.08

Anthropic dropped the Opus tier from $15/$75 to $5/$25 with the Opus 4.5 release in late 2025, and that rate carries through 4.6 → 4.7 → 4.8 unchanged. Across every current model, the cache math is constant: 5m cache write = 1.25× input, 1h cache write = 2× input, cache hits / refreshes = 0.10× input. Pick the 1h TTL when the prompt is reused across a longer session (audits, RAG sessions); the 5m TTL pays off faster for tight agent loops.

What that actually costs in practice

List price is easy to look up. The number you actually care about is dollars per agent turn in your real workload. Some grounded examples from production Claude Code traffic flowing through Anvat in May 2026:

Agent turn type	Avg tokens (in / out)	Cost (list)	Cost (cached, 80% hit)
Cursor inline edit (Sonnet 4.6)	4K / 200	$0.015	$0.005
Claude Code small task (Sonnet 4.6)	25K / 1.5K	$0.098	$0.030
Claude Code refactor (Opus 4.8)	60K / 4K	$0.40	$0.11
Long-form RAG answer (Sonnet 4.6)	80K / 2K	$0.27	$0.080
Codebase Q&A with caching (Opus 4.8)	150K / 1.5K	$0.79	$0.19

The big takeaway: cache hits matter more than picking a cheaper model once your prompts contain a stable system prompt + tool definitions. Claude Code in particular caches aggressively — typical hit rates we observe across production tenants land between 60% and 85%.

Three legitimate ways to cut the bill

1. Prompt caching (free, 90% off repeated context)

If your prompt has a stable prefix — system instructions, tool definitions, the repository skeleton you're feeding into an agent — wrap it with cache_control: { type: "ephemeral" }. The first call pays 1.25× input price to write the cache; every subsequent call within 5 minutes pays 10% of input price for that segment.

const response = await anthropic.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: largeSystemPrompt,
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: question }],
});

In a long-running Claude Code session this alone cuts effective input cost by 60–80%. Don't skip it.

2. Batch API (50% off, ≤ 24h latency)

For anything async — overnight evals, document classification, embedding backfills — submit via /v1/messages/batches and pay half list price for both input and output. Anthropic guarantees < 24h turnaround; in practice batches return in 1–4 hours.

curl https://api.anthropic.com/v1/messages/batches \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d @batch.jsonl

If you're running offline analytics on >10K prompts/day, this single change roughly halves your monthly Anthropic invoice.

3. Use a discounted, compatible gateway

This is what Anvat exists for. The wire protocol is identical to api.anthropic.com — set ANTHROPIC_BASE_URL=https://api.anvat.app/v1 in your environment and the same Claude Code / SDK / curl request just works. The gateway adds:

30% off every metered token rate vs Anthropic list.
2× credit match on prepaid packages (deposit $50, get $100 of API credit).
Streaming, tool use, prompt caching, vision inputs — all passthrough, no quality loss.

Combined with prompt caching, real production sessions land at roughly 40–50% of what you'd pay Anthropic directly for the same workload.

Anvat is OpenAI-compatible too — the same key works against /v1/chat/completions for GPT, Gemini, Llama 3.1/3.3 traffic. One key, one bill, every frontier model.

How to verify pricing claims yourself

Two reliable sources, in order:

Anthropic's official pricing page — updated within hours of any change. Bookmark it.
Your own usage field in every API response. Every Claude response includes exact input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens. Multiply by the table above and you get the true cost — no estimates, no rounding.

If you're going through Anvat, the same numbers surface in /app/usage and /app/logs with the discounted rate already applied.

When to pick which Claude model

A quick decision tree from a year of running every model in production:

Pure quality / hardest tasks → Opus 4.8. Worth ~1.7× the Sonnet price (the Opus tier dropped from $15/$75 to $5/$25 starting with the 4.5 release, so the price-quality gap is much smaller than it used to be) — pick it for security review, architecture critique, complex refactors.
Default workhorse → Sonnet 4.6. Best price-performance for >90% of agent traffic. Use the 1M-context tier only when you really need it — the 200K tier is half the price and rarely the bottleneck.
High-volume, cost-sensitive → Haiku 4.5. Classification, summarisation, tool-calling glue, autocomplete. Surprisingly capable.
Legacy Sonnet 3.7 → upgrade. No reason to stay on 3.7 in 2026 — 4.6 is the same price and meaningfully better on every benchmark.

Bottom line

For most teams the bill-cutting checklist is:

Add cache_control to every stable system prompt (one afternoon's work).
Send async workloads through /v1/messages/batches (half-price, no accuracy hit).
Route through a discounted gateway like Anvat for another 30% on top.

Stack all three and a $5,000/mo Anthropic invoice typically lands around $1,400–$2,000 — for the same workload, same model quality, same SDK code.

Try the discounted Claude gateway

Set ANTHROPIC_BASE_URL and your existing Claude Code / SDK code keeps working — 30% off metered tokens + $2 free credit on signup.

Start free → →