
Claude API pricing in 2026: a complete breakdown (Opus 4.8, Sonnet 4.6, Haiku 4.5)

The full Anthropic Claude API price list for 2026 — every model, input/output rates, prompt caching discounts, batch API savings, and how to cut the bill by ~50% with a discounted gateway.

Anvat team · 6 min read

If you're building anything serious with Claude — Claude Code sessions, coding agents, RAG pipelines, evals — token cost dominates your variable spend. This is the complete 2026 Claude API price list, plus three legitimate ways to cut it in half without changing a line of SDK code.

The TL;DR: list price has held mostly steady year over year, prompt caching delivers a 90% discount on repeated context, batch API is half-price for async work, and using a discounted compatible gateway stacks on top.

The 2026 list-price table

All prices below are per million tokens, as billed by api.anthropic.com directly.

| Model | Context | Input | Output | Cache write (5min) | Cache read |
|---|---|---|---|---|---|
| Claude Opus 4.8 | 200K | $15.00 | $75.00 | $18.75 | $1.50 |
| Claude Sonnet 4.6 | 200K | $3.00 | $15.00 | $3.75 | $0.30 |
| Claude Sonnet 4.6 (>200K window) | 1M | $6.00 | $22.50 | $7.50 | $0.60 |
| Claude Haiku 4.5 | 200K | $1.00 | $5.00 | $1.25 | $0.10 |
| Claude Opus 4.5 (legacy) | 200K | $15.00 | $75.00 | $18.75 | $1.50 |
| Claude Sonnet 3.7 (legacy) | 200K | $3.00 | $15.00 | $3.75 | $0.30 |

A cache write costs 1.25× the input rate and a cache read costs 0.10× the input rate, so once a chunk is cached, every reuse within the 5-minute TTL is billed at 10% of the input price. A 1-hour cache is also available in beta, with writes billed at 2× the input rate.
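As a quick sanity check on the cache columns, here is a minimal sketch that derives them from a model's input rate using the multipliers quoted above. The function and constant names are illustrative, not part of any SDK.

```typescript
// Multipliers as stated in the article: 5-minute cache write = 1.25x input,
// 1-hour cache write = 2x input, cache read = 0.10x input.
const CACHE_WRITE_5M_MULTIPLIER = 1.25;
const CACHE_WRITE_1H_MULTIPLIER = 2.0;
const CACHE_READ_MULTIPLIER = 0.1;

interface CacheRates {
  write5m: number; // USD per million tokens
  write1h: number; // USD per million tokens
  read: number; // USD per million tokens
}

// Derive the cache rates (USD per million tokens) from a model's input rate.
function cacheRatesFor(inputPerMTok: number): CacheRates {
  return {
    write5m: inputPerMTok * CACHE_WRITE_5M_MULTIPLIER,
    write1h: inputPerMTok * CACHE_WRITE_1H_MULTIPLIER,
    read: inputPerMTok * CACHE_READ_MULTIPLIER,
  };
}

// Input rates copied from the list-price table.
const opusCacheRates = cacheRatesFor(15.0);
const sonnetCacheRates = cacheRatesFor(3.0);
```

For the $15.00 Opus input rate this gives $18.75 per million tokens for a 5-minute write and $1.50 for a read, matching the table row.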

What that actually costs in practice

List price is easy to look up. The number you actually care about is dollars per agent turn in your real workload. Some grounded examples from production Claude Code traffic flowing through Anvat in May 2026:

| Agent turn type | Avg tokens (in / out) | Cost (list) | Cost (cached, 80% hit) |
|---|---|---|---|
| Cursor inline edit (Sonnet 4.6) | 4K / 200 | $0.015 | $0.005 |
| Claude Code small task (Sonnet 4.6) | 25K / 1.5K | $0.098 | $0.030 |
| Claude Code refactor (Opus 4.8) | 60K / 4K | $1.20 | $0.34 |
| Long-form RAG answer (Sonnet 4.6) | 80K / 2K | $0.27 | $0.080 |
| Codebase Q&A with caching (Opus 4.8) | 150K / 1.5K | $2.36 | $0.57 |

The big takeaway: cache hits matter more than picking a cheaper model once your prompts contain a stable system prompt + tool definitions. Claude Code in particular caches aggressively — typical hit rates we observe across production tenants land between 60% and 85%.
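To run the same arithmetic on your own workload, a rough per-turn estimator might look like the sketch below. It is a simplified model under stated assumptions: a fixed fraction of input tokens is billed at the cache-read rate, the rest at the plain input rate, and the cache-write premium is ignored. The cached column in the table above comes from measured traffic, so this model is not expected to reproduce it exactly. All names here are illustrative.

```typescript
// USD per million tokens, copied from the list-price table.
interface ModelRates {
  input: number;
  output: number;
  cacheRead: number;
}

const SONNET_4_6: ModelRates = { input: 3.0, output: 15.0, cacheRead: 0.3 };
const OPUS_4_8: ModelRates = { input: 15.0, output: 75.0, cacheRead: 1.5 };

// Estimate the cost in USD of one agent turn.
// cacheHitRate is the assumed fraction of input tokens served from cache (0..1).
function estimateTurnCost(
  rates: ModelRates,
  inputTokens: number,
  outputTokens: number,
  cacheHitRate: number = 0,
): number {
  const cachedTokens = inputTokens * cacheHitRate;
  const freshTokens = inputTokens - cachedTokens;
  const microDollars =
    freshTokens * rates.input +
    cachedTokens * rates.cacheRead +
    outputTokens * rates.output;
  return microDollars / 1e6;
}

// List-price checks against two table rows (no caching).
const smallTaskList = estimateTurnCost(SONNET_4_6, 25_000, 1_500);
const refactorList = estimateTurnCost(OPUS_4_8, 60_000, 4_000);
```

With no caching, the 25K / 1.5K Sonnet turn comes out at about $0.098 and the 60K / 4K Opus turn at $1.20, in line with the list-cost column.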

Three legitimate ways to cut the bill

1. Prompt caching (free, 90% off repeated context)

If your prompt has a stable prefix — system instructions, tool definitions, the repository skeleton you're feeding into an agent — wrap it with cache_control: { type: "ephemeral" }. The first call pays 1.25× input price to write the cache; every subsequent call within 5 minutes pays 10% of input price for that segment.

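import Anthropic from "@anthropic-ai/sdk";

// Reads ANTHROPIC_API_KEY from the environment by default.
const anthropic = new Anthropic();

const response = await anthropic.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: largeSystemPrompt, // stable prefix: instructions, tool docs, repo skeleton
      // Marks the end of the cacheable prefix; later calls that share it read from cache.
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: question }],
});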

In a long-running Claude Code session this alone cuts effective input cost by 60–80%. Don't skip it.
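The break-even point follows directly from the two multipliers. The sketch below measures cost in units of "one uncached send of the prefix" and assumes every call lands inside the 5-minute TTL; the function names are illustrative.

```typescript
// Multipliers as stated in the article.
const CACHE_WRITE_MULTIPLIER = 1.25; // first call writes the cache
const CACHE_READ_MULTIPLIER = 0.1; // later calls read it

// Relative cost of sending the same prefix `calls` times with caching:
// one write followed by (calls - 1) reads.
function cachedPrefixCost(calls: number): number {
  if (calls < 1) return 0;
  return CACHE_WRITE_MULTIPLIER + (calls - 1) * CACHE_READ_MULTIPLIER;
}

// Fraction of the prefix cost saved versus sending it uncached every time.
// Negative when caching costs more than it saves.
function prefixSavings(calls: number): number {
  if (calls < 1) return 0;
  return 1 - cachedPrefixCost(calls) / calls;
}

// Smallest number of calls at which caching is strictly cheaper.
function breakEvenCalls(): number {
  let calls = 1;
  while (cachedPrefixCost(calls) >= calls) {
    calls++;
  }
  return calls;
}
```

Under these assumptions a single call costs 25% more than not caching, caching is already cheaper by the second call, and ten calls inside the TTL cut the prefix's cost by roughly 78%.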

2. Batch API (50% off, ≤ 24h latency)

For anything async (overnight evals, document classification, offline backfills), submit via /v1/messages/batches and pay half list price for both input and output. Batches are processed within a 24-hour window, and requests that have not finished by then expire; in practice batches return in 1–4 hours.

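# batch.json holds one JSON object: {"requests": [{"custom_id": "...", "params": {...}}, ...]}
curl https://api.anthropic.com/v1/messages/batches \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d @batch.json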

If you're running offline analytics on >10K prompts/day, this single change roughly halves your monthly Anthropic invoice.
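The request body for /v1/messages/batches is a single JSON object with a requests array, where each entry pairs a custom_id with ordinary Messages parameters. A minimal sketch of building that body from a list of prompts follows; the helper name, the idPrefix scheme, and the sample prompts are hypothetical, and the model string is the one used in the caching snippet earlier.

```typescript
// One entry in the Message Batches request body.
interface BatchRequest {
  custom_id: string; // your own ID, used to match results back to inputs
  params: {
    model: string;
    max_tokens: number;
    messages: { role: "user" | "assistant"; content: string }[];
  };
}

interface BatchBody {
  requests: BatchRequest[];
}

// Build the JSON body for POST /v1/messages/batches from a list of prompts.
// idPrefix is a hypothetical naming scheme; any unique custom_id works.
function buildBatchBody(
  prompts: string[],
  model: string,
  maxTokens: number,
  idPrefix: string = "prompt",
): BatchBody {
  return {
    requests: prompts.map(
      (prompt: string, i: number): BatchRequest => ({
        custom_id: `${idPrefix}-${i}`,
        params: {
          model,
          max_tokens: maxTokens,
          messages: [{ role: "user", content: prompt }],
        },
      }),
    ),
  };
}

const batchBody = buildBatchBody(
  ["Classify: invoice overdue", "Classify: password reset"],
  "claude-sonnet-4-6",
  256,
);
// Serialize and send this string as the POST body.
const batchJson = JSON.stringify(batchBody, null, 2);
```

Keeping custom_id deterministic makes it straightforward to join batch results back to the source rows once the batch completes.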

3. Use a discounted, compatible gateway

This is what Anvat exists for. The wire protocol is identical to api.anthropic.com — set ANTHROPIC_BASE_URL=https://api.anvat.app/v1 in your environment and the same Claude Code / SDK / curl request just works. The gateway adds:

  • 30% off every metered token rate vs Anthropic list.
  • 2× credit match on prepaid packages (deposit $50, get $100 of API credit).
  • Streaming, tool use, prompt caching, vision inputs — all passthrough, no quality loss.

Combined with prompt caching, real production sessions land at roughly 40–50% of what you'd pay Anthropic directly for the same workload.

Anvat is OpenAI-compatible too — the same key works against /v1/chat/completions for GPT, Gemini, Llama 3.1/3.3 traffic. One key, one bill, every frontier model.

How to verify pricing claims yourself

Two reliable sources, in order:

  1. Anthropic's official pricing page — updated within hours of any change. Bookmark it.
  2. Your own usage field in every API response. Every Claude response includes exact input_tokens, output_tokens, cache_creation_input_tokens, cache_read_input_tokens. Multiply by the table above and you get the true cost — no estimates, no rounding.

If you're going through Anvat, the same numbers surface in /app/usage and /app/logs with the discounted rate already applied.
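The usage-field arithmetic described above can be sketched as follows. The four field names are the ones listed in the article; the rates object, the function name, and the sample usage values are illustrative, and the sketch assumes all cache writes are billed at the 5-minute rate.

```typescript
// Token counts as reported in the usage field of a response.
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

// USD per million tokens, taken from the list-price table.
interface TokenRates {
  input: number;
  output: number;
  cacheWrite: number;
  cacheRead: number;
}

const SONNET_4_6_RATES: TokenRates = {
  input: 3.0,
  output: 15.0,
  cacheWrite: 3.75,
  cacheRead: 0.3,
};

// Compute the cost in USD of one response from its usage object.
function costFromUsage(usage: Usage, rates: TokenRates): number {
  const cacheWrites = usage.cache_creation_input_tokens ?? 0;
  const cacheReads = usage.cache_read_input_tokens ?? 0;
  const microDollars =
    usage.input_tokens * rates.input +
    usage.output_tokens * rates.output +
    cacheWrites * rates.cacheWrite +
    cacheReads * rates.cacheRead;
  return microDollars / 1e6;
}

// Hypothetical usage values for illustration only.
const sampleCost = costFromUsage(
  {
    input_tokens: 1_000,
    output_tokens: 500,
    cache_creation_input_tokens: 2_000,
    cache_read_input_tokens: 10_000,
  },
  SONNET_4_6_RATES,
);
```

Summing this per request over a day of logs gives a figure you can reconcile against the invoice.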

When to pick which Claude model

A quick decision tree from a year of running every model in production:

  • Pure quality / hardest tasks → Opus 4.8. Worth 5× the Sonnet price when the answer matters (security review, architecture critique, complex refactors).
  • Default workhorse → Sonnet 4.6. Best price-performance for >90% of agent traffic. Use the 1M-context tier only when you really need it: the 200K tier costs half as much on input and a third less on output, and is rarely the bottleneck.
  • High-volume, cost-sensitive → Haiku 4.5. Classification, summarisation, tool-calling glue, autocomplete. Surprisingly capable.
  • Legacy Sonnet 3.7 → upgrade. No reason to stay on 3.7 in 2026 — 4.6 is the same price and meaningfully better on every benchmark.
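If you route requests programmatically, the decision tree above can be written down as a small lookup. This is only a sketch: the workload categories and the function name are made up for illustration, and the returned strings are the article's model labels rather than API model IDs.

```typescript
// Hypothetical workload categories mirroring the bullets above.
type Workload = "hardest-tasks" | "default-agent" | "high-volume";

// Return the article's recommended model label for a workload.
function pickModel(workload: Workload, needsOver200kContext: boolean = false): string {
  if (workload === "hardest-tasks") {
    return "Opus 4.8";
  }
  if (workload === "high-volume") {
    return "Haiku 4.5";
  }
  // Default workhorse: stay on the cheaper 200K tier unless the prompt truly needs more.
  return needsOver200kContext ? "Sonnet 4.6 (1M-context tier)" : "Sonnet 4.6 (200K tier)";
}
```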

Bottom line

For most teams the bill-cutting checklist is:

  1. Add cache_control to every stable system prompt (one afternoon's work).
  2. Send async workloads through /v1/messages/batches (half-price, no accuracy hit).
  3. Route through a discounted gateway like Anvat for another 30% on top.

Stack all three and a $5,000/mo Anthropic invoice typically lands around $1,400–$2,000 — for the same workload, same model quality, same SDK code.
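To see how the three levers compound, here is a back-of-the-envelope sketch. It assumes the savings multiply and that the gateway discount applies to all traffic, including batched work. Only the 30% gateway figure comes from this article; the caching and batch factors are illustrative placeholders to replace with your own measurements.

```typescript
interface StackAssumptions {
  cachingFactor: number; // fraction of the bill left after caching (workload-dependent)
  batchFactor: number; // fraction left after moving async work to the batch API
  gatewayDiscount: number; // 0.30 per the article
}

// Estimate the monthly invoice in USD after applying all three levers.
function stackedInvoice(baseUsd: number, a: StackAssumptions): number {
  return baseUsd * a.cachingFactor * a.batchFactor * (1 - a.gatewayDiscount);
}

// Illustrative inputs: caching removes 40% of spend, and 40% of the remaining
// spend is async work billed at half price (so 20% more comes off).
const illustrativeInvoice = stackedInvoice(5000, {
  cachingFactor: 0.6,
  batchFactor: 0.8,
  gatewayDiscount: 0.3,
});
```

With those placeholder factors a $5,000 invoice comes out at roughly $1,680, inside the range quoted above; a workload with little repeated context or no async traffic would land higher.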

Try the discounted Claude gateway

Set ANTHROPIC_BASE_URL and your existing Claude Code / SDK code keeps working — 30% off metered tokens + $2 free credit on signup.
