Anvat / Leaderboard · Week of 2026-06-06

AI cost-efficiency leaderboard.

Name: Anvat AI Cost-Efficiency Leaderboard
Creator: Anvat
Published: 2026-06-06
License: https://creativecommons.org/licenses/by/4.0/
Keywords: llm benchmark, ai pricing, claude vs gpt, model leaderboard, cost per swe-bench, cost per mcp atlas, ai cost efficiency

Frontier LLMs ranked by intelligence-per-dollar. Each table divides a benchmark score by the Anvat effective blended token price ($/MTok, 1:3 input:output weighting). Higher value = more answer quality per dollar spent.

Updated every week from publisher pricing + system-card benchmark numbers. Source data: /benchmarks · /pricing.

How to read this: The "Value" column is score ÷ Anvat $/MTok (blended). Values are comparable only WITHIN a category, not across categories, because SWE-Bench % and HLE % are not on the same scale. The Anvat $/MTok column already includes Anvat's 30% discount; the List $/MTok column shows the provider's published blended price.
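To make the "comparable within a category" rule concrete, here is a minimal Python sketch that recomputes the Value column for the SWE-Bench Verified rows below and ranks them. It is an illustration, not Anvat's pipeline: it uses the rounded Anvat $/MTok figures shown in the table, so a computed value can differ from the published column in the last digit.

```python
# Rows copied from the SWE-Bench Verified table: (model, score, Anvat blended $/MTok).
SWE_BENCH_VERIFIED = [
    ("DeepSeek V4 Flash", 62.4, 0.331),
    ("Gemini 3.5 Flash", 55.1, 0.341),
    ("GPT-5 Mini", 52.1, 1.09),
    ("DeepSeek V4 Pro", 80.6, 2.13),
    ("GPT-5.5", 67.4, 5.47),
    ("Gemini 3.1 Pro", 54.2, 5.47),
    ("Claude Sonnet 4.6", 70.3, 8.40),
    ("Claude Opus 4.8", 81.5, 42.0),
    ("Claude Opus 4.7", 80.8, 42.0),
]


def value(score: float, anvat_price: float) -> float:
    """Benchmark points per Anvat blended $/MTok."""
    return score / anvat_price


def rank_by_value(rows):
    """Return (model, value) pairs, best value first. Only rank rows from ONE benchmark."""
    scored = [(model, value(score, price)) for model, score, price in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for rank, (model, v) in enumerate(rank_by_value(SWE_BENCH_VERIFIED), start=1):
    print(f"#{rank} {model}: {v:.3g}")
```

Mixing rows from two benchmarks into one `rank_by_value` call would run without error but produce a meaningless order, which is exactly what the rule above warns against.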

Coding

Best $ per point — SWE-Bench Verified

Rank	Model	Score	List $/MTok	Anvat $/MTok	Value (score/$)
#1	DeepSeek V4 Flash (DeepSeek)	62.4	$0.473	$0.331	189
#2	Gemini 3.5 Flash (Google)	55.1	$0.487	$0.341	161
#3	GPT-5 Mini (OpenAI)	52.1	$1.56	$1.09	47.6
#4	DeepSeek V4 Pro (DeepSeek)	80.6	$3.04	$2.13	37.8
#5	GPT-5.5 (OpenAI)	67.4	$7.81	$5.47	12.3
#6	Gemini 3.1 Pro (Google)	54.2	$7.81	$5.47	9.91
#7	Claude Sonnet 4.6 (Anthropic)	70.3	$12.0	$8.40	8.37
#8	Claude Opus 4.8 (Anthropic)	81.5	$60.0	$42.0	1.94
#9	Claude Opus 4.7 (Anthropic)	80.8	$60.0	$42.0	1.92

Agentic + tool use

Best $ per point — MCP Atlas

Rank	Model	Score	List $/MTok	Anvat $/MTok	Value (score/$)
#1	Gemini 3.5 Flash (Google)	83.6	$0.487	$0.341	245
#2	DeepSeek V4 Flash (DeepSeek)	56.6	$0.473	$0.331	171
#3	GPT-5 Mini (OpenAI)	64.2	$1.56	$1.09	58.7
#4	DeepSeek V4 Pro (DeepSeek)	73.6	$3.04	$2.13	34.5
#5	Gemini 3.1 Pro (Google)	78.2	$7.81	$5.47	14.3
#6	GPT-5.5 (OpenAI)	75.3	$7.81	$5.47	13.8
#7	Claude Sonnet 4.6 (Anthropic)	69.5	$12.0	$8.40	8.27
#8	Claude Opus 4.8 (Anthropic)	79.4	$60.0	$42.0	1.89
#9	Claude Opus 4.7 (Anthropic)	79.1	$60.0	$42.0	1.88

Reasoning

Best $ per point — Humanity's Last Exam

Rank	Model	Score	List $/MTok	Anvat $/MTok	Value (score/$)
#1	Gemini 3.5 Flash (Google)	40.2	$0.487	$0.341	118
#2	GPT-5 Mini (OpenAI)	27.4	$1.56	$1.09	25.1
#3	DeepSeek V4 Pro (DeepSeek)	37.7	$3.04	$2.13	17.7
#4	Gemini 3.1 Pro (Google)	44.4	$7.81	$5.47	8.12
#5	GPT-5.5 (OpenAI)	41.4	$7.81	$5.47	7.57
#6	Claude Sonnet 4.6 (Anthropic)	33.2	$12.0	$8.40	3.95
#7	Claude Opus 4.8 (Anthropic)	47.2	$60.0	$42.0	1.12
#8	Claude Opus 4.7 (Anthropic)	46.9	$60.0	$42.0	1.12

Math

Best $ per point — HMMT 2026 Feb (math)

Rank	Model	Score	List $/MTok	Anvat $/MTok	Value (score/$)
#1	DeepSeek V4 Pro (DeepSeek)	95.2	$3.04	$2.13	44.7
#2	Claude Opus 4.8 (Anthropic)	96.5	$60.0	$42.0	2.30
#3	Claude Opus 4.7 (Anthropic)	96.2	$60.0	$42.0	2.29

Long context

Best $ per point — MRCR v2 (128K, 8-needle)

Rank	Model	Score	List $/MTok	Anvat $/MTok	Value (score/$)
#1	Gemini 3.5 Flash (Google)	77.3	$0.487	$0.341	227
#2	GPT-5 Mini (OpenAI)	88.4	$1.56	$1.09	80.8
#3	GPT-5.5 (OpenAI)	94.8	$7.81	$5.47	17.3
#4	Gemini 3.1 Pro (Google)	84.9	$7.81	$5.47	15.5
#5	Claude Sonnet 4.6 (Anthropic)	84.9	$12.0	$8.40	10.1
#6	Claude Opus 4.8 (Anthropic)	60.1	$60.0	$42.0	1.43
#7	Claude Opus 4.7 (Anthropic)	59.3	$60.0	$42.0	1.41

Multimodal

Best $ per point — CharXiv Reasoning

Rank	Model	Score	List $/MTok	Anvat $/MTok	Value (score/$)
#1	Gemini 3.5 Flash (Google)	84.2	$0.487	$0.341	247
#2	GPT-5 Mini (OpenAI)	76.2	$1.56	$1.09	69.7
#3	GPT-5.5 (OpenAI)	84.1	$7.81	$5.47	15.4
#4	Gemini 3.1 Pro (Google)	83.3	$7.81	$5.47	15.2
#5	Claude Sonnet 4.6 (Anthropic)	72.4	$12.0	$8.40	8.62
#6	Claude Opus 4.8 (Anthropic)	82.3	$60.0	$42.0	1.96
#7	Claude Opus 4.7 (Anthropic)	82.1	$60.0	$42.0	1.95

Pure price ranking

Cheapest models — ignoring quality

Don't pick a model from this table alone — pair it with the benchmark you actually care about above. Useful for budget-bounded workloads (classification, extraction, background processing).

Rank	Model	List $/MTok	Anvat $/MTok	Δ vs Opus 4.8
#1	DeepSeek V4 Flash (DeepSeek)	$0.473	$0.331	127× cheaper
#2	Gemini 3.5 Flash (Google)	$0.487	$0.341	123× cheaper
#3	GPT-5 Mini (OpenAI)	$1.56	$1.09	38× cheaper
#4	DeepSeek V4 Pro (DeepSeek)	$3.04	$2.13	20× cheaper
#5	GPT-5.5 (OpenAI)	$7.81	$5.47	8× cheaper
#6	Gemini 3.1 Pro (Google)	$7.81	$5.47	8× cheaper
#7	Claude Sonnet 4.6 (Anthropic)	$12.0	$8.40	5× cheaper
#8	Claude Opus 4.8 (Anthropic)	$60.0	$42.0	baseline
#9	Claude Opus 4.7 (Anthropic)	$60.0	$42.0	same price
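The "Δ vs Opus 4.8" column is the Opus 4.8 price divided by each model's price, rounded to a whole multiple. A minimal sketch follows. Because the same 30% discount applies to every model, the ratio is identical whether list or Anvat prices are used; the sketch uses the list prices because the rounded Anvat figures (for example $1.09) can shift the rounded multiple by one.

```python
# Blended list $/MTok from the price table above.
LIST_PRICES = {
    "DeepSeek V4 Flash": 0.473,
    "Gemini 3.5 Flash": 0.487,
    "GPT-5 Mini": 1.56,
    "DeepSeek V4 Pro": 3.04,
    "GPT-5.5": 7.81,
    "Gemini 3.1 Pro": 7.81,
    "Claude Sonnet 4.6": 12.0,
}
OPUS_4_8_LIST = 60.0  # baseline row


def times_cheaper(price: float, baseline: float = OPUS_4_8_LIST) -> int:
    """Baseline price divided by this model's price, rounded to the nearest whole multiple."""
    return round(baseline / price)


for model, price in LIST_PRICES.items():
    print(f"{model}: {times_cheaper(price)}x cheaper than Opus 4.8")
```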

Why this ranking moves every week

Frontier model pricing is genuinely volatile. New tiers ship (DeepSeek V4 Flash, Gemini 3.5 Flash, GPT-5 Mini), benchmark scores get re-run on cleaner harnesses, and providers cut prices when competition forces them to. The industry news feed tracks the provider releases that drive this ranking.

Methodology

• Blended price = (input + 3 × output) / 4, USD per million tokens. The 1:3 ratio mirrors a typical chat / coding workload.
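• Anvat effective price = list × 0.70 (30% off). The 2× credit match on prepaid packs lowers realised cost further: if the match doubles the full prepaid amount, you pay 0.70 × 0.5 = 0.35 of the provider-direct list price, roughly a third.
• Value = benchmark score / Anvat blended $/MTok. On this metric, a model that scores 80 at $1/MTok (value 80) beats one that scores 90 at $10/MTok (value 9), although the cheaper model is not always the right pick; see the caveats below.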
• Refresh cadence: weekly, or sooner when a provider ships a price change or a new model.
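The methodology reduces to three small formulas. Here is a minimal Python sketch of them; the $2 input / $10 output rates in the usage lines are hypothetical placeholders, not any provider's published prices. The last line checks the sketch against the Claude Opus 4.8 SWE-Bench Verified row ($60.0 list, score 81.5).

```python
OUTPUT_WEIGHT = 3        # 1:3 input:output weighting from the methodology
ANVAT_MULTIPLIER = 0.70  # 30% off list


def blended_price(input_per_mtok: float, output_per_mtok: float) -> float:
    """(input + 3 x output) / 4, in USD per million tokens."""
    return (input_per_mtok + OUTPUT_WEIGHT * output_per_mtok) / (1 + OUTPUT_WEIGHT)


def anvat_price(list_blended: float) -> float:
    """Anvat effective price: blended list price x 0.70."""
    return list_blended * ANVAT_MULTIPLIER


def value(score: float, anvat_blended: float) -> float:
    """Benchmark score per Anvat blended $/MTok."""
    return score / anvat_blended


# Hypothetical rates: $2 input, $10 output per MTok (illustrative only).
list_blended = blended_price(2.0, 10.0)   # (2 + 30) / 4 = 8.0
effective = anvat_price(list_blended)     # 8.0 x 0.70 = 5.6
print(list_blended, effective)

# The 80-vs-90 comparison from the Value bullet.
print(value(80, 1.0), value(90, 10.0))    # 80.0 vs 9.0

# Claude Opus 4.8 on SWE-Bench Verified: 81.5 / (60.0 x 0.70) is about 1.94.
print(round(value(81.5, anvat_price(60.0)), 2))
```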

Caveats — read these

• Benchmark scores are single-pass, best-effort, from publisher system cards. Production reliability varies.
• Cheap-and-fast isn't the same as cheap-and-good. Even a model that scores 90% effectively costs 2× its sticker price on any task where it needs one retry.
• Long-context and multimodal requests are often billed with separate premiums that the flat 1:3 blend does not capture, so treat the Long context and Multimodal values as approximate.
• The prepaid-credit advantage isn't reflected here. Your real $/MTok drops further if you use Anvat's 2× match.
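The retry caveat can be made explicit. The sketch below assumes each attempt succeeds independently with the same probability p and uses the same number of tokens, so the expected number of attempts is 1/p (a geometric distribution). Real failures are often correlated across retries, so this estimate is optimistic. The success rates are hypothetical; only the $8.40 sticker comes from the tables.

```python
def expected_attempts(success_rate: float) -> float:
    """Mean attempts until the first success, assuming independent, identical tries."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


def effective_price(sticker: float, success_rate: float) -> float:
    """Sticker $/MTok scaled by the expected attempts per accepted answer."""
    return sticker * expected_attempts(success_rate)


sticker = 8.40  # Claude Sonnet 4.6, Anvat blended $/MTok
print(f"{effective_price(sticker, 0.5):.2f}")  # 16.80: one retry on average doubles cost
print(f"{effective_price(sticker, 0.9):.2f}")  # 9.33: about 1.11 attempts per answer
```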

Run any of these models from one key.

Switch models with a single param change. No vendor lock-in, 30% off published list, 2× credit on prepaid packs.

Start free with $2 credit →
Run the cost calculator