DeepSeek V4 launched April 24, 2026 — the fourth major release in the series that started commoditising frontier-tier AI when V3 shipped in December 2024. V4 isn't a quality leap; the benchmark numbers are at parity with frontier closed models on coding, behind on the hardest reasoning. The story is the economics + the agentic positioning. Here's the honest assessment.
The release at a glance
Two MoE checkpoints shipped, both at 1M-token context, both MIT-licensed:
| Model | Total params | Active params | Context | License |
|---|---|---|---|---|
| DeepSeek V4 Pro | 1.6T | 49B | 1M | MIT |
| DeepSeek V4 Flash | 284B | 13B | 1M | MIT |
Plus base versions (V4-Pro-Base, V4-Flash-Base) for fine-tuning, and "-Max" inference modes that enable extended reasoning tokens for higher benchmark scores.
The benchmark table that matters
V4-Pro-Max vs the frontier closed models, per the DeepSeek paper:
| Benchmark | V4-Pro-Max | Opus 4.6 | GPT-5.4 xHigh | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-Bench Verified | 80.6% | 80.8% | — | 80.6% |
| LiveCodeBench Pass@1 | 93.5 | 88.8 | — | 91.7 |
| Codeforces Rating | 3206 | — | 3168 | 3052 |
| MCPAtlas Public | 73.6 | 73.8 | — | — |
| Terminal-Bench 2.0 | 67.9 | — | 75.1 | 68.5 |
| Toolathlon | 51.8 | — | — | 48.8 |
| HMMT 2026 Feb (math) | 95.2 | 96.2 | — | — |
| HLE (Humanity's Last Exam) | 37.7 | 40.0 | — | — |
| MRCR 1M (long-context recall) | 83.5 | — | — | — |
Where V4-Pro-Max leads:
- LiveCodeBench Pass@1 (highest of any frontier model)
- Codeforces Rating
- Toolathlon
Where V4-Pro-Max is competitive:
- SWE-Bench Verified (within 0.2 points of Opus 4.6)
- MCPAtlas (within 0.2 points of Opus 4.6)
- 1M context retrieval (no closed peer at this length)
Where V4-Pro-Max trails:
- Terminal-Bench 2.0 (GPT-5.4 leads by 7 points)
- HMMT math (Opus 4.6 leads by 1 point)
- HLE (Opus 4.6 leads by 2.3 points)
The honest read: DeepSeek V4 Pro is genuinely frontier-class on coding + agentic tasks, ~6-12 months behind on the hardest reasoning.
The cost angle (the actual story)
V4-Pro: $1.74 / $3.48 per MTok input/output.
For comparison:
| Model | Input | Output | Output ratio to V4-Pro |
|---|---|---|---|
| DeepSeek V4-Flash | $0.14 | $0.28 | 0.08× |
| DeepSeek V4-Pro | $1.74 | $3.48 | 1× |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 4.3× |
| Gemini 3.5 Flash | $0.30 | $2.50 | 0.72× |
| GPT-5.5 | $1.25 | $10.00 | 2.9× |
| Claude Opus 4.7 | $15.00 | $75.00 | 21.6× |
V4-Pro output is 21.6× cheaper than Opus 4.7 at SWE-Bench-equivalent quality on coding workloads. If you're doing high-volume coding-agent work where output token count dominates the bill, V4-Pro changes the economics enough to potentially restructure your stack.
When DeepSeek V4 actually wins
Be precise about the use case:
Winning case 1 — High-volume coding tasks where cost dominates
Pattern: bulk PR review, code-quality classification across thousands of repos, automated test generation at scale. V4-Pro at $3.48/MTok output vs Opus 4.7's $75 is a 21.6× cost compression. The SWE-Bench quality gap (0.2 points) is invisible at the per-task level.
Winning case 2 — Long-context analysis where 200K isn't enough
V4 supports 1M tokens natively without separate pricing tiers (Gemini 2.5 Pro doubles the price above 200K; Anthropic's Sonnet 4.6 has a separate 1M tier at 2× the base rate). For workloads that genuinely need 500K-1M context (full codebase analysis, large legal doc review), V4 is the cheapest path.
Winning case 3 — Open-weight requirement
Regulated environments, audit requirements, or "we need to be able to run this on-prem" mandates that exclude closed models. V4 is MIT- licensed. You can self-host it via vLLM/SGLang/TGI.
Winning case 4 — Pairing in a router
V4-Pro on coding-heavy first-pass, Opus 4.7 on the 10-15% of tasks where the model returns "uncertain" or downstream eval flags low confidence. Typical production saving: 50-70% vs pure-Opus.
When NOT to use V4
Don't use case 1 — Hardest reasoning
If your workload is dominated by math olympiad-style problems, novel proof construction, or PhD-level science reasoning, the 2-3 point HLE
- HMMT gap matters. Stay on Opus 4.7 or wait for Gemini 3.5 Pro.
Don't use case 2 — Multimodal
V4 is text-only. No vision, no audio. If your workload needs vision (GPT-5-style multimodal or Gemini 3.5 Flash native multimodal), V4 doesn't compete.
Don't use case 3 — Tool ecosystem maturity matters more than cost
V4 is well-supported in LangChain, but the agent-framework ecosystem is still primarily built around OpenAI + Anthropic SDKs. If you're shipping fast and ecosystem maturity > 5× cost savings, stay with the closed-model SDKs.
The architecture story
The benchmarks are competitive; what's actually new in V4 is the inference-cost architecture:
- Mixture-of-Experts with 49B active. Out of 1.6T total parameters, only 49B are activated per token. Inference cost (compute + memory) scales with active params, not total — that's how a 1.6T model serves at $1.74/MTok.
- FP4 + FP8 mixed precision. V4-Pro ships in FP4 + FP8 mixed precision rather than the more common BF16. ~50% memory footprint reduction; minor accuracy cost amortised by the larger total parameter count.
- Interleaved thinking. V4 uses an interleaved-thinking pattern
for reasoning-heavy tasks rather than the separate
<thinking>block of earlier reasoners. Lower overhead, faster TTFT, but the pattern is harder to integrate into agent frameworks expecting the classic separator.
How to use V4 today
Three paths:
Path 1: Cloud API
DeepSeek's own API at api.deepseek.com — pay-as-you-go at the $1.74/$3.48 rates. Standard OpenAI-compatible wire format.
Path 2: Self-host
V4-Flash (284B / 13B active) runs on a single 8× H100 node. V4-Pro (1.6T / 49B active) needs 16-32× H100 depending on the quantization. vLLM, SGLang, and TGI all support V4 as of release.
Path 3: Multi-provider gateway
Anvat (the gateway we build) exposes V4-Pro alongside Claude, GPT, and Gemini on a single OpenAI-compatible key. Route per-task: V4 for bulk coding, Opus for reasoning, GPT for multimodal — no key juggling.
What V4 changes about the market
Three implications:
-
The "expensive frontier" tier is now a smaller wedge. Opus 4.7 and GPT-5.5 keep their lead on the absolute hardest tasks. Most production AI workloads don't need that ceiling. V4-Pro covers the 80% of the value distribution at 1/20th the cost.
-
Open weights at frontier quality is a 2026 fact. Two years ago open-source meant "trail closed models by 12 months." V4 trails by ~6 months on hardest tasks, leads on coding. The gap is now small enough that "use the open model" is a defensible default for most workloads.
-
Pricing pressure on the closed labs increases. When V4-Pro delivers near-Opus quality at $3.48/MTok output, the Opus 4.7 $75/MTok price becomes harder to justify for any workload that isn't doing genuinely hardest-tier reasoning. Expect closed-model pricing to compress in the second half of 2026.
Related coverage
Run V4-Pro alongside Claude, GPT, and Gemini on one key
Anvat is OpenAI- and Anthropic-compatible. DeepSeek V4-Pro routes through the same /v1/chat/completions endpoint as every other model — no per-provider integration. 30% off list price.
Try free → →