
Claude Mythos benchmark tracker — Opus 4.8 vs Mythos deep-dive (live)

Living comparison of Claude Mythos vs Opus 4.8 on every published benchmark. Currently sparse — Anthropic has released limited public numbers. Updated as data lands.

Anvat team · 4 min read

Claude Mythos is Anthropic's most capable model to date — gated behind Project Glasswing, accessible only to ~150 vetted organizations as of June 2026. Public benchmark numbers are sparse and will stay sparse until Anthropic moves Mythos toward general availability. This page is the tracker — what's verified, what's claimed, and where it stands relative to publicly-available Opus 4.8.

Published numbers (verified)

What Anthropic has stated publicly about Mythos in the April 7 launch announcement and the Glasswing expansion materials:

Benchmark                                   | Mythos                                     | Opus 4.8 (public) | Delta
SWE-Bench Verified                          | 93.9%                                      | 81.5%             | +12.4 pts
CyberGym (vulnerability reproduction)       | 83.1%                                      | ~62%              | ~+21 pts
Agentic security tasks (internal benchmark) | Anthropic-reported "substantially higher"  | baseline          | no number published

That's the entire publicly verifiable set as of June 6, 2026. Everything else is claim-without-numbers, leaked guesses, or extrapolation.

Pricing (verified)

  • Mythos (partner pricing via Glasswing): $25 input / $125 output per MTok
  • Opus 4.8 (public): $15 input / $75 output per MTok
  • Opus 4.8 via Anvat: $10.50 input / $52.50 output per MTok (30% off)

So Mythos is ~67% more expensive at list than Opus 4.8 — and ~138% more expensive than Opus 4.8 through Anvat.
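The percentages above follow directly from the per-MTok prices. A minimal sketch of the arithmetic, using only the input prices listed in this section (the output prices have the same ratios); the function names are illustrative, not part of any SDK:

```python
def premium_pct(price: float, baseline: float) -> float:
    """Percent by which `price` exceeds `baseline`."""
    return (price / baseline - 1.0) * 100.0


def discount_pct(price: float, baseline: float) -> float:
    """Percent by which `price` is below `baseline`."""
    return (1.0 - price / baseline) * 100.0


# Input prices in USD per million tokens, taken from the list above.
MYTHOS_INPUT = 25.00
OPUS_LIST_INPUT = 15.00
OPUS_ANVAT_INPUT = 10.50

if __name__ == "__main__":
    print(f"Mythos vs Opus 4.8 list:  +{premium_pct(MYTHOS_INPUT, OPUS_LIST_INPUT):.0f}%")
    print(f"Mythos vs Opus 4.8 Anvat: +{premium_pct(MYTHOS_INPUT, OPUS_ANVAT_INPUT):.0f}%")
    print(f"Opus 4.8 list vs Mythos:  -{discount_pct(OPUS_LIST_INPUT, MYTHOS_INPUT):.0f}%")
```

Note that "more expensive" and "cheaper" are not symmetric: a 67% premium in one direction corresponds to a 40% discount in the other, because the baseline changes.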

For a typical agent turn (5K input + 800 output), the per-request comparison is:

Model                     | Per request | Per million requests
Mythos (list)             | $0.225      | $225K
Opus 4.8 (Anthropic list) | $0.135      | $135K
Opus 4.8 (Anvat)          | $0.0945     | $94.5K
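To rerun the comparison with your own token counts, here is a small sketch that reproduces the table from the per-MTok prices listed earlier. The `Pricing` class and `PRICES` mapping are illustrative names, and the calculation assumes plain per-token billing with no caching or batch discounts, which this page does not cover:

```python
from dataclasses import dataclass

TOKENS_PER_MTOK = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per million tokens (MTok)."""
    input_per_mtok: float
    output_per_mtok: float

    def request_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Cost in USD of one request with the given token counts."""
        return (
            input_tokens * self.input_per_mtok
            + output_tokens * self.output_per_mtok
        ) / TOKENS_PER_MTOK


# Prices as listed in the pricing section above.
PRICES = {
    "Mythos (list)": Pricing(25.00, 125.00),
    "Opus 4.8 (Anthropic list)": Pricing(15.00, 75.00),
    "Opus 4.8 (Anvat)": Pricing(10.50, 52.50),
}

if __name__ == "__main__":
    for name, pricing in PRICES.items():
        per_request = pricing.request_cost(input_tokens=5_000, output_tokens=800)
        print(f"{name}: ${per_request:.4f} per request, "
              f"${per_request * 1_000_000:,.0f} per million requests")
```

Output-heavy workloads shift the totals more than input-heavy ones, since output tokens cost five times as much as input tokens at every tier.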

Where Mythos is reported to lead

Per Anthropic's launch materials and partner reports (no benchmark publication yet):

  • Multi-step agentic security tasks — the headline use case for Glasswing partners.
  • Long-horizon coding — workflows that need to run 10+ tool calls without losing track of state.
  • Adversarial reasoning — finding novel attack patterns rather than recognizing known ones.

What's NOT yet shown publicly:

  • Humanity's Last Exam (Mythos number not published)
  • ARC-AGI-2 (no number)
  • MRCR long-context recall (no number)
  • HMMT math (no number)
  • Multimodal benchmarks (no number)

Where Opus 4.8 still likely wins

Even without a full set of Mythos numbers, several structural factors favor Opus 4.8 today:

  • Cost. ~40% cheaper than Mythos at list, ~58% cheaper through Anvat.
  • Availability. Public, no waiting list.
  • Ecosystem. Every existing Claude Code, Cursor, and MCP integration works with Opus 4.8 today. Mythos integrations are per-partner, with separate access agreements.
  • Predictability. Opus 4.8 is in active production at thousands of teams; failure modes are documented. Mythos behavior is preview-stage even for Glasswing partners.

What we expect to update

When Anthropic releases Mythos benchmark numbers — likely tied to a GA announcement or the GA-equivalent "Mythos 1" public release — we'll fill in:

  • Per-benchmark Mythos scores
  • Public access path + pricing tier(s)
  • Anvat support timeline (we will add Mythos to the gateway on day 0, as soon as API access is publicly available)

Watch this page for the update.

The honest framing for builders

Until Mythos is publicly available, Opus 4.8 is the right default for the workloads Mythos was designed for. The Zcash bug discovery from May 29, 2026 happened on publicly available Opus 4.8, not Mythos. The capability ceiling on Opus 4.8 is already high enough to produce real security findings in production protocols. Mythos raises the ceiling further, but the practical workflow doesn't change.

If your team needs Mythos-class capability today and you can pass Glasswing vetting, apply via the Claude Console. If you can't pass vetting or don't want to wait, Opus 4.8 through Anvat at 30% off is the path that ships now.

Run Opus 4.8 at 30% off list while you wait for Mythos

Anvat is Anthropic-compatible. Day-0 Mythos support when public API drops. Two env vars to switch — same SDK, lower bill.
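This page does not name the two variables. A plausible reading, and only an assumption here, is that they are `ANTHROPIC_BASE_URL` and `ANTHROPIC_API_KEY`, which the official Anthropic Python SDK reads when a client is created without explicit arguments. The sketch below sets them from Python; the gateway URL is a hypothetical placeholder, and `gateway_env` is an illustrative helper, so check the gateway's own docs for the real values:

```python
import os

# Hypothetical placeholder: this article does not give the real gateway URL.
EXAMPLE_GATEWAY_URL = "https://gateway.example.invalid"


def gateway_env(api_key: str, base_url: str = EXAMPLE_GATEWAY_URL) -> dict[str, str]:
    """Build the two environment variables (assumed names) for a gateway switch."""
    return {
        "ANTHROPIC_BASE_URL": base_url,  # assumed: endpoint the SDK sends requests to
        "ANTHROPIC_API_KEY": api_key,    # assumed: key issued by the gateway
    }


if __name__ == "__main__":
    # Set both variables for this process before any SDK client is constructed.
    os.environ.update(gateway_env(api_key="YOUR_GATEWAY_KEY"))
    for name in ("ANTHROPIC_BASE_URL", "ANTHROPIC_API_KEY"):
        print(f"{name} is set: {name in os.environ}")
```

In practice the same two values would more often be exported in a shell profile or a deployment's secret store, leaving application code unchanged.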
