Anthropic publishes Claude on three first-party surfaces: the Anthropic API, Google Vertex AI, and AWS Bedrock. They mostly do the same thing — same model weights, similar pricing, similar latency. But the gaps that exist are exactly the gaps that bite you in production.
This is the honest 2026 comparison.
The three providers at a glance
| Dimension | Anthropic direct | Vertex AI | AWS Bedrock |
|---|---|---|---|
| Auth model | API key | Google IAM + service accounts | AWS IAM + signed requests |
| Pricing parity | Baseline | ~1.05× on most models | ~1.05× on most models |
| Region availability | US (primary), EU (limited) | Global (15+ regions) | Global (10+ regions) |
| Cache TTL | 5min default, 1hr beta | 5min default, 1hr beta | 5min default, 1hr beta |
| New-model availability | Day 0 | 2-6 weeks behind | 2-6 weeks behind |
| Tool use | Full support | Full support | Full support |
| Vision input | Full support | Full support | Full support |
| Audio (Opus 4.8) | Not yet | Not yet | Not yet |
| Batch API | Yes, 50% off | Yes, 50% off | Yes, 50% off |
| Streaming | SSE | SSE | SSE |
| Rate limits | Per Org | Per project | Per account |
| Free tier | $5 | Limited | None |
Why each one exists
Anthropic direct
The reference implementation. Fastest to get new models. Cheapest base price. Smallest operational footprint — one API key, no IAM dance.
The downside: Anthropic operates in a limited number of regions. If your application's data needs to stay in EU/APAC/Brazil/etc, your choice is either accept the cross-region latency or use one of the hyperscaler-hosted versions below.
Google Vertex AI
You'd pick Vertex if:
- Your company is GCP-first and your security team needs everything behind Google IAM.
- You need a specific Google Cloud region (e.g. asia-southeast1 in Singapore for SE Asia latency).
- You're already getting volume discounts from Google and want Claude to roll up into that bill.
- You need Vertex's "Customer-Managed Encryption Keys" (CMEK) for regulatory compliance.
The downside: new Claude models arrive 2-6 weeks late, the SDK is a GCP SDK (not the Anthropic SDK), and authentication requires the full GCP service-account dance which doesn't translate well to small deployments.
AWS Bedrock
You'd pick Bedrock if:
- Your company is AWS-first and your security team needs everything behind AWS IAM.
- You need a specific AWS region for data residency.
- You want to use Bedrock's Knowledge Bases / Agents / Guardrails features (Anthropic's own surfaces don't have direct equivalents).
- You're spending enough on AWS that EDP credits cover the LLM bill.
The downside: same as Vertex — new model lag, AWS-specific SDK, auth model that doesn't suit small services.
Real-world pricing comparison
A 1M-input, 100K-output token workload, no caching, Claude Sonnet 4.6:
| Provider | Cost | vs baseline |
|---|---|---|
| Anthropic direct | $4.50 | baseline |
| AWS Bedrock (us-east-1) | $4.50 | parity |
| Vertex AI (us-central1) | $4.50 | parity |
| Bedrock + 1-year provisioned throughput | varies | discount available |
| Anvat (Anthropic backend) | $3.15 | -30% |
For headline pricing, the three first-party options are basically tied. Where they diverge:
- Bedrock Provisioned Throughput offers committed-usage discounts — if you have steady predictable load, you can get 30-50% off in exchange for committing to N tokens/sec for 1+ months. Operationally heavy.
- Vertex Volume Discounts kick in at very high spend tiers (typically $50K+/mo).
- Anvat discount applies from token #1, no commitment.
Feature parity gotchas
The "parity" claim is mostly true but the exceptions matter.
Prompt caching
All three support cache_control as of mid-2026. Vertex was last to launch; Bedrock landed it ~3 months after direct. If you're targeting EU Vertex regions specifically, double-check caching is GA in your region — it sometimes ships region-by-region.
Tool use
Full feature parity. Same wire format on all three. No issues reported in production.
Vision
Full feature parity for static images. Anthropic-direct supports slightly higher input image counts per request; the hyperscaler versions cap lower in some regions.
Computer use / agentic features
Direct Anthropic gets these first by months. Bedrock and Vertex typically follow but lag.
New models
Day-0 access for Opus / Sonnet / Haiku launches is direct-only. Bedrock and Vertex typically take 2-6 weeks. For coding-agent workloads where the latest Opus drop is a meaningful productivity bump, the lag matters.
Latency considerations
In rough numbers (TTFT — time to first token):
- Anthropic direct (US) — 250-400ms median
- Anthropic direct (cross-Atlantic) — 400-700ms median
- Bedrock (us-east-1) — 280-450ms median
- Bedrock (eu-west-1) — 320-500ms median (regional)
- Vertex (us-central1) — 290-440ms median
- Vertex (europe-west1) — 330-510ms median (regional)
For latency-sensitive applications in non-US regions, the hyperscaler versions are noticeably better. For US-based services, the difference is within noise.
Operational considerations
Auth complexity
Anthropic direct: one API key in an env var. Done in 30 seconds.
Bedrock: AWS access key + secret key OR an IAM role (preferred for production). SDK handles SigV4 signing. Add ~2 hours of glue code for first-time setup.
Vertex: GCP service account JSON file (or workload identity in GKE). SDK handles auth refresh. Similar ~2 hours of glue code.
Quota management
Direct: org-wide rate limits, lifted via support.
Bedrock/Vertex: per-project or per-account quotas, lifted via cloud console. Generally easier to scale than direct in mature accounts; harder to scale than direct in new accounts (cloud providers have trust-history rate limits).
Billing visibility
Direct: dashboard at console.anthropic.com.
Bedrock: rolls into AWS bill, visible in Cost Explorer with model-level tagging.
Vertex: rolls into GCP bill, visible in Billing reports.
For finance teams that want LLM spend rolled into existing cloud contracts, Bedrock/Vertex wins. For finance teams that want a separate line item, direct wins.
When to use what
| Situation | Use |
|---|---|
| Solo developer, US-based | Direct (or Anvat for discount) |
| Startup, no compliance constraint | Direct or Anvat |
| Series A+ with security review | Bedrock or Vertex (whichever matches your cloud) |
| Regulated industry (HIPAA, FINRA, PCI) | Bedrock or Vertex with the relevant compliance pack |
| EU/APAC users needing low latency | Bedrock or Vertex regional, OR Anvat (CDN-fronted) |
| Highest model availability priority | Direct |
| Already spending $$$ on AWS/GCP | Bedrock/Vertex (consolidates bill) |
| Maximum cost optimisation | Anvat (-30%) + caching |
The gateway option
You don't have to pick one. A gateway like Anvat sits in front of Anthropic direct, exposes the same wire format, applies a 30% discount, and works for everyone who'd otherwise use direct.
Bedrock/Vertex don't pass through gateways — by design. They're hyperscaler-managed services with their own auth model. If you need hyperscaler-billed access for compliance reasons, the gateway saving isn't on the table.
For everyone else, gateway > direct for cost.
Bottom line
- Default: Anthropic direct, or Anvat for the same thing at -30%. Smallest operational footprint, fastest new-model access.
- Compliance / data-residency driven: Bedrock or Vertex — match whichever cloud you already trust. Pay the parity premium for the governance.
- Latency-driven and not on US East: regional Bedrock or Vertex.
- Cost-optimisation driven: Anvat with prompt caching. No commitment, no operational tax, ~70% off list with caching stacked.
Cheap Claude API in 2026 → Claude API pricing breakdown →
Get Claude direct-Anthropic quality at 30% off
Anvat is the discounted Anthropic-compatible gateway — same wire format, same models, same caching, 30% less cost. Day-0 model availability.
Try free → →