Vertex AI vs AWS Bedrock vs Anthropic direct: the real differences for Claude in 2026

Anthropic publishes Claude on three first-party surfaces: the Anthropic API, Google Vertex AI, and AWS Bedrock. They mostly do the same thing — same model weights, similar pricing, similar latency. But the gaps that exist are exactly the gaps that bite you in production.

This is the honest 2026 comparison.

The three providers at a glance

Dimension	Anthropic direct	Vertex AI	AWS Bedrock
Auth model	API key	Google IAM + service accounts	AWS IAM + signed requests
Pricing parity	Baseline	~1.05× on most models	~1.05× on most models
Region availability	US (primary), EU (limited)	Global (15+ regions)	Global (10+ regions)
Cache TTL	5min default, 1hr beta	5min default, 1hr beta	5min default, 1hr beta
New-model availability	Day 0	2-6 weeks behind	2-6 weeks behind
Tool use	Full support	Full support	Full support
Vision input	Full support	Full support	Full support
Audio (Opus 4.8)	Not yet	Not yet	Not yet
Batch API	Yes, 50% off	Yes, 50% off	Yes, 50% off
Streaming	SSE	SSE	SSE
Rate limits	Per Org	Per project	Per account
Free tier	$5	Limited	None

Why each one exists

Anthropic direct

The reference implementation. Fastest to get new models. Cheapest base price. Smallest operational footprint — one API key, no IAM dance.

The downside: Anthropic operates in a limited number of regions. If your application's data needs to stay in EU/APAC/Brazil/etc, your choice is either accept the cross-region latency or use one of the hyperscaler-hosted versions below.

Google Vertex AI

You'd pick Vertex if:

Your company is GCP-first and your security team needs everything behind Google IAM.
You need a specific Google Cloud region (e.g. asia-southeast1 in Singapore for SE Asia latency).
You're already getting volume discounts from Google and want Claude to roll up into that bill.
You need Vertex's "Customer-Managed Encryption Keys" (CMEK) for regulatory compliance.

The downside: new Claude models arrive 2-6 weeks late, the SDK is a GCP SDK (not the Anthropic SDK), and authentication requires the full GCP service-account dance which doesn't translate well to small deployments.

AWS Bedrock

You'd pick Bedrock if:

Your company is AWS-first and your security team needs everything behind AWS IAM.
You need a specific AWS region for data residency.
You want to use Bedrock's Knowledge Bases / Agents / Guardrails features (Anthropic's own surfaces don't have direct equivalents).
You're spending enough on AWS that EDP credits cover the LLM bill.

The downside: same as Vertex — new model lag, AWS-specific SDK, auth model that doesn't suit small services.

Real-world pricing comparison

A 1M-input, 100K-output token workload, no caching, Claude Sonnet 4.6:

Provider	Cost	vs baseline
Anthropic direct	$4.50	baseline
AWS Bedrock (us-east-1)	$4.50	parity
Vertex AI (us-central1)	$4.50	parity
Bedrock + 1-year provisioned throughput	varies	discount available
Anvat (Anthropic backend)	$3.15	-30%

For headline pricing, the three first-party options are basically tied. Where they diverge:

Bedrock Provisioned Throughput offers committed-usage discounts — if you have steady predictable load, you can get 30-50% off in exchange for committing to N tokens/sec for 1+ months. Operationally heavy.
Vertex Volume Discounts kick in at very high spend tiers (typically $50K+/mo).
Anvat discount applies from token #1, no commitment.

Feature parity gotchas

The "parity" claim is mostly true but the exceptions matter.

Prompt caching

All three support cache_control as of mid-2026. Vertex was last to launch; Bedrock landed it ~3 months after direct. If you're targeting EU Vertex regions specifically, double-check caching is GA in your region — it sometimes ships region-by-region.

Tool use

Full feature parity. Same wire format on all three. No issues reported in production.

Vision

Full feature parity for static images. Anthropic-direct supports slightly higher input image counts per request; the hyperscaler versions cap lower in some regions.

Computer use / agentic features

Direct Anthropic gets these first by months. Bedrock and Vertex typically follow but lag.

New models

Day-0 access for Opus / Sonnet / Haiku launches is direct-only. Bedrock and Vertex typically take 2-6 weeks. For coding-agent workloads where the latest Opus drop is a meaningful productivity bump, the lag matters.

Latency considerations

In rough numbers (TTFT — time to first token):

Anthropic direct (US) — 250-400ms median
Anthropic direct (cross-Atlantic) — 400-700ms median
Bedrock (us-east-1) — 280-450ms median
Bedrock (eu-west-1) — 320-500ms median (regional)
Vertex (us-central1) — 290-440ms median
Vertex (europe-west1) — 330-510ms median (regional)

For latency-sensitive applications in non-US regions, the hyperscaler versions are noticeably better. For US-based services, the difference is within noise.

Operational considerations

Auth complexity

Anthropic direct: one API key in an env var. Done in 30 seconds.

Bedrock: AWS access key + secret key OR an IAM role (preferred for production). SDK handles SigV4 signing. Add ~2 hours of glue code for first-time setup.

Vertex: GCP service account JSON file (or workload identity in GKE). SDK handles auth refresh. Similar ~2 hours of glue code.

Quota management

Direct: org-wide rate limits, lifted via support.

Bedrock/Vertex: per-project or per-account quotas, lifted via cloud console. Generally easier to scale than direct in mature accounts; harder to scale than direct in new accounts (cloud providers have trust-history rate limits).

Billing visibility

Direct: dashboard at console.anthropic.com.

Bedrock: rolls into AWS bill, visible in Cost Explorer with model-level tagging.

Vertex: rolls into GCP bill, visible in Billing reports.

For finance teams that want LLM spend rolled into existing cloud contracts, Bedrock/Vertex wins. For finance teams that want a separate line item, direct wins.

When to use what

Situation	Use
Solo developer, US-based	Direct (or Anvat for discount)
Startup, no compliance constraint	Direct or Anvat
Series A+ with security review	Bedrock or Vertex (whichever matches your cloud)
Regulated industry (HIPAA, FINRA, PCI)	Bedrock or Vertex with the relevant compliance pack
EU/APAC users needing low latency	Bedrock or Vertex regional, OR Anvat (CDN-fronted)
Highest model availability priority	Direct
Already spending $$$ on AWS/GCP	Bedrock/Vertex (consolidates bill)
Maximum cost optimisation	Anvat (-30%) + caching

The gateway option

You don't have to pick one. A gateway like Anvat sits in front of Anthropic direct, exposes the same wire format, applies a 30% discount, and works for everyone who'd otherwise use direct.

Bedrock/Vertex don't pass through gateways — by design. They're hyperscaler-managed services with their own auth model. If you need hyperscaler-billed access for compliance reasons, the gateway saving isn't on the table.

For everyone else, gateway > direct for cost.

Bottom line

Default: Anthropic direct, or Anvat for the same thing at -30%. Smallest operational footprint, fastest new-model access.
Compliance / data-residency driven: Bedrock or Vertex — match whichever cloud you already trust. Pay the parity premium for the governance.
Latency-driven and not on US East: regional Bedrock or Vertex.
Cost-optimisation driven: Anvat with prompt caching. No commitment, no operational tax, ~70% off list with caching stacked.

Cheap Claude API in 2026 → Claude API pricing breakdown →

Get Claude direct-Anthropic quality at 30% off

Anvat is the discounted Anthropic-compatible gateway — same wire format, same models, same caching, 30% less cost. Day-0 model availability.

Try free → →