Anvat
claudevertexbedrockinfrastructure

Vertex AI vs AWS Bedrock vs Anthropic direct: the real differences for Claude in 2026

Side-by-side comparison of accessing Claude through Google Vertex AI, AWS Bedrock, and Anthropic direct API — pricing, latency, region availability, feature parity, and when each makes sense.

Anvat team7 min read

Anthropic publishes Claude on three first-party surfaces: the Anthropic API, Google Vertex AI, and AWS Bedrock. They mostly do the same thing — same model weights, similar pricing, similar latency. But the gaps that exist are exactly the gaps that bite you in production.

This is the honest 2026 comparison.

The three providers at a glance

DimensionAnthropic directVertex AIAWS Bedrock
Auth modelAPI keyGoogle IAM + service accountsAWS IAM + signed requests
Pricing parityBaseline~1.05× on most models~1.05× on most models
Region availabilityUS (primary), EU (limited)Global (15+ regions)Global (10+ regions)
Cache TTL5min default, 1hr beta5min default, 1hr beta5min default, 1hr beta
New-model availabilityDay 02-6 weeks behind2-6 weeks behind
Tool useFull supportFull supportFull support
Vision inputFull supportFull supportFull support
Audio (Opus 4.8)Not yetNot yetNot yet
Batch APIYes, 50% offYes, 50% offYes, 50% off
StreamingSSESSESSE
Rate limitsPer OrgPer projectPer account
Free tier$5LimitedNone

Why each one exists

Anthropic direct

The reference implementation. Fastest to get new models. Cheapest base price. Smallest operational footprint — one API key, no IAM dance.

The downside: Anthropic operates in a limited number of regions. If your application's data needs to stay in EU/APAC/Brazil/etc, your choice is either accept the cross-region latency or use one of the hyperscaler-hosted versions below.

Google Vertex AI

You'd pick Vertex if:

  • Your company is GCP-first and your security team needs everything behind Google IAM.
  • You need a specific Google Cloud region (e.g. asia-southeast1 in Singapore for SE Asia latency).
  • You're already getting volume discounts from Google and want Claude to roll up into that bill.
  • You need Vertex's "Customer-Managed Encryption Keys" (CMEK) for regulatory compliance.

The downside: new Claude models arrive 2-6 weeks late, the SDK is a GCP SDK (not the Anthropic SDK), and authentication requires the full GCP service-account dance which doesn't translate well to small deployments.

AWS Bedrock

You'd pick Bedrock if:

  • Your company is AWS-first and your security team needs everything behind AWS IAM.
  • You need a specific AWS region for data residency.
  • You want to use Bedrock's Knowledge Bases / Agents / Guardrails features (Anthropic's own surfaces don't have direct equivalents).
  • You're spending enough on AWS that EDP credits cover the LLM bill.

The downside: same as Vertex — new model lag, AWS-specific SDK, auth model that doesn't suit small services.

Real-world pricing comparison

A 1M-input, 100K-output token workload, no caching, Claude Sonnet 4.6:

ProviderCostvs baseline
Anthropic direct$4.50baseline
AWS Bedrock (us-east-1)$4.50parity
Vertex AI (us-central1)$4.50parity
Bedrock + 1-year provisioned throughputvariesdiscount available
Anvat (Anthropic backend)$3.15-30%

For headline pricing, the three first-party options are basically tied. Where they diverge:

  • Bedrock Provisioned Throughput offers committed-usage discounts — if you have steady predictable load, you can get 30-50% off in exchange for committing to N tokens/sec for 1+ months. Operationally heavy.
  • Vertex Volume Discounts kick in at very high spend tiers (typically $50K+/mo).
  • Anvat discount applies from token #1, no commitment.

Feature parity gotchas

The "parity" claim is mostly true but the exceptions matter.

Prompt caching

All three support cache_control as of mid-2026. Vertex was last to launch; Bedrock landed it ~3 months after direct. If you're targeting EU Vertex regions specifically, double-check caching is GA in your region — it sometimes ships region-by-region.

Tool use

Full feature parity. Same wire format on all three. No issues reported in production.

Vision

Full feature parity for static images. Anthropic-direct supports slightly higher input image counts per request; the hyperscaler versions cap lower in some regions.

Computer use / agentic features

Direct Anthropic gets these first by months. Bedrock and Vertex typically follow but lag.

New models

Day-0 access for Opus / Sonnet / Haiku launches is direct-only. Bedrock and Vertex typically take 2-6 weeks. For coding-agent workloads where the latest Opus drop is a meaningful productivity bump, the lag matters.

Latency considerations

In rough numbers (TTFT — time to first token):

  • Anthropic direct (US) — 250-400ms median
  • Anthropic direct (cross-Atlantic) — 400-700ms median
  • Bedrock (us-east-1) — 280-450ms median
  • Bedrock (eu-west-1) — 320-500ms median (regional)
  • Vertex (us-central1) — 290-440ms median
  • Vertex (europe-west1) — 330-510ms median (regional)

For latency-sensitive applications in non-US regions, the hyperscaler versions are noticeably better. For US-based services, the difference is within noise.

Operational considerations

Auth complexity

Anthropic direct: one API key in an env var. Done in 30 seconds.

Bedrock: AWS access key + secret key OR an IAM role (preferred for production). SDK handles SigV4 signing. Add ~2 hours of glue code for first-time setup.

Vertex: GCP service account JSON file (or workload identity in GKE). SDK handles auth refresh. Similar ~2 hours of glue code.

Quota management

Direct: org-wide rate limits, lifted via support.

Bedrock/Vertex: per-project or per-account quotas, lifted via cloud console. Generally easier to scale than direct in mature accounts; harder to scale than direct in new accounts (cloud providers have trust-history rate limits).

Billing visibility

Direct: dashboard at console.anthropic.com.

Bedrock: rolls into AWS bill, visible in Cost Explorer with model-level tagging.

Vertex: rolls into GCP bill, visible in Billing reports.

For finance teams that want LLM spend rolled into existing cloud contracts, Bedrock/Vertex wins. For finance teams that want a separate line item, direct wins.

When to use what

SituationUse
Solo developer, US-basedDirect (or Anvat for discount)
Startup, no compliance constraintDirect or Anvat
Series A+ with security reviewBedrock or Vertex (whichever matches your cloud)
Regulated industry (HIPAA, FINRA, PCI)Bedrock or Vertex with the relevant compliance pack
EU/APAC users needing low latencyBedrock or Vertex regional, OR Anvat (CDN-fronted)
Highest model availability priorityDirect
Already spending $$$ on AWS/GCPBedrock/Vertex (consolidates bill)
Maximum cost optimisationAnvat (-30%) + caching

The gateway option

You don't have to pick one. A gateway like Anvat sits in front of Anthropic direct, exposes the same wire format, applies a 30% discount, and works for everyone who'd otherwise use direct.

Bedrock/Vertex don't pass through gateways — by design. They're hyperscaler-managed services with their own auth model. If you need hyperscaler-billed access for compliance reasons, the gateway saving isn't on the table.

For everyone else, gateway > direct for cost.

Bottom line

  • Default: Anthropic direct, or Anvat for the same thing at -30%. Smallest operational footprint, fastest new-model access.
  • Compliance / data-residency driven: Bedrock or Vertex — match whichever cloud you already trust. Pay the parity premium for the governance.
  • Latency-driven and not on US East: regional Bedrock or Vertex.
  • Cost-optimisation driven: Anvat with prompt caching. No commitment, no operational tax, ~70% off list with caching stacked.

Cheap Claude API in 2026 → Claude API pricing breakdown →

Get Claude direct-Anthropic quality at 30% off

Anvat is the discounted Anthropic-compatible gateway — same wire format, same models, same caching, 30% less cost. Day-0 model availability.

Try free →