How we monitor
- External synthetic probes hit /healthz from 3 regions every minute.
- Per-route p50/p95/p99 latency tracked on the gateway itself; alerts fire on 5-min sustained breach.
- Upstream provider availability tracked separately so Anthropic / OpenAI / Google outages don't pollute our SLO.
- Cost-of-error alarms: if the request error rate or the 5xx pattern looks anomalous, the pager fires before customers notice the problem.
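
To make the "5-min sustained breach" rule concrete, here is a minimal sketch of how such a check could be evaluated. It is an illustration under stated assumptions, not a description of the gateway's actual alerting code: the names `percentile`, `LatencySample`, and `sustained_breach`, the nearest-rank percentile method, the one-sample-per-minute cadence, and the 800 ms threshold are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    # Nearest-rank: the smallest value with at least q% of the data at or below it.
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


@dataclass(frozen=True)
class LatencySample:
    ts: float      # Unix seconds at which the percentile was computed
    p95_ms: float  # p95 latency for one route over the preceding minute


def sustained_breach(
    samples: Sequence[LatencySample],
    threshold_ms: float,       # hypothetical threshold, not a documented value
    now: float,
    window_s: float = 300.0,   # the 5-minute window
    min_samples: int = 5,      # assumes roughly one sample per minute
) -> bool:
    """True only if every sample in the last `window_s` seconds breaches."""
    recent = [s for s in samples if now - window_s < s.ts <= now]
    if len(recent) < min_samples:
        return False  # too little data to call the breach sustained
    return all(s.p95_ms > threshold_ms for s in recent)


# Usage: five consecutive one-minute samples, all above the threshold.
breaching = [LatencySample(ts=60.0 * i, p95_ms=900.0) for i in range(1, 6)]
alert = sustained_breach(breaching, threshold_ms=800.0, now=300.0)
```

Requiring every sample in the window to breach means a single slow minute does not page anyone, at the cost of waiting the full window before an alert can fire.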