Report #62436
[synthesis] Why SRE error budgets don't work for AI products
Segment AI error budgets by severity and domain instead of tracking one aggregate budget. A 1% error rate that produces harmless formatting mistakes is fundamentally different from a 1% rate that produces dangerous medical advice. Define a separate error budget per severity tier and per domain, each with its own burn-rate thresholds and escalation path.
Journey Context:
In SRE, an error budget aggregates errors into a single metric (e.g., a 0.1% failure rate) and triggers action when it is exhausted. This works because software errors are roughly uniform in impact: a 500 error is a 500 error regardless of content. AI errors are radically non-uniform: a hallucinated movie recommendation is trivial; a hallucinated medication dosage can be life-threatening. Aggregating these into a single error budget fails in two directions: harmless errors can consume the entire budget while dangerous ones stay under the threshold, or a few severe errors can exhaust the budget and trigger unnecessary rollbacks when the product is otherwise performing well. Teams commonly apply a standard error budget and discover that they either over-alert on trivial issues or under-alert on critical ones. The fix: segmented error budgets with severity tiers (analogous to incident severity levels) and domain-specific thresholds. The tradeoff: this requires classifying each error by severity, which may itself need an AI system, creating a recursive monitoring problem. The synthesis: SRE error budgets assume error homogeneity within a service; AI errors are inherently heterogeneous, so the budget must be decomposed along dimensions that don't exist in traditional software.
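The contrast between one aggregate budget and segmented budgets can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the tier names, domains, thresholds, escalation labels, and the helpers `aggregate_ok` and `exhausted_segments` are all hypothetical, and it assumes each error has already been labeled with a (severity, domain) pair, which is the hard classification step noted above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    allowed_rate: float  # max tolerated error fraction within the domain's traffic
    escalation: str      # hypothetical escalation path label


# Hypothetical segments and thresholds, chosen only for illustration.
BUDGETS = {
    ("low", "entertainment"): Budget(allowed_rate=0.01, escalation="ticket"),
    ("critical", "medical"): Budget(allowed_rate=0.0001, escalation="page-oncall"),
}


def aggregate_ok(errors, requests_by_domain, allowed_rate=0.001):
    """Classic single budget: every error counts the same."""
    return len(errors) / sum(requests_by_domain.values()) <= allowed_rate


def exhausted_segments(errors, requests_by_domain, budgets=BUDGETS):
    """Return {(severity, domain): escalation} for each segment over budget."""
    counts = Counter(errors)
    exhausted = {}
    for (severity, domain), budget in budgets.items():
        rate = counts[(severity, domain)] / requests_by_domain[domain]
        if rate > budget.allowed_rate:
            exhausted[(severity, domain)] = budget.escalation
    return exhausted


# Made-up traffic: 65 errors in 100,000 requests (0.065% overall).
requests_by_domain = {"entertainment": 90_000, "medical": 10_000}
errors = [("low", "entertainment")] * 60 + [("critical", "medical")] * 5

print(aggregate_ok(errors, requests_by_domain))
print(exhausted_segments(errors, requests_by_domain))
```

With these made-up numbers the aggregate check passes (0.065% is under the 0.1% budget), while the segmented check flags the critical medical segment, whose 0.05% rate is five times its 0.01% budget. That is the under-alerting failure mode described above.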
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:17:05.109672+00:00— report_created — created