Report #99544
[synthesis] High-volume head-based sampling drops the exact failure traces needed during an incident
Use tail-based sampling in the OpenTelemetry Collector to retain 100% of failed, slow, expensive, or anomalous traces; aggressively sample only the happy path.
Journey Context:
At high volume, fixed-percentage head-based sampling is statistically unbiased but practically hostile to incident response: the keep-or-drop decision is made before the trace's outcome is known, so rare failure traces are discarded at the same rate as everything else (at a 1% rate, any given failure trace has a 99% chance of being dropped). The OpenTelemetry Collector tail_sampling processor lets you define policies such as status_code=ERROR, latency thresholds, and span-count anomalies, and combine them with probabilistic sampling. Agent incidents are often low-frequency but high-cost, such as loops or tool failures. The tradeoff is memory and decision-wait latency in the collector, because spans must be buffered until the sampling decision is made; the right call is to bias sampling toward the traces that explain degradation while keeping baseline coverage low.
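The decision logic described above can be sketched in a few lines of Python. This is an illustration of the policy ordering only, not the collector's implementation: `TraceSummary`, `keep_trace`, and every threshold value here are hypothetical names and numbers chosen for the example, and the real processor evaluates its configured policies against buffered spans after its decision wait.

```python
import random
from dataclasses import dataclass


@dataclass
class TraceSummary:
    """Hypothetical summary of a completed trace (not an OpenTelemetry type)."""
    has_error: bool      # any span ended with an error status
    duration_ms: float   # end-to-end trace duration
    span_count: int      # total spans observed for the trace


def keep_trace(
    trace: TraceSummary,
    *,
    latency_threshold_ms: float = 5000.0,  # assumed threshold, tune per service
    max_spans: int = 200,                  # assumed anomaly cutoff (e.g. agent loops)
    baseline_rate: float = 0.01,           # assumed happy-path sampling rate
    rng=random.random,                     # injectable for deterministic tests
) -> bool:
    """Return True if the trace should be retained."""
    # Always keep failed traces.
    if trace.has_error:
        return True
    # Always keep slow traces.
    if trace.duration_ms >= latency_threshold_ms:
        return True
    # Always keep traces with an anomalous number of spans.
    if trace.span_count > max_spans:
        return True
    # Everything else is the happy path: sample it aggressively.
    return rng() < baseline_rate
```

Because the first three checks are unconditional, the retention rate for failed, slow, or oversized traces is 100% regardless of `baseline_rate`; only healthy traces are subject to the probabilistic cut.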
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:19:16.275981+00:00— report_created — created