Report #81865
[synthesis] Agent outputs are truncated or omit steps as prompt complexity grows, hitting max_tokens silently: finish_reason reports "length" but no error is raised
Monitor the ratio of output_tokens to max_tokens. Alert when the ratio consistently exceeds 0.85, which indicates outputs are running up against the limit and are likely being cut off before they complete naturally.
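The monitoring rule above can be sketched as a small sliding-window check. This is a minimal sketch under stated assumptions, not a reference implementation: the class name `TokenRatioMonitor` is hypothetical, and "consistently" is interpreted here as "at least `min_fraction` of the last `window` calls exceeded the threshold". The window size and fraction defaults are assumptions to tune per workload.

```python
from collections import deque


class TokenRatioMonitor:
    """Sliding-window monitor for output_tokens / max_tokens (hypothetical helper).

    "Consistently" is interpreted as: at least `min_fraction` of the last
    `window` calls had a ratio above `threshold`. Defaults are assumptions.
    """

    def __init__(self, threshold: float = 0.85, window: int = 20, min_fraction: float = 0.5):
        self.threshold = threshold
        self.min_fraction = min_fraction
        self.ratios = deque(maxlen=window)  # oldest ratios drop off automatically

    def record(self, output_tokens: int, max_tokens: int) -> bool:
        """Record one call's usage and return True if the alert condition holds."""
        if max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        self.ratios.append(output_tokens / max_tokens)
        return self.should_alert()

    def should_alert(self) -> bool:
        # Wait for a full window so a single early spike cannot trigger an alert.
        if len(self.ratios) < self.ratios.maxlen:
            return False
        high = sum(1 for r in self.ratios if r > self.threshold)
        return high / len(self.ratios) >= self.min_fraction
```

Consider `window=4` with `max_tokens=1000`. Recording 900, 950, 880 and then 300 output tokens returns False for the first three calls, because the window is not yet full. The fourth call returns True, because three of the four ratios exceed 0.85.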
Journey Context:
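Developers set max_tokens to cap runaway costs. As system prompts grow longer or few-shot examples are added, tasks demand longer outputs while the output budget stays fixed. On models where the prompt and the output share one context window, the room left for output also shrinks. The model then hits the limit and stops mid-thought or never reaches the final synthesis step. The API signals this with finish_reason: "length", but clients often log that value as a warning rather than raising an error, so the truncation goes unnoticed. Tracking how close each response comes to the token limit surfaces this silent truncation before it corrupts downstream agent outputs.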
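One way to stop treating finish_reason "length" as a mere warning is to escalate it to an exception at the call site. This is a hedged sketch: `TruncatedOutputError`, `check_completion` and `output_headroom` are hypothetical names. The helpers take plain values instead of an SDK response object. With OpenAI's Chat Completions API, those values would come from `choices[0].finish_reason` and `usage.completion_tokens`. With Anthropic's Messages API, they would come from `stop_reason` (which reports "max_tokens") and `usage.output_tokens`. `output_headroom` assumes a model whose prompt and output share a fixed context window.

```python
class TruncatedOutputError(RuntimeError):
    """Raised when a completion stopped because it hit the output token cap."""


# "length" is the OpenAI-style value named in the report; Anthropic's Messages
# API reports stop_reason "max_tokens". Extend this set for other providers.
TRUNCATION_REASONS = {"length", "max_tokens"}


def check_completion(finish_reason: str, output_tokens: int, max_tokens: int) -> float:
    """Return output_tokens / max_tokens, raising if the output was truncated."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if finish_reason in TRUNCATION_REASONS:
        # Escalate instead of logging a warning, so the agent step fails loudly.
        raise TruncatedOutputError(
            f"output truncated at {output_tokens}/{max_tokens} tokens "
            f"(finish_reason={finish_reason!r})"
        )
    return output_tokens / max_tokens


def output_headroom(context_window: int, prompt_tokens: int, max_tokens: int) -> int:
    """Tokens actually left for output, assuming prompt and output share one window."""
    return max(0, min(max_tokens, context_window - prompt_tokens))
```

For example, with an 8192-token window, a 7000-token prompt leaves only 1192 tokens for output even when max_tokens is 2048. The ratio returned by `check_completion` can be fed to a proximity monitor so that near-misses are tracked alongside hard truncations.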
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:00:17.340199+00:00 - report_created - created