Report #27197
[gotcha] Truncated LLM outputs can cut refusals short and leak data
Check finish_reason in the API response. If it is length, the output was cut off by the token limit: discard the partial output or prompt the model to continue safely. Do not stream partial output directly to the user if it might contain sensitive data.
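A minimal sketch of that check, assuming the response has already been parsed into a dict shaped like a typical chat-completion payload (choices[0].finish_reason and choices[0].message.content). The helper name safe_text and the sample payloads are hypothetical, and field names and finish-reason values differ between providers, so adapt them to the API you actually call.

```python
from typing import Any, Mapping, Optional

# Assumption: finish-reason values that mean "cut off by the token limit".
# "length" is the value named in this report; other providers may differ.
TRUNCATED_REASONS = {"length"}


def safe_text(response: Mapping[str, Any]) -> Optional[str]:
    """Return the completion text, or None if generation was truncated.

    `response` is assumed to be a parsed chat-completion style payload.
    """
    choice = response["choices"][0]
    if choice.get("finish_reason") in TRUNCATED_REASONS:
        # Discard the partial output instead of showing it to the user.
        return None
    return choice["message"]["content"]


# Hypothetical sample payloads for illustration only.
truncated = {
    "choices": [
        {"finish_reason": "length", "message": {"content": "I cannot provide the"}}
    ]
}
complete = {
    "choices": [
        {"finish_reason": "stop", "message": {"content": "I cannot help with that."}}
    ]
}

print(safe_text(truncated))
print(safe_text(complete))
```

When safe_text returns None, the caller can show a generic error or retry with a larger token budget rather than display the fragment.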
Journey Context:
Developers set max_tokens to limit cost or response time. If an LLM starts to refuse a malicious prompt (e.g., 'I cannot provide the password because...') but hits the max_tokens limit, the output is truncated mid-response. The partial output may already contain the sensitive data ahead of the refusal, or the refusal itself may be cut off, leaving the harmful payload exposed. Checking the finish reason on every response lets you detect incomplete generations and handle them securely.
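For streamed responses, one way to apply the same idea is to buffer the chunks and release the text only once the final finish reason is known. The sketch below assumes each chunk has been reduced to a (text, finish_reason) tuple where finish_reason is None until the last chunk. That tuple shape, the release_stream name, and the fallback message are all hypothetical and not part of any provider's SDK.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical message shown instead of a truncated generation.
FALLBACK = "The response was incomplete and has been withheld."


def release_stream(chunks: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Buffer streamed text and release it only if generation completed.

    Each chunk is assumed to be (text, finish_reason), with finish_reason
    set to None on every chunk except the last.
    """
    buffered = []
    final_reason: Optional[str] = None
    for text, finish_reason in chunks:
        buffered.append(text)
        if finish_reason is not None:
            final_reason = finish_reason
    # A missing finish reason (stream dropped) is treated like truncation.
    if final_reason is None or final_reason == "length":
        return FALLBACK
    return "".join(buffered)


print(release_stream([("Hel", None), ("lo", "stop")]))
print(release_stream([("I cannot", None), (" prov", "length")]))
```

Buffering gives up the latency benefit of streaming, so it is a trade-off that suits responses that may touch sensitive data. A stricter variant could release text only for an explicit allowlist of finish reasons.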
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:02:53.858666+00:00— report_created — created