Report #93487
[gotcha] Missing max_tokens or rate limits on LLM responses lets attackers exhaust API credits
Always set a strict max_tokens limit on every LLM request. Implement application-level timeouts and rate limiting per user or session. Monitor token usage and set billing alerts.
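A minimal sketch of the cap and the per-user rate limit, using only the Python standard library. The constant values, the `SlidingWindowLimiter` class, the `build_request` helper, and the request dictionary keys are illustrative assumptions, not any provider's API; map them onto whatever client the application actually uses.

```python
import time
from collections import defaultdict, deque

# All values below are hypothetical; tune them for the application.
MAX_OUTPUT_TOKENS = 512    # hard cap on tokens per response
REQUEST_TIMEOUT_S = 30.0   # application-level timeout per request
MAX_REQUESTS = 5           # requests allowed per user ...
WINDOW_S = 60.0            # ... within this sliding window (seconds)


class SlidingWindowLimiter:
    """Allow at most max_requests calls per user within window_s seconds."""

    def __init__(self, max_requests, window_s, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock
        self._calls = defaultdict(deque)  # user_id -> recent call timestamps

    def allow(self, user_id):
        now = self.clock()
        calls = self._calls[user_id]
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= self.window_s:
            calls.popleft()
        if len(calls) >= self.max_requests:
            return False
        calls.append(now)
        return True


def build_request(prompt, requested_max_tokens=None):
    """Build request parameters with the output cap always enforced.

    The keys here are placeholders, not a specific provider's field names.
    """
    if requested_max_tokens is None:
        capped = MAX_OUTPUT_TOKENS
    else:
        # Never trust a caller-supplied limit above the server-side cap.
        capped = max(1, min(requested_max_tokens, MAX_OUTPUT_TOKENS))
    return {"prompt": prompt, "max_tokens": capped, "timeout": REQUEST_TIMEOUT_S}
```

In a request handler, one would call `limiter.allow(user_id)` first and reject the request when it returns `False`, then send only the parameters returned by `build_request`. The limiter state here lives in process memory, so a multi-instance deployment would need a shared store instead.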
Journey Context:
Developers leave max_tokens at the model default (e.g., 4096 or even unlimited). An attacker injects a prompt such as 'Repeat the word hello forever' or 'Write a 10,000 word essay on...'. Each such request can generate output up to the model's limit, so repeated requests consume massive amounts of tokens, leading to huge API bills (Denial of Wallet) or crashing the application.
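To see why the limits matter, it helps to bound the damage. The sketch below computes an upper bound on the output tokens a single user can consume when both a response cap and a per-window request limit are in place. The function name and the example numbers are hypothetical, and the bound covers output tokens only, not input tokens.

```python
def worst_case_output_tokens(max_tokens, max_requests, window_s, period_s):
    """Upper bound on output tokens one user can consume in period_s seconds.

    Assumes at most max_requests requests in any window of window_s seconds
    and at most max_tokens output tokens per response. All arguments are
    positive integers.
    """
    # The period is covered by ceil(period_s / window_s) windows.
    windows = -(-period_s // window_s)
    return windows * max_requests * max_tokens


# Hypothetical limits: 512 tokens per response, 5 requests per 60 s, over 1 hour.
print(worst_case_output_tokens(512, 5, 60, 3600))  # 153600
```

Without a max_tokens value or a request limit, one of the factors in this product has no ceiling, so no such bound exists.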
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:30:08.700413+00:00— report_created — created