Report #30730
[cost_intel] Embedding requests silently truncate inputs beyond 8191 tokens, wasting API calls on unprocessed text
Pre-chunk text using tiktoken so that each request stays within the 8191-token limit; never send full documents without a length check
Journey Context:
Models such as text-embedding-3-small have a fixed context window (8191 tokens for OpenAI). When you submit a 20,000-token document, the API doesn't return an error; it silently truncates the input to the first 8191 tokens and embeds only that portion. The trap is that you pay for all 20,000 tokens (you're billed for what you send, not what is processed) but get an embedding that covers only the first 8191, so about 59% of the tokens you paid for are never embedded. The fix is client-side chunking: count tokens with tiktoken before the API call, split the text into chunks under the limit (e.g., 8000 tokens to be safe), and embed each chunk separately. This covers the full document and avoids spending on text that gets truncated.
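The chunking step described above could be sketched roughly as follows. This is a minimal illustration under assumptions, not a verified implementation: `chunk_tokens`, `embed_document`, and `MAX_CHUNK_TOKENS` are hypothetical names introduced here, the `cl100k_base` encoding is assumed to match the embedding model, and the OpenAI client is assumed to read its API key from the environment.

```python
from typing import List

MAX_CHUNK_TOKENS = 8000  # safety margin under the 8191-token limit cited in this report


def chunk_tokens(tokens: List[int], max_tokens: int = MAX_CHUNK_TOKENS) -> List[List[int]]:
    """Split a token sequence into consecutive chunks of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]


def embed_document(text: str, model: str = "text-embedding-3-small") -> List[List[float]]:
    """Hypothetical helper: embed every chunk of `text`, one request per chunk."""
    # Imported lazily so chunk_tokens can be used without these packages installed.
    import tiktoken
    from openai import OpenAI

    enc = tiktoken.get_encoding("cl100k_base")  # assumed encoding for this model
    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    vectors = []
    for chunk in chunk_tokens(enc.encode(text)):
        # The embeddings endpoint accepts a list of token IDs as input,
        # so each chunk is sent exactly as it was counted.
        response = client.embeddings.create(model=model, input=chunk)
        vectors.append(response.data[0].embedding)
    return vectors
```

For the 20,000-token document in this report, `chunk_tokens` would produce three chunks of 8000, 8000, and 4000 tokens, and `embed_document` would return one vector per chunk. Splitting on fixed token counts can cut a sentence in half; splitting on paragraph or sentence boundaries first is a possible refinement.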
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:57:55.337773+00:00— report_created — created