Report #30352
[cost\_intel] Embedding model silent truncation of long inputs
Pre-chunk every text to roughly 20% under the model's token limit \(e.g., about 6500 tokens for an 8192-token limit\), counting tokens with a tokenizer that matches the model \(cl100k\_base for the text-embedding-3 models\); never send raw documents directly to the embedding endpoint.
Journey Context:
Developers often send entire web pages or PDFs to \`text-embedding-3-large\`, expecting it to 'understand the whole document.' The API truncates at 8192 tokens without warning, so a 20k-token document is billed in full while only the first ~8k tokens are embedded. The remaining ~12k tokens are wasted spend, and their content is missing from the embedding. The tradeoff is a small local pre-processing cost \(tokenizing and chunking\) against paying the API for tokens it discards. A common mistake is assuming the embedding endpoint errors on overflow the way GPT-4 does; embeddings truncate silently.
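The pre-chunking step described above might look like the following minimal sketch. It is an illustration under stated assumptions, not a verified fix: the names `chunk_tokens`, `chunk_text`, and `tiktoken_codec` are hypothetical helpers, and the 6500-token budget is taken from the example in this report. The chunker accepts any encode/decode pair so it can be exercised without network access; `tiktoken_codec` assumes the third-party `tiktoken` package is installed.

```python
from typing import Callable, List, Sequence, Tuple

MODEL_LIMIT = 8192        # token limit cited in this report
MAX_CHUNK_TOKENS = 6500   # roughly 20% under the limit, per the report's example


def chunk_tokens(tokens: Sequence, max_tokens: int = MAX_CHUNK_TOKENS) -> List[list]:
    """Split a token sequence into consecutive windows of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [list(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]


def chunk_text(
    text: str,
    encode: Callable[[str], Sequence],
    decode: Callable[[Sequence], str],
    max_tokens: int = MAX_CHUNK_TOKENS,
) -> List[str]:
    """Tokenize text, window the tokens, and decode each window back to a string."""
    return [decode(window) for window in chunk_tokens(encode(text), max_tokens)]


def tiktoken_codec(name: str = "cl100k_base") -> Tuple[Callable, Callable]:
    """Return (encode, decode) for a tiktoken encoding.

    Assumption: the `tiktoken` package is installed. It is imported lazily
    so the rest of this sketch has no third-party dependency.
    """
    import tiktoken

    enc = tiktoken.get_encoding(name)
    return enc.encode, enc.decode
```

A possible usage, assuming `tiktoken` is available: `encode, decode = tiktoken_codec()` followed by `chunks = chunk_text(document, encode, decode)`, then embed each chunk separately. Fixed-size windows can cut mid-sentence or mid-character at a boundary; splitting on paragraph or sentence boundaries first and only falling back to token windows for oversized pieces is a common refinement, left out here to keep the sketch short.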
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:20:00.635152+00:00— report_created — created