Report #96591
[cost_intel] Dynamically generating few-shot examples for every LLM call
Put large static context (e.g., system prompts, long few-shot lists) at the start of the prompt and use prompt caching to cut input costs by up to 90% and latency by up to 80%.
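A rough way to see where a savings figure like this comes from: the sketch estimates input cost for repeated calls that share one static prefix. The pricing multipliers (1.25x to write the prefix into the cache, 0.1x to read it back) are assumptions modeled on published cache pricing, not numbers from this report. The sketch also assumes the cache stays warm for every call. `CachePricing` and `input_cost` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachePricing:
    # ASSUMED multipliers on the base input-token price; check your provider's pricing.
    write_multiplier: float = 1.25  # call that first writes the static prefix to the cache
    read_multiplier: float = 0.10   # later calls that reuse the cached prefix


def input_cost(static_tokens: int, dynamic_tokens: int, calls: int,
               base_price: float = 1.0,
               pricing: Optional[CachePricing] = None) -> float:
    """Total input-token cost of `calls` requests that share one static prefix.

    pricing=None models no caching: every call pays full price for the whole prompt.
    Assumes the cache never expires between calls (a simplification).
    """
    if calls <= 0:
        return 0.0
    if pricing is None:
        return calls * (static_tokens + dynamic_tokens) * base_price
    # First call pays the cache-write premium on the prefix; dynamic tokens are never cached.
    first = static_tokens * pricing.write_multiplier + dynamic_tokens
    # Every later call reads the prefix at the discounted rate.
    rest = (calls - 1) * (static_tokens * pricing.read_multiplier + dynamic_tokens)
    return (first + rest) * base_price


uncached = input_cost(10_000, 200, calls=100)
cached = input_cost(10_000, 200, calls=100, pricing=CachePricing())
print(uncached, cached, f"{1 - cached / uncached:.0%}")  # prints: 1020000.0 131500.0 87%
```

With a 10,000-token prefix and a 200-token per-call suffix this scenario saves about 87%; the larger the static share of the prompt, the closer the saving gets to the 90% ceiling set by the assumed read discount.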
Journey Context:
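Token bloat from resending the same system instructions and few-shot examples on every call can silently inflate costs by as much as 10x. Caching requires a strict prefix architecture, because a cache hit needs the start of the prompt to match a previously cached prompt exactly. A common mistake is injecting dynamic variables *before* the static few-shots, which turns every request into a cache miss. The static prefix must come first, with per-request variables appended after it.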
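The ordering rule can be seen in a toy simulation. `ToyPrefixCache` is a hypothetical stand-in, not any provider's API: it counts a hit only when the first `len(STATIC_PREFIX)` characters of a prompt exactly match an earlier prompt. That mirrors the exact-prefix matching prompt caches rely on while ignoring token boundaries, minimum cacheable lengths, and expiry. The prompt text and helper names are illustrative.

```python
# Hypothetical static prefix: system prompt plus fixed few-shot examples.
STATIC_PREFIX = (
    "SYSTEM: Classify each support ticket as shipping or billing.\n"
    "EXAMPLE: 'package never arrived' -> shipping\n"
    "EXAMPLE: 'charged twice this month' -> billing\n"
)


class ToyPrefixCache:
    """Hypothetical exact-prefix cache: a request hits only if its first `block`
    characters match a prompt seen earlier. Real providers match on tokens and
    add minimum lengths and expiry; this models only the ordering rule."""

    def __init__(self, block: int) -> None:
        self.block = block
        self.seen: set[str] = set()
        self.hits = 0
        self.misses = 0

    def lookup(self, prompt: str) -> bool:
        prefix = prompt[: self.block]
        if prefix in self.seen:
            self.hits += 1
            return True
        self.seen.add(prefix)
        self.misses += 1
        return False


def static_first(user_input: str) -> str:
    # Correct order: static prefix first, per-request variables appended after it.
    return STATIC_PREFIX + "TICKET: " + user_input


def dynamic_first(user_input: str) -> str:
    # Common mistake: the dynamic variable comes before the static few-shots.
    return "TICKET: " + user_input + "\n" + STATIC_PREFIX


def simulate(build, inputs):
    """Run each input through `build` and return (hits, misses)."""
    cache = ToyPrefixCache(block=len(STATIC_PREFIX))
    for text in inputs:
        cache.lookup(build(text))
    return cache.hits, cache.misses


tickets = ["order 1 is late", "refund my duplicate charge", "where is my parcel"]
print(simulate(static_first, tickets))   # prints: (2, 1)
print(simulate(dynamic_first, tickets))  # prints: (0, 3)
```

Putting the ticket text first changes the leading characters of every request, so nothing is ever reused; regenerating the few-shot examples on each call would break the shared prefix in the same way.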
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:42:46.783743+00:00 - report_created - created