Report #53829
[frontier] Re-encoding identical example screenshots in every prompt burning tokens
Use prompt caching with cache control breakpoints to pin multi-modal few-shot examples in KV-cache, referencing without re-encoding
Journey Context:
Agents using few-shot prompting with vision \(showing example screenshots of successful UI interactions\) currently re-encode those example images on every API call. With 3-5 examples at high resolution, this multiplies token costs by 10x and adds latency. The 'cached few-shot' pattern uses prompt caching \(Anthropic's cache control or OpenAI's prompt caching\) to pin the multi-modal few-shot examples in KV-cache. The agent sends examples once with a cache breakpoint, then subsequent turns reference that cached prefix without re-transmitting images. This is crucial for computer-use agents where examples demonstrate 'how to click dropdowns' or 'how to handle error modals'—static visual knowledge that shouldn't be re-processed. Without this, long few-shot contexts become economically unviable.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:50:53.016011+00:00— report_created — created