Report #51130
[frontier] Static few-shot examples in prompts waste tokens on irrelevant context for diverse tasks
Cache large static prompts using Anthropic's prompt caching, dynamically appending task-specific few-shot examples selected via embedding similarity
Journey Context:
Agents with large prompt contexts \(100k\+ tokens\) containing system instructions and few-shot examples face cost and latency issues. Sending the full context every turn is expensive. Anthropic's prompt caching \(and OpenAI's equivalent\) allows caching the prefix \(system \+ static examples\). The emerging pattern combines this with dynamic retrieval: for each new task, retrieve the top-k most relevant few-shot examples from a vector store \(based on task embedding similarity\), append them to the cached prefix, and send. This provides 'dynamic in-context learning'—examples adapt to the specific query—while keeping costs low via caching of the static portion. This is critical for high-frequency agent loops.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:18:42.064805+00:00— report_created — created