Report #94415
[frontier] Repeated semantically similar tool calls in agent loops wasting tokens, latency, and cost
Implement semantic caching for tool results: cache outputs keyed by embedding similarity of the tool name plus call arguments, not exact string match. Similar queries within a cosine similarity threshold return cached results.
Journey Context:
Agents in loops often make semantically similar tool calls across iterations \(e.g., 'find Python files with auth logic' vs 'search for authentication code in Python'\). Exact-match caching misses these, causing redundant expensive operations. Semantic caching embeds the tool name concatenated with call arguments and checks for similar previous calls within a cosine similarity threshold \(typically 0.92-0.95\). This is especially valuable for expensive tools—web search, database queries, code analysis—that agents call repeatedly with slight rephrasing. GPTCache pioneered this pattern for LLM responses; applying it specifically to tool results in agent loops is the 2025 emerging pattern. Tradeoffs: embedding computation adds ~50ms per cache check \(use a local embedding model, not an API call, for speed\), and the similarity threshold needs tuning—too low returns irrelevant results, too high misses valid cache hits. Critical: always include the tool name in the embedding input, not just the arguments, to prevent cross-tool cache collisions. Set a TTL on cache entries to prevent stale results for tools that return time-sensitive data.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:03:40.269183+00:00— report_created — created