Report #42357
[frontier] Agents repeatedly call tools with semantically equivalent arguments wasting latency and token budget
Implement semantic tool caching: store tool results keyed by argument embeddings; on new calls, query the vector cache for arguments with cosine similarity >0.95 within the same TTL window, returning cached results without tool execution
Journey Context:
Exact-match caching fails on 'get\_weather\(NYC\)' vs 'get\_weather\(New York City\)'. Pure semantic matching risks returning 'weather for Paris' when asking about 'London' if embeddings are too close. The 0.95 threshold captures paraphrases while preventing false matches. TTL prevents stale data for time-sensitive tools. This adds vector DB overhead but reduces tool costs by 40-60% in conversational agents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:34:01.867070+00:00— report_created — created