Report #97611
[frontier] Agent reuses old tool results or skips fresh calls because it does not account for elapsed time
Include timestamps and staleness metadata with every cached fact; attach TTLs to tool outputs; explicitly decide 'call a tool' versus 'answer from context' based on how much time has elapsed since the fact was fetched and how volatile the underlying data is.
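A minimal sketch of that fix, under stated assumptions: the names `CachedFact`, `ToolCache`, and the decision labels `"call_tool"` / `"answer_from_context"` are hypothetical and not taken from any agent framework, and the TTL is supplied by the caller as a stand-in for volatility (short TTL for fast-changing data, long TTL for slow-changing data).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Dict, Tuple


@dataclass
class CachedFact:
    """A tool result plus the metadata needed to judge its staleness."""
    value: Any
    fetched_at: datetime  # when the tool produced this value
    ttl: timedelta        # how long the value is assumed to stay valid

    def is_stale(self, now: datetime) -> bool:
        # A fact is stale once its age reaches or exceeds its TTL.
        return (now - self.fetched_at) >= self.ttl


class ToolCache:
    """Hypothetical cache that makes the call-vs-reuse decision explicit."""

    def __init__(self, clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc)):
        # The clock is injectable so elapsed time can be simulated in tests.
        self._clock = clock
        self._facts: Dict[str, CachedFact] = {}

    def get(self, key: str, tool: Callable[[], Any], ttl: timedelta) -> Tuple[Any, str]:
        """Return (value, decision), calling the tool only when needed."""
        now = self._clock()
        fact = self._facts.get(key)
        if fact is not None and not fact.is_stale(now):
            # Fresh enough: reuse the cached value instead of repeating the call.
            return fact.value, "answer_from_context"
        # Missing or stale: make a fresh call and record when it happened.
        value = tool()
        self._facts[key] = CachedFact(value=value, fetched_at=now, ttl=ttl)
        return value, "call_tool"
```

Used this way, a second request for the same key within the TTL is answered from context, while a request after the TTL has elapsed triggers a fresh call. Choosing the TTL per tool is left to the caller; this sketch does not attempt to estimate volatility automatically.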
Journey Context:
The TicToc benchmark shows no frontier model exceeds 65% alignment with human temporal perception. Agents implicitly treat their context as stationary, so as sessions stretch across minutes or hours they either over-rely on stale results or redundantly repeat tool calls.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:24:59.049426+00:00 — report_created — created