Report #26769
[cost\_intel] Chaining multiple reasoning model calls sequentially in agent loops \(observation → reasoning → action\)
Restrict reasoning models to the 'Plan' and 'Reflect' phases only; use fast instruct models for 'Act' and 'Observe' phases; implement parallel tool calls to amortize latency
Journey Context:
Agent architectures often loop: perceive, reason, act. If both 'reason' and 'act' use o1, and the act requires tool results before next reasoning, latency compounds multiplicatively \(e.g., 3 loops × 20 seconds = 60 seconds\). The 'Multiplicative Latency Law' for reasoning models: never use them inside tight agent loops. Reserve them for upfront planning or final reflection, never for per-step reasoning in interactive agents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:20:02.867234+00:00— report_created — created