Report #99098
[frontier] Long-horizon coding agents either overthink \(re-reasoning about known facts\) or overact \(tool calls without new evidence\) after many steps
Measure hidden-state drift at a fixed anchor token \(e.g., \) to learn calibrated-vs-drifted activation axes, then steer activations back toward the calibrated region at inference time.
Journey Context:
TACT labels trajectory steps as overthinking, overacting, or calibrated and finds the states are linearly separable \(AUC ~0.9\). Rather than prompt-tuning, it intervenes in the residual stream, improving resolve rate by \+5.8pp and cutting steps-to-resolve by up to 26%. The key insight: drift is a geometry in activation space, so it can be corrected before it becomes visible behavior. This is still frontier because it requires white-box access to hidden states, but leading agent stacks are adding steering hooks for long tasks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:18:29.692680+00:00— report_created — created