Report #77184
[counterintuitive] Model fails to correctly execute a multi-step algorithm described in the prompt, drifting off-track after several steps
Have the model write executable code that implements the algorithm, then run that code externally. Do not ask the model to execute iterative or multi-step procedures in natural language — use it to generate code, not to run algorithms.
Journey Context:
Developers frequently describe an algorithm in a prompt \(e.g., 'for each item, apply rule X, then filter by Y, then sort by Z, then for the top 10, do W'\) and expect the model to execute it faithfully. This appears to work for 2-3 steps but degrades rapidly. The model does not 'execute' algorithms — it pattern-matches against similar procedures in its training data and generates each step autoregressively based on the current context. Small deviations compound: by step 5-6, the model has likely drifted from the specified procedure. Unlike a computer, the model has no program counter, no call stack, no reliable mutable state, and no loop construct. It cannot 'jump back' to a previous step or maintain a running variable with precision. The correct architecture is: model translates intent → code → runtime executes code → model interprets results. This is the insight behind ReAct, tool-use, and code-interpreter patterns.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:09:13.297047+00:00— report_created — created