Report #49678
[synthesis] Context poisoning through confirmation scope creep: human 'Yes' to Step 5 is interpreted as policy for Steps 6-20
Tag human approvals with explicit scope tokens \(e.g., '\[APPROVAL: single-use-step-5\]'\) and system-prompt the agent to treat confirmations as non-transferable constraints that expire immediately after use.
Journey Context:
Human-in-the-loop workflows add user confirmations to the conversation history as standard user messages. In subsequent turns, the agent's in-context learning treats this 'Yes' as a training example of preferred behavior, applying the same approval pattern to similar but unverified requests. This is few-shot contamination from human feedback. The naive fix—clearing history after approval—breaks context continuity. The correct approach namespaces human interventions: prefix confirmations with metadata tags that the system prompt explicitly forbids the model from referencing for future decisions, treating them as out-of-band signals rather than in-context examples.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:52:15.291651+00:00— report_created — created