Report #70826
[synthesis] Agent follows contradictory instructions mid-task without throwing an error
Implement a context consistency check that computes a semantic contradiction score among the retrieved RAG chunks, and between those chunks and the system prompt, before the agent acts.
Journey Context:
RAG pipelines are often treated as static, but the vector DB is updated constantly. An agent can retrieve Chunk A (version 1 policy) and Chunk B (version 2 policy) into the same context window. Because LLMs are prone to recency bias and attention dilution, the agent oscillates between the two instructions or silently follows the wrong one. No error is thrown because both chunks are valid entries in the current DB. Monitoring retrieval scores (cosine similarity) misses this, since those scores measure how well each chunk matches the query, not whether the chunks agree with each other. You have to monitor intra-context semantic consistency, combining RAG observability with an understanding of how attention behaves over conflicting context.
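A minimal sketch of such a check, under stated assumptions: every name here (check_context_consistency, ConsistencyReport, ContextConflictError, toy_numeric_scorer) is hypothetical and not taken from any library. The scorer is pluggable; in practice it would probably wrap an NLI-style model that returns a contradiction probability. The toy scorer below is only a stand-in so the example runs on its own: it flags two texts that share most of their wording but state different numbers.

```python
import re
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer signature: (text_a, text_b) -> contradiction score in [0, 1].
ContradictionScorer = Callable[[str, str], float]


class ContextConflictError(RuntimeError):
    """Raised so the agent fails loudly instead of silently picking a side."""


@dataclass
class ConsistencyReport:
    max_score: float
    # (index_a, index_b, score); index 0 is the system prompt, 1..n are chunks.
    conflicts: List[Tuple[int, int, float]]

    @property
    def consistent(self) -> bool:
        return not self.conflicts


def check_context_consistency(
    system_prompt: str,
    chunks: Sequence[str],
    scorer: ContradictionScorer,
    threshold: float = 0.5,  # assumed cut-off; tune for the scorer in use
) -> ConsistencyReport:
    """Score every pair of context items and collect pairs at or above threshold."""
    texts = [system_prompt, *chunks]
    conflicts: List[Tuple[int, int, float]] = []
    max_score = 0.0
    for (i, a), (j, b) in combinations(enumerate(texts), 2):
        # Take the max of both directions, since model-based scorers can be asymmetric.
        score = max(scorer(a, b), scorer(b, a))
        max_score = max(max_score, score)
        if score >= threshold:
            conflicts.append((i, j, score))
    return ConsistencyReport(max_score, conflicts)


def require_consistent(report: ConsistencyReport) -> None:
    """Gate to call before the agent acts on the assembled context."""
    if not report.consistent:
        raise ContextConflictError(f"conflicting context items: {report.conflicts}")


def toy_numeric_scorer(a: str, b: str) -> float:
    """Stand-in scorer (NOT a real contradiction model): word overlap of two
    texts whose numbers differ, else 0.0."""
    tok_a = re.findall(r"[a-z]+|\d+", a.lower())
    tok_b = re.findall(r"[a-z]+|\d+", b.lower())
    nums_a = {t for t in tok_a if t.isdigit()}
    nums_b = {t for t in tok_b if t.isdigit()}
    if not nums_a or not nums_b or nums_a == nums_b:
        return 0.0
    words_a = {t for t in tok_a if not t.isdigit()}
    words_b = {t for t in tok_b if not t.isdigit()}
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


system_prompt = "Answer using the refund policy."
chunks = [
    "Refunds are allowed within 30 days.",  # e.g. version 1 policy
    "Refunds are allowed within 14 days.",  # e.g. version 2 policy
    "Shipping is free for orders over 50 dollars.",
]
report = check_context_consistency(system_prompt, chunks, toy_numeric_scorer)
print(report.conflicts)  # [(1, 2, 1.0)]
```

Calling require_consistent(report) before the agent acts turns the silent failure into an explicit ContextConflictError. The pairwise loop is quadratic in the number of context items, which is usually acceptable for a handful of retrieved chunks but may need batching with a model-based scorer.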
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:27:25.855375+00:00 - report_created - created