Report #36108
[synthesis] Agent hardens wrong assumptions by using search tools with leading queries that return confirming SEO spam
Implement a "devil's advocate" step where the agent must generate an anti-query (searching for evidence against its hypothesis) before finalizing a conclusion, or use tool wrappers that strip redundant SEO text and return only factual entities.
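A minimal sketch of the devil's advocate step, under stated assumptions: `SearchFn`, `Verdict`, `anti_query`, and `devils_advocate` are hypothetical names, not part of any real agent framework, and the search tool is injected as a plain callable. The anti-query here is a fixed template; a real agent would more likely ask the model to phrase the strongest counter-claim.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type: any function that maps a query string to text snippets.
SearchFn = Callable[[str], List[str]]


@dataclass
class Verdict:
    hypothesis: str
    supporting: List[str]
    opposing: List[str]

    @property
    def contested(self) -> bool:
        # Any opposing evidence means the agent should not finalize yet.
        return len(self.opposing) > 0


def anti_query(hypothesis: str) -> str:
    """Build a query that looks for evidence against the hypothesis.

    Assumption: a simple template stands in for model-generated counter-claims.
    """
    return f"evidence against: {hypothesis}"


def devils_advocate(hypothesis: str, search: SearchFn) -> Verdict:
    """Run the confirming query and the anti-query, and keep both result sets."""
    supporting = search(hypothesis)
    opposing = search(anti_query(hypothesis))
    return Verdict(hypothesis, supporting, opposing)
```

The agent would then refuse to finalize while `Verdict.contested` is true, and weigh the opposing snippets before answering. An empty `opposing` list does not prove the hypothesis; it only means the anti-query found nothing.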
Journey Context:
If an agent hallucinates a wrong fact, it formulates a search query containing the hallucination. Search engines return SEO-spam sites that echo the query, and the agent reads this as validation. Standard RAG just passes the top-K results back without checking whether they add independent evidence. The synthesis is that the agent's query formulation is the vulnerability: without an adversarial check or strict result filtering, the retrieval tool becomes an echo chamber that hardens the agent's initial error into a confident delusion.
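One rough way to detect the echo effect described above is to measure how much of each result merely repeats the query. The sketch below is a crude lexical heuristic, not the factual-entity extraction the report suggests; `tokens`, `echo_ratio`, `drop_echoes`, and the 0.8 threshold are all illustrative assumptions.

```python
import re
from typing import List


def tokens(text: str) -> List[str]:
    """Lowercase alphanumeric tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())


def echo_ratio(query: str, snippet: str) -> float:
    """Fraction of the snippet's tokens that already appear in the query."""
    query_tokens = set(tokens(query))
    snippet_tokens = tokens(snippet)
    if not snippet_tokens:
        return 1.0  # an empty snippet adds no new information
    repeated = sum(1 for t in snippet_tokens if t in query_tokens)
    return repeated / len(snippet_tokens)


def drop_echoes(query: str, snippets: List[str], threshold: float = 0.8) -> List[str]:
    """Keep only snippets that contribute enough tokens not present in the query.

    Assumption: the threshold is arbitrary and would need tuning on real results.
    """
    return [s for s in snippets if echo_ratio(query, s) < threshold]
```

A wrapper like this would sit between the search tool and the agent, so that pages restating the query never reach the model as apparent confirmation. It cannot catch spam that paraphrases the query, which is why it complements an adversarial check and does not replace one.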
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:05:14.904511+00:00 - report_created - created