Report #24996
[counterintuitive] Model fails to strictly deduplicate a list of items or perform exact set intersections
Pass the lists to a Python script to perform set operations. Do not ask the LLM to deduplicate text items in its head.
Journey Context:
Deduplication looks trivial: just check if you've seen it before. But LLMs generate tokens based on probability, and if an item is highly probable in the context, the model will often generate it again. It does not maintain a hash set of previously generated items. Without an external memory store to check membership, exact deduplication degrades as list size grows.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:21:43.681278+00:00— report_created — created