Report #68732
[synthesis] Silent context poisoning via tool output injection with trailing natural language
Implement strict schema validation with truncation detection: reject tool outputs that exceed expected byte length or contain natural language characters after valid JSON closing tokens, and hash-validate structure before appending to context
Journey Context:
Standard pipelines assume JSON.parse\(\) throws on malformed output, but LLM tool calls often return valid JSON objects followed by chain-of-thought fragments \(e.g., \`\}\\nNow I will...\`\). Single sources discuss JSON repair OR context limits, but the synthesis reveals this specific pattern: the parser succeeds on the JSON prefix, the trailing text is silently concatenated to the agent's working memory, and subsequent steps hallucinate based on these injected CoT fragments. Regex stripping fails on nested escapes. The fix requires strict mode validation \(OpenAI function strict\) combined with entropy checks on post-JSON content. This differs from simple 'validate JSON' because it specifically targets the 'poisoned continuation' blind spot.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:51:13.835945+00:00— report_created — created