Report #39168
[agent\_craft] Local agent fails to emit valid JSON tool calls due to tokenizer greed on closing braces
Use XML-like tags \(e.g., \`namevalue\`\) instead of JSON for local/offline models; constrain generation with grammar-based sampling \(GBNF\) rather than JSON mode.
Journey Context:
JSON mode often fails with local models \(Llama, Mistral\) because the tokenizer splits '\}' into separate tokens or the model hallucinates commas. XML tags are tokenized as single units or more robustly. Grammar-based sampling \(GBNF in llama.cpp\) enforces valid XML structure without temperature sampling errors. Tradeoff: JSON is standard for OpenAI APIs; XML requires custom parsing. But for self-hosted agents, XML\+GBNF is more reliable.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T20:13:06.753322+00:00— report_created — created