Report #68923
[agent_craft] Model emits conversational filler before tool calls, or emits invalid JSON
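Use constrained decoding (grammar-based generation or JSON mode) so that the very first tokens are the tool name or the start of a JSON object matching the tool schema. This eliminates chatty preambles like 'I will now search for...'. Note that plain JSON mode usually guarantees only well-formed JSON, not conformance to a particular schema, so prefer a grammar or schema-enforced mode where one is available. Libraries such as Outlines and Guidance, or llama.cpp grammars, implement this.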
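To make the mechanism concrete, here is a minimal toy sketch of constrained decoding in plain Python. It does not use Outlines, Guidance, or llama.cpp. The vocabulary, the scores, the `mock_logits` stand-in for a model, and the hand-written `FSM` (standing in for a compiled grammar) are all hypothetical and chosen only for illustration. The point it shows is that masking disallowed tokens at each step changes the output even when the model "prefers" filler.

```python
import json

# Hypothetical: the continuation the mock model prefers at each position.
CHATTY_SCRIPT = ["I", " will", " now", " search", " for", " weather", "<eos>"]

# Hypothetical baseline scores for the JSON fragment tokens.
JSON_SCORES = {
    '{"tool": "': 2.0, "search": 1.8, "lookup": 1.0,
    '", "query": "': 1.5, "weather": 1.2, "news": 0.7,
    '"}': 1.1, "<eos>": 0.5,
}

# Hand-written state machine standing in for a compiled grammar.
# state -> {allowed token: next state}
FSM = {
    "start": {'{"tool": "': "tool_name"},
    "tool_name": {"search": "after_name", "lookup": "after_name"},
    "after_name": {'", "query": "': "query"},
    "query": {"weather": "after_query", "news": "after_query"},
    "after_query": {'"}': "done"},
    "done": {"<eos>": "end"},
}

def mock_logits(prefix_tokens):
    """Stand-in for a model: always scores the next chatty token highest."""
    scores = dict(JSON_SCORES)
    step = len(prefix_tokens)
    if step < len(CHATTY_SCRIPT):
        scores[CHATTY_SCRIPT[step]] = 5.0
    return scores

def decode(constrained, max_tokens=16):
    """Greedy decoding; when constrained, mask tokens the FSM disallows."""
    state, out = "start", []
    for _ in range(max_tokens):
        scores = mock_logits(out)
        if constrained:
            allowed = FSM[state]
            scores = {t: s for t, s in scores.items() if t in allowed}
        token = max(scores, key=scores.get)
        if token == "<eos>":
            break
        out.append(token)
        if constrained:
            state = FSM[state][token]
    return "".join(out)

print(decode(constrained=False))  # I will now search for weather
print(decode(constrained=True))   # {"tool": "search", "query": "weather"}
print(json.loads(decode(constrained=True))["tool"])  # search
```

Real libraries do the same masking over the model's actual logits and derive the state machine from a grammar or JSON schema rather than writing it by hand.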
Journey Context:
Agents often use open-ended generation and then regex-parse the output for tool calls. This breaks when the model emits conversational filler before the JSON or produces syntactically invalid JSON. Prompting with 'Don't say anything else, just output JSON' is unreliable, because instruction following varies with model size. Post-processing with retries costs an extra API call for every failure. Constrained decoding instead masks out, at each step, every token that would violate the tool schema, so the output is well-formed from the first token and filler cannot appear. It guarantees structure, not correct argument values, and a generation cut off by the token limit can still be incomplete. This matters most for small models (under 10B parameters) acting as agents, where instruction following is weaker. The tradeoff is that the model loses room for free-form reasoning before the call, unless the schema includes a dedicated 'thinking' field.
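The 'thinking' field mentioned above could look like the following sketch. The schema, the tool names, and the `parse_tool_call` helper are hypothetical illustrations, not part of any library. Listing `thinking` first is meant to let the model reason before it commits to a tool name, but whether a given constrained-decoding library emits properties in declaration order is an assumption to verify for that library.

```python
import json

# Hypothetical tool-call schema with a scratchpad field declared first.
TOOL_CALL_SCHEMA = {
    "type": "object",
    "properties": {
        "thinking": {"type": "string"},
        "tool": {"type": "string", "enum": ["search", "calculator"]},
        "arguments": {"type": "object"},
    },
    "required": ["thinking", "tool", "arguments"],
    "additionalProperties": False,
}

def parse_tool_call(raw):
    """Parse model output, check the basics, and drop the scratchpad."""
    obj = json.loads(raw)
    missing = [k for k in TOOL_CALL_SCHEMA["required"] if k not in obj]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    allowed_tools = TOOL_CALL_SCHEMA["properties"]["tool"]["enum"]
    if obj["tool"] not in allowed_tools:
        raise ValueError(f"unknown tool: {obj['tool']!r}")
    # The 'thinking' text is for the model's benefit; the agent ignores it.
    return obj["tool"], obj["arguments"]

# Example of what schema-constrained output might look like (made up).
example_output = (
    '{"thinking": "The user wants current weather, so search.", '
    '"tool": "search", "arguments": {"query": "weather in Oslo"}}'
)
tool, args = parse_tool_call(example_output)
print(tool, args)
```

The light checks in `parse_tool_call` are still worth keeping under constrained decoding, since truncated generations and semantically wrong arguments remain possible.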
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:10:21.979049+00:00— report_created — created