Report #86873
[counterintuitive] LLM inconsistently formats structured outputs like YAML or Python, messing up indentation or escaping
Use JSON as the primary structured output format, and rely on grammar-constrained decoding (such as JSON mode) rather than prompt-based formatting instructions.
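A minimal sketch of the consuming side, assuming the model's reply arrives as a plain string. The helper name `parse_model_json` and the `required_keys` parameter are hypothetical and not part of any library; only the standard `json` module is used. Constrained decoding targets syntax, so a check like this may still be useful for confirming that the expected fields are present.

```python
import json
from typing import Any


def parse_model_json(raw: str, required_keys: set[str]) -> dict[str, Any]:
    """Parse model output as a JSON object and check for required keys.

    Hypothetical helper for illustration; raises ValueError on any problem.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Syntax failure: the kind of error constrained decoding is meant to prevent.
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = required_keys - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data
```

A caller could wrap this in a retry loop and re-prompt when `ValueError` is raised, rather than trying to repair malformed text by hand.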
Journey Context:
Developers often try to force YAML or Python generation through prompting alone. BPE tokenization often merges whitespace with adjacent characters (e.g., '\n ' might be a single token), so precise, character-level whitespace alignment is fundamentally fragile. JSON's bracket-based nesting relies on structural tokens ('{', '}') that BPE tokenizers handle much more robustly, and constrained decoding guarantees syntactic validity where prompt engineering cannot.
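The whitespace sensitivity described above can be illustrated with the standard library alone. This is a sketch, not a benchmark: the payloads and the function `compiles` are made up for the example. It shows that a JSON document parses to the same value under any whitespace layout, while a Python snippet becomes invalid when a single leading space is lost.

```python
import json

# The same logical payload in three whitespace layouts a model might emit.
compact = '{"name":"job","steps":[{"run":"build"},{"run":"test"}]}'
spaced = '{ "name": "job", "steps": [ {"run": "build"}, {"run": "test"} ] }'
ragged = '{\n "name": "job",\n      "steps": [\n{"run": "build"},\n   {"run": "test"}]\n}'

parsed = [json.loads(text) for text in (compact, spaced, ragged)]
# Nesting comes from brackets, so layout does not change the parsed value.
json_layout_independent = parsed[0] == parsed[1] == parsed[2]

# Python source where the final line is indented by 4 spaces (valid)
# versus 3 spaces (does not match any enclosing block).
good_source = "def f(x):\n    if x:\n        return 1\n    return 0\n"
bad_source = "def f(x):\n    if x:\n        return 1\n   return 0\n"


def compiles(source: str) -> bool:
    """Return True if the text is syntactically valid Python."""
    try:
        compile(source, "<model-output>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


print(json_layout_independent, compiles(good_source), compiles(bad_source))
```

In YAML the failure mode can be quieter than in Python: a shifted indent may still parse but attach a key to a different parent, which is one more reason to prefer a bracket-delimited format.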
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:24:24.862663+00:00— report_created — created