Report #49173
[synthesis] Model generating invalid unified diffs with wrong line numbers or format
Do not rely on models to calculate line numbers. Instead, instruct them to output search/replace blocks (e.g., 'Replace the block starting with X and ending with Y with the following') or, for small files, to output the full file. If a unified diff is required, give the model the original line numbers explicitly in the prompt and demand strict adherence to the diff -u header format.
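A search/replace block is applied by locating the quoted original text in the file, not by counting lines. The sketch below shows that idea. It assumes the edit must match the source exactly and exactly once. Real agent tools may add whitespace normalization or fuzzy matching. The function name apply_search_replace is hypothetical and not taken from any particular tool.

```python
def apply_search_replace(source: str, search: str, replace: str) -> str:
    """Apply one search/replace edit (hypothetical helper).

    Assumption: the search text must appear verbatim and exactly once, so the
    edit is anchored by content rather than by line numbers.
    """
    if not search:
        raise ValueError("search block is empty")
    first = source.find(search)
    if first == -1:
        raise ValueError("search block not found in source")
    # A second match (including an overlapping one) makes the edit ambiguous.
    if source.find(search, first + 1) != -1:
        raise ValueError("search block matches more than once")
    return source[:first] + replace + source[first + len(search):]


original = "def greet():\n    print('hi')\n\ndef bye():\n    print('bye')\n"
patched = apply_search_replace(
    original,
    search="def greet():\n    print('hi')\n",
    replace="def greet(name):\n    print(f'hi {name}')\n",
)
print(patched)
```

Rejecting missing or duplicate matches is a design choice. When the model misquotes the original text, the agent gets a clear error it can feed back to the model, rather than a patch applied at the wrong position.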
Journey Context:
Automated coding agents often fail at the final step: applying the patch. GPT-4o attempts unified diffs, but it generates output autoregressively, token by token, so it counts lines unreliably, and its hunk headers and line numbers drift away from the actual file. Claude tends to sidestep this by defaulting to full-file or full-function replacement, which is robust but token-heavy. Gemini defaults to search/replace, which is highly practical for agents. The synthesis is that standard unified diffs are a poor interface for LLMs because they depend on exact line numbers. The pattern that works best across models is search/replace or full-block replacement. It matches Claude's robust behavior and Gemini's default, and it avoids GPT-4o's weakness.
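The line-number fragility is easy to detect mechanically. In a unified diff, each hunk header "@@ -start,count +start,count @@" must agree with the number of context, removed, and added lines in its body. The sketch below is a hypothetical checker (check_hunk_counts is not from any library) that flags hunks whose counts disagree. It checks only the counts. Whether the start lines match the real file can only be confirmed by trying the patch, for example with git apply --check or patch --dry-run. The example at the end also shows a common way around the problem: let the model emit the full replacement text, then compute the diff in code with Python's difflib.unified_diff, so the model never counts lines.

```python
import difflib
import re

# Matches "@@ -start[,count] +start[,count] @@"; an omitted count means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def _is_file_header(lines: list[str], i: int) -> bool:
    """A '--- ' line immediately followed by '+++ ' starts a new file section."""
    return (
        lines[i].startswith("--- ")
        and i + 1 < len(lines)
        and lines[i + 1].startswith("+++ ")
    )


def check_hunk_counts(diff_text: str) -> list[str]:
    """Describe every hunk whose header counts disagree with its body (hypothetical helper)."""
    lines = diff_text.splitlines()
    problems = []
    i = 0
    while i < len(lines):
        m = HUNK_RE.match(lines[i])
        if m is None:
            i += 1
            continue
        header = lines[i]
        old_expected = int(m.group(2) or 1)
        new_expected = int(m.group(4) or 1)
        old_seen = new_seen = 0
        i += 1
        while i < len(lines):
            line = lines[i]
            if line.startswith("@@") or _is_file_header(lines, i):
                break  # next hunk or next file
            if line.startswith(" "):
                old_seen += 1  # context line counts on both sides
                new_seen += 1
            elif line.startswith("-"):
                old_seen += 1
            elif line.startswith("+"):
                new_seen += 1
            elif not line.startswith("\\"):
                break  # not a body line; "\ No newline at end of file" is skipped
            i += 1
        if (old_seen, new_seen) != (old_expected, new_expected):
            problems.append(
                f"{header!r}: header says -{old_expected} +{new_expected}, "
                f"body has -{old_seen} +{new_seen}"
            )
    return problems


# A model-written diff whose old-side count is off by one.
bad = """--- a/greet.py
+++ b/greet.py
@@ -1,3 +1,2 @@
-def greet():
+def greet(name):
     print('hi')
"""
print(check_hunk_counts(bad))

# Diff computed in code from full before/after text: headers are correct by construction.
generated = "".join(
    difflib.unified_diff(
        ["def greet():\n", "    print('hi')\n"],
        ["def greet(name):\n", "    print('hi')\n"],
        fromfile="a/greet.py",
        tofile="b/greet.py",
    )
)
print(check_hunk_counts(generated))
```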
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:01:18.186570+00:00 | report_created | created