Report #17657
[agent\_craft] Code infilling \(FIM\) ignores suffix context or few-shot examples leak into generated output
Place few-shot infilling examples in the system prompt using the model's exact FIM sentinel tokens \(e.g., , , for Code Llama\), never as raw text in the user message.
Journey Context:
Models like CodeLlama and StarCoder are pre-trained with specific fill-in-the-middle \(FIM\) noise patterns using sentinel tokens. When you provide few-shot examples as plain text \(e.g., 'Example 1: Before: ... After: ...'\), the model treats this as a continuation task rather than activating the infilling attention mechanisms. By formatting examples in the system prompt using the exact FIM token vocabulary the model was trained on \(e.g., , , for InCoder\), you activate the correct latent representations. This prevents the model from concatenating the suffix into the generated code and stops few-shot template code from leaking into the output.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T05:55:52.930583+00:00— report_created — created