Report #100498
[counterintuitive] Prompting with many in-context examples is often used as a substitute for fine-tuning or structured tools, but performance plateaus or regresses on complex structured tasks
For tasks that require precise output schemas, length constraints, or domain-specific idioms, prefer fine-tuning, constrained decoding, schema- or grammar-based output (JSON Schema, Pydantic models, EBNF), or a dedicated parser over an ever-growing list of prompt examples. Use in-context learning for distribution hints, not as a programmable rule engine.
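As a rough illustration of the "dedicated parser" option, the sketch below checks model output against a fixed schema in code and retries on failure, instead of relying on more prompt examples to enforce the format. It uses only the Python standard library. The `Ticket` schema, the severity values, the 80-character limit, and the `generate` callable are all hypothetical stand-ins, not part of any real API.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical constraints, chosen only for illustration.
MAX_SUMMARY_CHARS = 80
ALLOWED_SEVERITIES = {"low", "medium", "high"}


@dataclass(frozen=True)
class Ticket:
    severity: str
    summary: str


def parse_ticket(raw: str) -> Ticket:
    """Parse model output strictly; raise ValueError on any schema violation."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    if set(data) != {"severity", "summary"}:
        raise ValueError(f"unexpected keys: {sorted(data)}")
    severity, summary = data["severity"], data["summary"]
    if not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"severity must be one of {sorted(ALLOWED_SEVERITIES)}")
    if not isinstance(summary, str) or not 0 < len(summary) <= MAX_SUMMARY_CHARS:
        raise ValueError("summary must be a non-empty string within the length limit")
    return Ticket(severity=severity, summary=summary)


def generate_validated(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Ticket:
    """Call the (hypothetical) model until its output passes the parser."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_ticket(generate(prompt))
        except ValueError as exc:
            last_error = exc  # keep the reason so the final error is informative
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")
```

The constraint lives in `parse_ticket`, so it holds regardless of how many examples the prompt contains. A constrained-decoding or grammar-based setup would enforce the same rules during generation instead of after it.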
Journey Context:
Developers often keep adding examples to the prompt, assuming the model will eventually 'learn' the rule. Garg et al.'s in-context learning study and subsequent work show that transformers can learn simple function classes in context, but their sample efficiency is limited and they struggle with tasks requiring discrete rules, exact formatting, or composition. Each extra example consumes context budget and can introduce spurious correlations. When the task is well-defined and repeated, build the constraint into the system itself (fine-tuning, grammar, tools) rather than the prompt.
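To make the context-budget cost concrete, here is a minimal sketch that caps the number of in-context examples by an estimated token budget. The four-characters-per-token ratio is a crude assumption rather than a real tokenizer, and both function names are hypothetical.

```python
import math
from typing import List, Tuple


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; swap in a real tokenizer for actual use."""
    return max(1, math.ceil(len(text) / chars_per_token))


def select_examples(examples: List[str], budget_tokens: int) -> Tuple[List[str], int]:
    """Keep examples in order until the next one would exceed the budget."""
    chosen: List[str] = []
    used = 0
    for example in examples:
        cost = estimate_tokens(example)
        if used + cost > budget_tokens:
            break
        chosen.append(example)
        used += cost
    return chosen, used
```

With ten 40-character examples and a 35-token budget, this keeps three examples at an estimated 30 tokens. A hard cap like this keeps examples in the role of distribution hints and makes their cost visible.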
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:19:34.552132+00:00— report_created — created