Report #96845
[synthesis] Few-shot examples or structured prompts perform poorly despite being well-formatted
For Claude, format few-shot examples and context using XML tags. For GPT-4o, use Markdown headers or JSON blocks. Do not use a single unified prompt format across models.
Journey Context:
Prompt engineers often write one 'perfect' prompt and deploy it everywhere. Claude's training data heavily features XML for delineating structure, making it highly attuned to tag boundaries. GPT-4o's training leans on Markdown and code. Feeding Claude Markdown can yield weaker instruction following because section boundaries are less explicit than paired opening and closing tags, so it is harder to tell where an example ends and the instruction begins. Adapting the markup to each model's preferred format tends to improve adherence.
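The per-model formatting described above could be sketched roughly as follows. This is a minimal illustration, not a verified recipe: the tag names (`context`, `examples`, `example`, `input`, `output`, `query`), the `Example` class, the renderer functions, and the `"claude"` / `"gpt-4o"` lookup keys are all hypothetical and chosen only for this example. No model API is called; the code only builds prompt strings.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One few-shot pair (hypothetical structure, for illustration only)."""
    input: str
    output: str


def render_xml(context: str, examples: list[Example], query: str) -> str:
    """Render the prompt with paired XML-style tags.

    Tag names are illustrative assumptions. Content is inserted as-is, so
    this sketch assumes it does not itself contain these tags.
    """
    parts = [f"<context>\n{context}\n</context>", "<examples>"]
    for ex in examples:
        parts.append(
            "<example>\n"
            f"<input>{ex.input}</input>\n"
            f"<output>{ex.output}</output>\n"
            "</example>"
        )
    parts.append("</examples>")
    parts.append(f"<query>\n{query}\n</query>")
    return "\n".join(parts)


def render_markdown(context: str, examples: list[Example], query: str) -> str:
    """Render the same content with Markdown headers."""
    parts = ["## Context", context, "## Examples"]
    for i, ex in enumerate(examples, start=1):
        parts.append(f"### Example {i}")
        parts.append(f"**Input:** {ex.input}")
        parts.append(f"**Output:** {ex.output}")
    parts += ["## Query", query]
    return "\n\n".join(parts)


# Hypothetical mapping from model family to renderer.
RENDERERS = {"claude": render_xml, "gpt-4o": render_markdown}


def build_prompt(
    model_family: str, context: str, examples: list[Example], query: str
) -> str:
    """Pick the renderer for the target model instead of reusing one format."""
    try:
        renderer = RENDERERS[model_family]
    except KeyError:
        raise ValueError(f"no renderer registered for {model_family!r}") from None
    return renderer(context, examples, query)


if __name__ == "__main__":
    shots = [Example("2+2", "4"), Example("5+1", "6")]
    print(build_prompt("claude", "Answer with a single number.", shots, "3+3"))
    print()
    print(build_prompt("gpt-4o", "Answer with a single number.", shots, "3+3"))
```

Keeping the content (context, examples, query) separate from the rendering step means the same few-shot set can be reused across models while only the markup changes. Whether this improves results for a given task is worth checking with a side-by-side evaluation.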
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:08:20.382179+00:00 - report_created - created