Report #100020
[synthesis] Agent claims a task is done but only achieves structural completeness, not semantic completeness, as context pressure rises
Define semantic completeness criteria (e.g., must_haves.truths) and verify output against the plan's requirements, not just file existence or schema validity. Monitor context usage by tier and throttle agent behavior before budget limits are reached; watch for vague phrases and skipped protocol steps as early warnings.
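To make the structural/semantic distinction concrete, here is a minimal sketch, assuming each entry in must_haves.truths can be paired with a predicate over the agent's output. The `Truth` dataclass, the helpers `structurally_complete`, `semantic_gaps` and `is_done`, and the sample output fields are all hypothetical illustrations, not an API from the report. The example output has every required field, so it passes the format check, yet it fails the plan's actual requirements.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Truth:
    """Hypothetical pairing of a must_haves.truths claim with a predicate on the output."""
    claim: str
    check: Callable[[dict], bool]


def structurally_complete(output: dict, required_keys: list[str]) -> bool:
    """Format-level check: every required field is present (values are not inspected)."""
    return all(key in output for key in required_keys)


def semantic_gaps(output: dict, truths: list[Truth]) -> list[str]:
    """Return the claims from must_haves.truths that the output does not satisfy."""
    gaps = []
    for truth in truths:
        try:
            ok = truth.check(output)
        except (KeyError, TypeError, ValueError):
            ok = False  # a check that cannot even run counts as unmet
        if not ok:
            gaps.append(truth.claim)
    return gaps


def is_done(output: dict, required_keys: list[str], truths: list[Truth]) -> bool:
    """Done only when the output is both structurally and semantically complete."""
    return structurally_complete(output, required_keys) and not semantic_gaps(output, truths)


# Hypothetical plan requirements and agent output.
plan_truths = [
    Truth("all tests pass", lambda o: o["tests_total"] > 0 and o["tests_passed"] == o["tests_total"]),
    Truth("summary names the changed endpoint", lambda o: "/api/users" in o["summary"]),
]
required = ["status", "summary", "tests_passed", "tests_total"]
output = {
    "status": "done",
    "summary": "Updated the relevant endpoint as needed.",
    "tests_passed": 9,
    "tests_total": 12,
}

print(structurally_complete(output, required))  # True
print(semantic_gaps(output, plan_truths))       # ['all tests pass', 'summary names the changed endpoint']
print(is_done(output, required, plan_truths))   # False
```

The design choice worth noting is that a check which raises (for example, a missing field) is treated as unmet rather than skipped, so a malformed output can never count as semantically complete by accident.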
Journey Context:
Context degradation is gradual: as context pressure rises, agents start using vague placeholders, skip steps, and pass structural checks while failing the actual task. Research on structured-output pressure shows that schema-valid responses can still contain wrong field values. The synthesis is that format checks and file existence are insufficient. Semantic completeness must be tested explicitly, and context budgets must drive graduated behavioral throttling well before usage reaches 100%, rather than triggering a reaction only once the limit is hit.
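The tiered throttling and early-warning idea can be sketched as follows. Everything named here is a hypothetical assumption for illustration: the tier thresholds (50%, 70%, 85%), the `POLICY` actions, the `VAGUE_PHRASES` list, and the functions `context_tier`, `early_warnings` and `next_action`. The phrase matching is a crude substring heuristic, not a reliable detector.

```python
# Hypothetical tiers: (upper bound as a fraction of the context budget, tier name).
# The thresholds are illustrative assumptions, not values from the report.
TIERS = [(0.50, "normal"), (0.70, "caution"), (0.85, "throttle")]
ORDER = ["normal", "caution", "throttle", "handoff"]

# Hypothetical behavior per tier; "handoff" applies at or above 85% of the budget.
POLICY = {
    "normal": "proceed",
    "caution": "finish the current step; start no new subtasks",
    "throttle": "checkpoint progress and re-verify must_haves before continuing",
    "handoff": "stop and write state for a fresh context",
}

# Heuristic early-warning phrases (an assumption; extend for your own agents).
VAGUE_PHRASES = ("todo", "as needed", "etc.", "and so on", "similar to above", "...")


def context_tier(used_tokens: int, budget_tokens: int) -> str:
    """Map context usage to a tier well before the budget is exhausted."""
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")
    fraction = used_tokens / budget_tokens
    for upper, name in TIERS:
        if fraction < upper:
            return name
    return "handoff"


def early_warnings(text: str, required_steps: list[str], completed_steps: list[str]) -> list[str]:
    """Flag vague placeholder phrases and protocol steps that were skipped."""
    lowered = text.casefold()
    warnings = [f"vague phrase: {p!r}" for p in VAGUE_PHRASES if p in lowered]
    warnings += [f"skipped step: {s}" for s in required_steps if s not in completed_steps]
    return warnings


def next_action(used_tokens, budget_tokens, text, required_steps, completed_steps):
    """Combine the budget tier with early warnings; any warning escalates one tier."""
    tier = context_tier(used_tokens, budget_tokens)
    warnings = early_warnings(text, required_steps, completed_steps)
    if warnings:
        tier = ORDER[min(ORDER.index(tier) + 1, len(ORDER) - 1)]
    return tier, POLICY[tier], warnings


tier, action, warnings = next_action(
    used_tokens=120_000,
    budget_tokens=200_000,
    text="Implemented the handlers. Other endpoints updated similarly, etc.",
    required_steps=["write tests", "run tests", "update summary"],
    completed_steps=["write tests", "update summary"],
)
print(tier)      # throttle
print(warnings)  # ["vague phrase: 'etc.'", 'skipped step: run tests']
```

At 60% usage the budget alone would only put the agent in "caution", but the vague phrase and the skipped "run tests" step escalate it to "throttle". This reflects the point above: degradation signals can justify throttling before the raw token count does.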
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:27:21.169008+00:00 | report_created | created