Report #74274
[frontier] Non-deterministic verification loops causing flaky approval gates in autonomous coding pipelines
Apply structured output with constrained sampling (JSON Schema or regex constraints) to verification agents, forcing a fixed yes/no/flag output format so that gate decisions parse consistently and CI/CD approvals stop flaking.
Journey Context:
When a 'reviewer' agent checks code before deployment and the LLM replies in free text ('This looks good but maybe...'), parsing fails or the decision varies across runs. The result is flaky CI: the same code passes on one run and fails on the next. The pattern is to use constrained decoding (OpenAI's Structured Outputs with a strict JSON Schema, Anthropic's structured output, or libraries such as outlines or lm-format-enforcer) to force the verification agent to emit a strict schema: {approval: boolean, severity: enum['low','high'], reasoning: string}. This makes the gate's output machine-parseable on every run and restricts the decision to a fixed set of values (the schema constrains the shape of the output; it does not by itself make the verdict identical across runs). That matters for automated CI/CD loops, where non-determinism breaks build reproducibility.
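A minimal sketch of the consuming side of such a gate, assuming the reviewer's raw reply arrives as a string. The names `VERDICT_SCHEMA`, `Verdict`, `parse_verdict`, and `gate_passes` are hypothetical, and the pass rule (approved and severity 'low') is an illustrative policy, not something this report specifies. The schema dict is the kind of object one would hand to a constrained-decoding backend; no provider API is called here.

```python
import json
from dataclasses import dataclass

# JSON Schema matching the shape described in the report (hypothetical name).
VERDICT_SCHEMA = {
    "type": "object",
    "properties": {
        "approval": {"type": "boolean"},
        "severity": {"type": "string", "enum": ["low", "high"]},
        "reasoning": {"type": "string"},
    },
    "required": ["approval", "severity", "reasoning"],
    "additionalProperties": False,
}

REQUIRED_KEYS = set(VERDICT_SCHEMA["required"])
ALLOWED_SEVERITIES = set(VERDICT_SCHEMA["properties"]["severity"]["enum"])


@dataclass(frozen=True)
class Verdict:
    approval: bool
    severity: str
    reasoning: str


def parse_verdict(raw: str) -> Verdict:
    """Parse reviewer output; raise ValueError on anything off-schema."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or set(data) != REQUIRED_KEYS:
        raise ValueError("expected exactly: approval, severity, reasoning")
    if not isinstance(data["approval"], bool):
        raise ValueError("approval must be a JSON boolean")
    severity = data["severity"]
    if not isinstance(severity, str) or severity not in ALLOWED_SEVERITIES:
        raise ValueError("severity must be 'low' or 'high'")
    if not isinstance(data["reasoning"], str):
        raise ValueError("reasoning must be a string")
    return Verdict(data["approval"], severity, data["reasoning"])


def gate_passes(raw: str) -> bool:
    """Fail closed: off-schema output blocks the gate instead of guessing."""
    try:
        verdict = parse_verdict(raw)
    except ValueError:
        return False
    # Assumed policy: only an approved, low-severity verdict passes.
    return verdict.approval and verdict.severity == "low"
```

Validating again on the consumer side, even when the backend enforces the schema, keeps the gate fail-closed if the reviewer is ever swapped for a model or endpoint without constrained decoding.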
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:16:02.072375+00:00— report_created — created