Report #92550
[frontier] Agent output parsing fails unpredictably when switching LLM providers or when output format drifts between versions
Adopt a schema-first design using Pydantic AI: define agent interfaces as typed Pydantic models with validation, and treat the LLM as one implementation of that contract. This enables static analysis and deterministic testing against mocked responses.
Journey Context:
Prompt engineering is model-specific and fragile. The frontier shift is toward 'type-safe' agents, where the boundary between LLM and code is a strict schema (Zod/Pydantic). Pydantic AI (late 2024) enables this by structuring the entire agent flow around result types, dependencies, and retries. Because business logic depends only on the schema, LLMs (OpenAI, Anthropic, local) can be swapped without changing it, and unit tests can run against mocked LLM responses that conform to the schema. This replaces the 'stringly typed' approach of parsing raw prompt output. Tradeoff: reduced flexibility for truly open-ended generation, but the schema is essential for reliability.
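The idea above can be sketched without any provider SDK. This is a minimal illustration using only the Python standard library, not Pydantic AI's actual API: a dataclass stands in for the Pydantic model, and the names `TicketTriage`, `parse_triage`, `run_agent`, and `make_mock_llm` are hypothetical. In a real project the validation step would presumably be handled by a Pydantic model (for example via `model_validate_json` in Pydantic v2) rather than written by hand.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical contract for illustration; a stdlib stand-in for a Pydantic model.
@dataclass(frozen=True)
class TicketTriage:
    category: str
    priority: int

    def __post_init__(self) -> None:
        # Validation lives in the schema, not in the prompt.
        if not isinstance(self.category, str) or not self.category:
            raise ValueError("category must be a non-empty string")
        if isinstance(self.priority, bool) or not isinstance(self.priority, int):
            raise ValueError("priority must be an integer")
        if not 1 <= self.priority <= 5:
            raise ValueError("priority must be between 1 and 5")


# Any provider (hosted or local) is reduced to: prompt text in, raw text out.
LLM = Callable[[str], str]


def parse_triage(raw: str) -> TicketTriage:
    """Validate raw model output against the schema; raise ValueError on drift."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or set(data) != {"category", "priority"}:
        raise ValueError("output does not match the TicketTriage fields")
    return TicketTriage(**data)


def run_agent(llm: LLM, ticket: str, max_retries: int = 2) -> TicketTriage:
    """Call the model, validate, and retry with the error fed back on failure."""
    prompt = f"Classify this ticket as JSON with category and priority: {ticket}"
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return parse_triage(llm(prompt))
        except ValueError as exc:
            last_error = exc
            prompt = f"{prompt}\nPrevious output was rejected: {exc}"
    raise RuntimeError(
        f"no valid output after {max_retries + 1} attempts"
    ) from last_error


def make_mock_llm(responses: Iterable[str]) -> LLM:
    """Build a deterministic fake model that replays canned responses in order."""
    it = iter(responses)

    def llm(prompt: str) -> str:
        return next(it)

    return llm


if __name__ == "__main__":
    # First two responses violate the contract; the third satisfies it.
    flaky = make_mock_llm([
        "Sure! Here is the answer.",
        '{"category": "bug", "priority": 9}',
        '{"category": "bug", "priority": 1}',
    ])
    print(run_agent(flaky, "App crashes on login"))
```

Because `run_agent` only sees the `LLM` callable and the schema, swapping providers means supplying a different callable, and the retry path can be tested deterministically with canned responses.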
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:56:10.445953+00:00— report_created — created