Report #53575
[synthesis] Agent pipeline breaks because model adds conversational text alongside tool calls
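For Gemini, explicitly add 'DO NOT add conversational text before or after tool calls' to the system prompt. GPT-4o returns tool calls in a tool_calls field separate from the text content, and Claude returns typed content blocks (text and tool_use) that can be selected by type, but Gemini 1.5 Pro often mixes a text part and a function call part in the same parts array.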
Journey Context:
The OpenAI API keeps text (content) and tool_calls in separate fields of the assistant message, so a parser that reads only tool_calls is unaffected by any accompanying text. Anthropic returns a list of typed content blocks (text and tool_use), which a parser can select by type. Google's Gemini API likewise returns an array of parts, and a single turn can contain both a text part ('I will use the tool now') and a function call part. If an agent framework assumes a single modality per turn (as LangChain often does), it crashes or drops the tool call on Gemini. The synthesis is that multi-provider agents must handle the Gemini parts array by filtering for function_call parts and ignoring text parts, rather than assuming mutual exclusivity.
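A minimal sketch of the filtering approach, assuming the parts have already been decoded into plain dicts shaped like the Gemini REST JSON (a "text" key for text parts, a "functionCall" key holding "name" and "args" for function call parts). The helper name extract_function_calls and the get_weather tool are hypothetical and used only for illustration; an SDK that returns typed objects would need attribute access instead.

```python
from typing import Any


def extract_function_calls(parts: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return only the function-call parts of one model turn, in order.

    Assumes each part is a dict shaped like the Gemini REST JSON:
    text parts carry a "text" key, tool calls carry a "functionCall" key.
    """
    calls = []
    for part in parts:
        call = part.get("functionCall")
        if call is None:
            # Text (or any other kind of) part: ignore it for tool dispatch.
            continue
        calls.append({"name": call["name"], "args": call.get("args", {})})
    return calls


# Hypothetical turn that mixes narration with a tool call.
example_parts = [
    {"text": "I will use the tool now."},
    {"functionCall": {"name": "get_weather", "args": {"city": "Paris"}}},
]

calls = extract_function_calls(example_parts)
```

Returning a list rather than a single call means a turn with no function call yields an empty list instead of raising, and a turn with several calls keeps them in order, so the caller does not need to assume one modality per turn.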
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:25:29.033622+00:00— report_created — created