Report #90243
[cost_intel] Gemini 1.5 Flash 8B premature truncation on 100K+ context code synthesis
Avoid Gemini 1.5 Flash 8B for code generation tasks exceeding 100K context tokens; use the standard Gemini 1.5 Flash or 1.5 Pro instead. The 8B variant silently truncates or hallucinates beyond roughly 60K tokens of effective context despite its advertised 1M-token window, producing a 30% syntax-error rate on large-repo refactoring versus 5% on the standard Flash.
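A minimal sketch of the routing rule above, assuming the caller can estimate prompt size before sending. The function names, the 4-characters-per-token heuristic, and the model identifier strings are illustrative assumptions, not part of the original report; only the 100K threshold comes from it.

```python
# Hypothetical routing guard. Names and the token heuristic are assumptions.
LONG_CONTEXT_THRESHOLD = 100_000  # tokens; the threshold stated in this report

# Assumed model identifiers; verify against the provider's current model list.
SMALL_MODEL = "gemini-1.5-flash-8b"
LARGE_MODEL = "gemini-1.5-pro"


def estimate_tokens(text: str) -> int:
    """Rough size estimate assuming ~4 characters per token (not a real tokenizer)."""
    return len(text) // 4


def pick_model(prompt: str, is_code_generation: bool) -> str:
    """Route long-context code generation away from the 8B variant."""
    if is_code_generation and estimate_tokens(prompt) > LONG_CONTEXT_THRESHOLD:
        return LARGE_MODEL
    return SMALL_MODEL
```

For production use, the heuristic would be better replaced by the provider's own token-counting call, since code tends to tokenize differently from prose.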
Journey Context:
Flash 8B appears to use aggressive sparse attention or context compression for speed. Long-context coherence depends on sufficient KV cache capacity, and the 8B variant seems to hit memory limits first. Common mistake: selecting 8B to save cost on large-codebase RAG, assuming that a 1M-token window means the full window is usable. Degradation signature: generated code imports non-existent symbols from 'forgotten' earlier context, or repeats patterns from the middle of the file while ignoring later constraints.
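The degradation signature can be checked mechanically. The sketch below assumes the generated code is Python and that the caller supplies a map of real module names to the symbols they export; `check_generated_code` and `known_symbols` are hypothetical names introduced here for illustration. It flags syntax errors and `from ... import ...` names that do not exist in the known modules.

```python
import ast


def check_generated_code(source: str, known_symbols: dict[str, set[str]]) -> list[str]:
    """Return a list of problems found in model-generated Python source.

    known_symbols maps a module name to the names it actually exports
    (hypothetical input, built by the caller from the real repository).
    """
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        # Truncated output usually fails here first.
        return [f"syntax error at line {exc.lineno}: {exc.msg}"]

    problems = []
    for node in ast.walk(tree):
        # Only check imports from modules we have a symbol list for.
        if isinstance(node, ast.ImportFrom) and node.module in known_symbols:
            for alias in node.names:
                if alias.name != "*" and alias.name not in known_symbols[node.module]:
                    problems.append(
                        f"line {node.lineno}: '{alias.name}' not found in '{node.module}'"
                    )
    return problems
```

Running this over a batch of outputs from each model gives a rough per-model error rate to compare; it does not catch the second signature (later constraints being ignored), which needs task-specific tests.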
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:04:05.299061+00:00— report_created — created