Report #85508
[frontier] How do I maximize information density within strict token limits for my agent?
Implement token-budget-aware retrieval: use LlamaIndex's `TokenPredictor` to estimate the token cost of each node before insertion, then rank nodes by information density (relevance_score / token_count) and fill the context window greedily until the token budget is used up.
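A minimal, framework-agnostic sketch of the greedy density fill described above. It does not call LlamaIndex. `Node`, `estimate_tokens`, and `pack_by_density` are hypothetical names, and the 4-characters-per-token estimate is an assumption standing in for a real tokenizer-based estimator.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a retrieved chunk: text plus retriever score."""
    text: str
    score: float


def estimate_tokens(text: str) -> int:
    """Hypothetical estimator: roughly 4 characters per token for English text.

    Replace with a tokenizer that matches your model for real budgets.
    """
    return max(1, len(text) // 4)


def pack_by_density(nodes: list[Node], budget: int) -> list[Node]:
    """Greedily fill `budget` tokens, highest relevance-per-token first."""
    costed = [(node, estimate_tokens(node.text)) for node in nodes]
    # Information density = relevance_score / token_count.
    costed.sort(key=lambda pair: pair[0].score / pair[1], reverse=True)
    selected: list[Node] = []
    used = 0
    for node, cost in costed:
        # Skip nodes that no longer fit but keep scanning for smaller ones.
        if used + cost <= budget:
            selected.append(node)
            used += cost
    return selected


nodes = [
    Node("a" * 4000, score=0.95),  # ~1000 tokens: very relevant but huge
    Node("b" * 400, score=0.80),   # ~100 tokens
    Node("c" * 800, score=0.70),   # ~200 tokens
]
chosen = pack_by_density(nodes, budget=500)
print([node.text[0] for node in chosen])  # ['b', 'c']
```

Continuing the scan after a node fails to fit, instead of stopping at the first overflow, lets smaller high-density nodes still use the leftover budget. The result comes back in density order; re-sort it by score or source position if your prompt expects that.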
Journey Context:
Plain top-k retrieval ignores token cost: a single large node can take up most of the context window. The TokenPredictor approach (described as a 2025 LlamaIndex pattern) treats filling the context window as a knapsack problem, maximizing total relevance under a fixed token budget; ranking by relevance per token is the standard greedy approximation to it. This guards against the 'one giant document kills the prompt' failure mode common in production RAG agents.
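To make the knapsack framing concrete, the sketch below contrasts plain top-k with an exact 0/1 knapsack solver over precomputed token counts. The item names, scores, and token counts are invented for illustration, and the dynamic-programming solver is a textbook technique, not part of LlamaIndex. Exact DP costs O(n × budget) time, which is why a density-ordered greedy fill is the usual practical approximation.

```python
# Each item is (name, relevance_score, token_count); values are illustrative only.
Item = tuple[str, float, int]


def top_k(items: list[Item], k: int) -> list[Item]:
    """Plain top-k: ranks by relevance alone and ignores token cost."""
    return sorted(items, key=lambda it: it[1], reverse=True)[:k]


def knapsack(items: list[Item], budget: int) -> tuple[float, list[str]]:
    """Exact 0/1 knapsack: maximize total relevance with total tokens <= budget."""
    # best[b] holds (best total score, chosen names) using at most b tokens.
    best: list[tuple[float, list[str]]] = [(0.0, [])] * (budget + 1)
    for name, score, tokens in items:
        # Walk budgets downward so each item is counted at most once.
        for b in range(budget, tokens - 1, -1):
            candidate = best[b - tokens][0] + score
            if candidate > best[b][0]:
                best[b] = (candidate, best[b - tokens][1] + [name])
    return best[budget]


items: list[Item] = [
    ("giant", 0.95, 1000),  # most relevant, but twice the budget on its own
    ("b", 0.80, 100),
    ("c", 0.70, 200),
    ("d", 0.60, 150),
]

naive = top_k(items, 3)
print(sum(t for _, _, t in naive))  # 1300 tokens: overflows a 500-token budget
print(knapsack(items, 500)[1])      # ['b', 'c', 'd']
```

Top-k grabs the giant node first and blows the budget, while the knapsack solution drops it in favour of three smaller nodes with higher combined relevance.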
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:06:52.970046+00:00— report_created — created