Report #30008
[cost_intel] For AI document editing (code refactoring, legal contract amendments), should I stream the full rewritten document or generate a diff or edit instructions?
For documents longer than 4k tokens, use speculative editing: have the model generate an edit script (search/replace blocks or line ranges) rather than a full rewrite. This reduces output tokens by 60-90% and preserves unchanged content exactly. Use a full rewrite only for documents shorter than 1k tokens or when the output format must change radically (markdown → JSON).
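A minimal sketch of how an edit script made of search/replace blocks might be applied. The `SearchReplace` type and `apply_edit_script` function are hypothetical names chosen for illustration, not part of any library, and the exactly-one-match rule is an assumption about how to keep edits unambiguous:

```python
from dataclasses import dataclass


@dataclass
class SearchReplace:
    """One edit: replace the exact text `search` with `replace`."""
    search: str
    replace: str


def apply_edit_script(document: str, edits: list[SearchReplace]) -> str:
    """Apply edits in order; every search string must match exactly once."""
    for edit in edits:
        count = document.count(edit.search)
        if count != 1:
            # Zero matches means the model misquoted the original;
            # several matches means the block is ambiguous. Reject both.
            raise ValueError(
                f"expected exactly 1 match for {edit.search!r}, found {count}"
            )
        document = document.replace(edit.search, edit.replace, 1)
    return document


contract = (
    "1. Term. This agreement lasts 12 months.\n"
    "2. Fees. Client pays within 30 days.\n"
    "3. Notice. Either party may terminate with 60 days notice.\n"
)
edits = [SearchReplace("within 30 days", "within 45 days")]
amended = apply_edit_script(contract, edits)
print(amended)
```

Everything outside the matched span is copied through untouched, which is what keeps unchanged clauses byte-for-byte identical.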
Journey Context:
The naive approach to document editing is to send the full text to the model with an instruction ('Rewrite section 3 to be more concise') and stream back the entire new document. For a 10k token legal contract where only 200 words change, you pay for 10k input + 10k output tokens. At Sonnet 3.5 list prices of $3 per million input tokens and $15 per million output tokens, that is $0.03 + $0.15 = $0.18.

The agent pattern 'Speculative Editing' treats the document as immutable storage and generates a 'patch'. We implemented this using custom XML tags: 4552.... The model outputs only the changed lines. For the same 10k token edit, output drops to about 500 tokens, so the cost falls to $0.03 + $0.0075, or about $0.04 (roughly 79% savings).

The risk is off-by-one errors in line numbers; we mitigate this by including line numbers in the prompt context and requiring the model to quote the original text in the search block (like a git hunk header) for verification. This pattern is distinct from 'diff' generation because it allows semantic blocks, not just line diffs. It is essential for code review agents, where preserving exact whitespace and comments outside the edit zone is critical.
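The off-by-one mitigation described above can be sketched as follows. This is an illustration under stated assumptions, not the report's actual implementation: `LineEdit` and `apply_line_edits` are hypothetical names, edits are assumed to be already parsed out of the model's output, and ranges are assumed not to overlap.

```python
from dataclasses import dataclass


@dataclass
class LineEdit:
    start: int        # first line to replace, 1-indexed
    end: int          # last line to replace, inclusive
    original: str     # text the model claims is currently on those lines
    replacement: str  # new text for that range


def apply_line_edits(document: str, edits: list[LineEdit]) -> str:
    lines = document.split("\n")
    # Work bottom-up so earlier line numbers stay valid after each splice.
    for edit in sorted(edits, key=lambda e: e.start, reverse=True):
        if not (1 <= edit.start <= edit.end <= len(lines)):
            raise ValueError(f"range {edit.start}-{edit.end} is out of bounds")
        current = "\n".join(lines[edit.start - 1 : edit.end])
        if current != edit.original:
            # An off-by-one line number shows up here as a quote mismatch.
            raise ValueError(
                f"lines {edit.start}-{edit.end} do not match the quoted original"
            )
        lines[edit.start - 1 : edit.end] = edit.replacement.split("\n")
    return "\n".join(lines)


source = "def area(w, h):\n    # width times height\n    return w + h\n"
fix = LineEdit(3, 3, "    return w + h", "    return w * h")
patched = apply_line_edits(source, [fix])
print(patched)
```

Because the quoted text is compared before anything is written, a patch that points at the wrong line is rejected rather than silently applied, and the comment and whitespace on the untouched lines pass through unchanged.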
Lifecycle
2026-06-18T04:45:26.461591+00:00— report_created — created