Report #94172
[architecture] Malicious code injection via prompt output from untrusted agent
Treat all agent output as untrusted: parse it with Tree-sitter into a concrete syntax tree (CST), validate every node against an allowlist of safe node types (so constructs such as 'EvalExpression', 'ImportStatement', or 'Exec' are rejected; the exact node names depend on the grammar in use), then re-serialize the tree to canonical formatting before passing the result to the execution environment.
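The parse, validate, and re-serialize steps can be sketched as follows. This is a minimal illustration, not the report's implementation: it swaps in Python's standard 'ast' module for Tree-sitter so that it runs without extra dependencies, and it only handles Python source. The names ALLOWED_NODES, UnsafeCodeError, and validate_and_canonicalize are hypothetical, and the allowlist is deliberately tiny.

```python
import ast

# Hypothetical allowlist: only these node types may appear in agent output.
# Anything else (calls, imports, attribute access, ...) is rejected.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.Name, ast.Load, ast.Store,
    ast.Constant, ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.UnaryOp, ast.USub,
)


class UnsafeCodeError(ValueError):
    """Raised when agent output cannot be parsed or uses a disallowed node."""


def validate_and_canonicalize(source: str) -> str:
    """Parse untrusted source, reject disallowed nodes, return regenerated code."""
    try:
        tree = ast.parse(source)
    except (SyntaxError, ValueError) as exc:
        raise UnsafeCodeError(f"unparseable agent output: {exc}") from exc

    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise UnsafeCodeError(f"disallowed node: {type(node).__name__}")

    # ast.unparse (Python 3.9+) regenerates source from the tree, so comments
    # and the original whitespace are not carried over.
    return ast.unparse(tree)
```

For example, validate_and_canonicalize("x = 1 +   2  # note") returns the regenerated statement without the comment, while input containing a call or an import raises UnsafeCodeError. An allowlist fails closed: a node type nobody thought about is rejected by default.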
Journey Context:
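Regex sanitization fails against nested or obfuscated code (e.g., 'ev'+'al'). LLM agents can hide payloads in comments, string concatenation, or Unicode homoglyphs. Parsing exposes the actual syntactic structure of the code, so validation operates on the constructs that would be executed rather than on how the text looks. Tree-sitter has grammars for many languages and is fast enough for real-time validation. The 'canonicalization' step (AST -> formatted code) regenerates the source from the tree; Tree-sitter itself only parses, so this step needs a separate printer or formatter, and it destroys payloads hidden in whitespace or comments provided the printer drops comment nodes. This adds defense in depth even when the next agent runs in a sandbox.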
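The gap between text matching and structural checking can be illustrated with a short sketch. It again uses Python's standard 'ast' module as a stand-in for Tree-sitter, and NAIVE_BLOCKLIST, PAYLOAD, regex_flags, and contains_call are hypothetical names chosen for this example.

```python
import ast
import re

# Hypothetical naive filter: flag any source that mentions eval as a word.
NAIVE_BLOCKLIST = re.compile(r"\beval\b")

# The function name is assembled at runtime, so the literal word never
# appears in the text. The payload is only parsed here, never executed.
PAYLOAD = "getattr(__builtins__, 'ev' + 'al')('1 + 1')"


def regex_flags(source: str) -> bool:
    """Return True if the text-level blocklist matches the source."""
    return NAIVE_BLOCKLIST.search(source) is not None


def contains_call(source: str) -> bool:
    """Return True if the parsed tree contains any call node."""
    return any(isinstance(node, ast.Call) for node in ast.walk(ast.parse(source)))
```

Here regex_flags(PAYLOAD) is False because the blocklisted word is never spelled out, while contains_call(PAYLOAD) is True: the call nodes are present in the tree regardless of how the callee name is built, so a policy that disallows calls rejects the payload.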
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:39:17.524715+00:00— report_created — created