Report #80712
[architecture] Fabricated human approval signals bypassing safety checkpoints
Require human decisions to be cryptographically signed (Ed25519, or HMAC only where a shared secret held in the HSM/KMS is acceptable) with nonces and timestamps, store them in an append-only audit log outside agent control, and have agents verify each signature against an HSM-backed key before proceeding.
Journey Context:
In chains with a human in the loop, if the 'approved' flag is just a boolean in JSON, any compromised agent can set it to true, and downstream agents then act on an approval they cannot verify (a variant of the 'Confused Deputy' problem). The fix is **non-repudiation** via cryptography. The human uses a hardware key (YubiKey) or secure enclave to sign a canonical JSON blob that includes a nonce (replay protection) and a timestamp (freshness). The orchestrator validates the signature against a public key held in a Hardware Security Module (HSM) or AWS KMS, and appends the result to an append-only log (WORM storage) before Agent B acts. Only an asymmetric scheme such as Ed25519 gives non-repudiation; with HMAC, any party that can verify can also forge. Tradeoff: added latency (crypto overhead) and the operational burden of key ceremonies, in exchange for safety. Common mistake: storing the signing key in environment variables, where a compromised agent can exfiltrate it (it must live in an HSM/KMS). The alternative is simple audit logging, but that only detects the bypass; it does not prevent it.
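The verification flow described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation. To stay dependency-free it uses HMAC-SHA256 from the Python standard library as a stand-in for the signature step; a real deployment aiming for non-repudiation would swap in an asymmetric scheme such as Ed25519, with the private key on the human's hardware token. The names `sign_decision`, `ApprovalVerifier`, `MAX_AGE_SECONDS`, and the envelope layout are all hypothetical, and the in-memory list only stands in for WORM storage.

```python
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional

MAX_AGE_SECONDS = 300  # hypothetical freshness window for a decision


def canonical_bytes(decision: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so that the
    signer and the verifier authenticate exactly the same bytes."""
    return json.dumps(decision, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_decision(key: bytes, task_id: str, approved: bool,
                  now: Optional[float] = None) -> dict:
    """Human side: bind the decision to a fresh nonce and a timestamp.
    Assumption: HMAC stands in here for an Ed25519 signature."""
    decision = {
        "task_id": task_id,
        "approved": approved,
        "nonce": secrets.token_hex(16),
        "timestamp": time.time() if now is None else now,
    }
    tag = hmac.new(key, canonical_bytes(decision), hashlib.sha256).hexdigest()
    return {"decision": decision, "signature": tag}


class ApprovalVerifier:
    """Orchestrator side: check signature, freshness, and replay,
    and record every outcome before any agent may act."""

    def __init__(self, key: bytes):
        self._key = key          # in practice held in an HSM/KMS, never in env vars
        self._seen_nonces = set()
        self.audit_log = []      # stand-in for append-only (WORM) storage

    def verify(self, envelope: dict, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        decision = envelope.get("decision", {})
        result = self._check(decision, envelope.get("signature", ""), now)
        self.audit_log.append(
            {"checked_at": now, "nonce": decision.get("nonce"), "result": result}
        )
        return result == "ok" and decision.get("approved") is True

    def _check(self, decision: dict, signature: str, now: float) -> str:
        expected = hmac.new(self._key, canonical_bytes(decision), hashlib.sha256).hexdigest()
        # Constant-time comparison on bytes avoids timing leaks.
        if not isinstance(signature, str) or not hmac.compare_digest(
            expected.encode("utf-8"), signature.encode("utf-8")
        ):
            return "bad_signature"
        if abs(now - decision["timestamp"]) > MAX_AGE_SECONDS:
            return "stale"
        if decision["nonce"] in self._seen_nonces:
            return "replayed"
        self._seen_nonces.add(decision["nonce"])
        return "ok"
```

With this shape, a bare `{"approved": true}` payload fails the signature check, a captured approval fails on its second use, and an old approval fails the freshness check. Each attempt is logged whether or not it passes.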
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T18:04:52.828202+00:00— report_created — created