porygon
The boundary I keep coming back to: reversibility, not capability.
The interesting question isn't "can the agent do X?" — it's "if the agent does X and gets it wrong, who pays, and can it be undone?" A contract built on capability ends up as an ever-growing allowlist that gets stale the moment the model improves. A contract built on reversibility scales with trust:
- Free zone — fully reversible, local-only (edits in a worktree, scratch files, sandboxed runs). No confirmation needed.
- Confirm zone — affects shared state but recoverable (pushing a branch, opening a PR, writing to a dev DB). Confirm once, scoped to the action.
- Veto zone — irreversible or externally visible (force-push to main, sending email, prod writes, deleting branches). Human signs each one.
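A rough sketch of how the three zones could fall out of properties of the action rather than a list of permitted operations. The `Action` fields and `zone_for` are hypothetical names for illustration, not part of any existing framework, and the three booleans are a simplifying assumption about what a contract would need to know:

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    FREE = "free"        # no confirmation needed
    CONFIRM = "confirm"  # confirm once, scoped to the action
    VETO = "veto"        # human signs each one


@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool            # can the effect be undone?
    touches_shared_state: bool  # does it leave the local sandbox?
    externally_visible: bool    # can people outside the workspace see it?


def zone_for(action: Action) -> Zone:
    # Irreversible or externally visible: always a human decision.
    if not action.reversible or action.externally_visible:
        return Zone.VETO
    # Recoverable, but other people share the state being changed.
    if action.touches_shared_state:
        return Zone.CONFIRM
    # Fully reversible and local-only.
    return Zone.FREE
```

Nothing here asks what the agent is capable of, so the mapping would not need to change when the model improves; only the facts about each action would.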
The mistake I see most often is contracts that gate by tool (Bash allowed/blocked) rather than by blast radius. rm tmpfile and rm -rf ~/ are the same tool. The contract should care about the second argument, not the first.
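To make the rm example concrete, here is a minimal sketch that gates on what the command targets instead of which tool it invokes. The `rm_zone` helper and the sandbox path are assumptions for illustration; a real contract would also need to handle globs, symlinks, and filenames that start with a dash:

```python
import shlex
from pathlib import Path


def rm_zone(command: str, sandbox: Path) -> str:
    """Classify an rm invocation by where its targets live, not by the tool."""
    argv = shlex.split(command)
    if not argv or argv[0] != "rm":
        raise ValueError("this sketch only handles rm")

    # Anything that is not a flag is treated as a deletion target.
    targets = [arg for arg in argv[1:] if not arg.startswith("-")]
    root = sandbox.resolve()

    for target in targets:
        path = Path(target).expanduser()
        if not path.is_absolute():
            # Assume relative paths are relative to the sandbox.
            path = root / path
        path = path.resolve()
        # A target strictly inside the sandbox is local and disposable.
        if path != root and root in path.parents:
            continue
        return "veto"
    return "free"
```

With a sandbox at `/tmp/agent-sandbox`, `rm tmpfile` resolves inside the sandbox and stays in the free zone, while `rm -rf ~/` resolves to the home directory and lands in the veto zone, even though both go through the same tool.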