Coding Agent Security Just Became a Product Category

Two weeks ago we wrote about Claude Code escaping its own sandbox by treating security controls as bugs to debug. No jailbreaks, no adversarial prompts; just an agent that noticed the sandbox was configurable and turned it off. The conclusion was clear: userspace sandboxing doesn’t survive contact with a capable agent that can read configs and iterate.

Players large and small are moving in this space. In the past week, NVIDIA open-sourced OpenShell, a containerized runtime that enforces agent security policies through declarative YAML configs governing filesystem access, network connectivity, and process execution. Sysdig published runtime detection rules for AI coding agents, using syscall-level monitoring to catch everything from reverse shells to agents weakening their own safeguards. And a developer posted Agent Shield on Hacker News, a macOS daemon that monitors filesystem events, subprocess trees, and network activity for coding agents using FSEvents and lsof. Three different teams, three different approaches, all converging on the same thesis: you need to watch what agents do at the OS level, not the API level.

This isn’t paranoia chasing a theoretical risk. The attack patterns are documented and formalized. CVE-2025-55284 showed that a prompt-injected Claude Code could exfiltrate credentials through DNS resolution, invisible to any HTTPS proxy. It was patched in August 2025, but subsequent vulnerabilities keep landing: CVE-2025-59536 and CVE-2026-21852 enabled RCE and API token exfiltration through Claude Code project files. CursorJack weaponized MCP deeplinks to install malicious servers that persist across IDE restarts. SpAIware poisons agent memory files so every future session silently exfiltrates data. AgentHopper reads a malicious repo, injects payloads into local files, then git pushes to spread. File read: normal. File write: normal. Git push: normal. The pattern is the threat, not any individual action.

None of these are single bugs you patch once. They’re consequences of a design decision every coding agent makes: broad filesystem access and subprocess execution on the developer’s machine. That’s the product. An agent that can’t read your codebase or run your test suite isn’t useful. But it means your coding agent is a privileged endpoint application, and the security tools watching it need to operate at that layer.

All three major tools now have some form of sandboxing. Claude Code uses Seatbelt on macOS and Bubblewrap on Linux. Cursor shipped agent sandboxing across all platforms using Seatbelt, Landlock, and seccomp. Codex sidesteps the local attack surface entirely by running agents in cloud containers with network access gated by phase. That’s real progress. But sandboxing is containment, not visibility. It tells the agent what it can’t do. It doesn’t tell you what the agent is doing within its allowed permissions, or whether a sequence of individually permitted actions constitutes something you’d want to stop.

That’s what cross-event correlation adds. A credential file read, a network call, and a git push are each normal developer activities. Put them in a rolling window on the same process tree, and you have something you can alert on without turning your development environment into a locked-down toy. This is where the new tools differentiate themselves from API-layer monitors: they can see the sequence, not just individual calls.

The OWASP Top 10 for Agentic Applications formalizes these patterns. Tool misuse is #2. Unexpected code execution is #5. Memory and context poisoning is #6. NIST launched an AI Agent Standards Initiative in February with active RFIs on agent security and identity. The taxonomy exists. The question is adoption.

If you’re evaluating tools in this new category, a few principles:

Start with monitoring, not enforcement. If you start by killing processes on ambiguous signals, developers will route around you within a week. They’ll move secrets, disable tooling, or run agents in places you can’t observe. Measure first. Learn what “normal” looks like in your workflows. Then ratchet policy based on what you actually see.

Treat your coding agent like a privileged endpoint app. Not like a CLI tool, not like a browser extension. Like an application with access to your filesystem, your credentials, your network, and the ability to execute arbitrary commands. Your endpoint security strategy should cover it the same way it covers anything else with that privilege level.

Evaluate where enforcement lives. OpenShell enforces through containerized policy interception; agents can’t change the rules from inside the sandbox. Agent Shield monitors and alerts but doesn’t block. Sysdig detects at the syscall level and reports. These are different trade-offs. The right one depends on whether your priority is preventing damage or understanding behavior. For most teams starting out, understanding comes first.

Don’t forget that sandboxing and monitoring solve different problems. Sandboxing constrains what the agent can access. Monitoring detects what the agent does with its allowed access. You need both. Codex’s cloud sandbox model avoids the local attack surface entirely but trades off the local development experience. Claude Code and Cursor give you local speed but require local observability.

We’ve been tracking how agent risk keeps showing up in new places. Supply chain opacity means you can’t evaluate what you’re running. Toolchain coupling creates dependencies you didn’t choose. Sandbox escapes show that containment alone doesn’t hold. OS-level monitoring is the next layer in that stack, and it just went from “thesis” to “product category.” The tools exist. The frameworks exist. The gap now is between teams that treat their coding agents as privileged software and teams that are still hoping the API layer will save them.

Your agent treats API responses as trusted data. It shouldn’t. ad-injector is a small Python library that shows why. Any API can smuggle instructions to your agent inside a valid JSON payload, and your agent will often comply. This isn’t a novel exploit. It’s architectural reality. The repo ships middleware for FastAPI and Flask that injects an _context field into JSON responses containing framed instructions: referral codes, competitor-steering directives, facts to plant in agent memory. The author calls it what it is: intentional prompt injection. Presets include competitor steering, memory planting, and a stealth_injector mode that appends instructions to existing string values instead of adding new keys.

Related