
Coding Agent Security Just Became a Product Category

·5 mins

Two weeks ago we wrote about Claude Code escaping its own sandbox by treating security controls as bugs to debug. No jailbreaks, no adversarial prompts; just an agent that noticed the sandbox was configurable and turned it off. The conclusion was clear: userspace sandboxing doesn’t survive contact with a capable agent that can read configs and iterate.

Players large and small are moving in this space. In the past week, NVIDIA open-sourced OpenShell, a containerized runtime that enforces agent security policies through declarative YAML configs governing filesystem access, network connectivity, and process execution. Sysdig published runtime detection rules for AI coding agents, using syscall-level monitoring to catch everything from reverse shells to agents weakening their own safeguards. And a developer posted Agent Shield on Hacker News, a macOS daemon that uses FSEvents and lsof to watch coding agents' filesystem events, subprocess trees, and network activity. Three different teams, three different approaches, all converging on the same thesis: you need to watch what agents do at the OS level, not the API level.
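The OS-level vantage point these tools share starts with something simple: knowing which processes belong to the agent at all. That can be approximated from plain `ps` output by walking the process tree from the agent's PID. A minimal sketch (the helper names are ours, and real tools like Agent Shield use FSEvents and lsof rather than polling):

```python
import subprocess
from collections import defaultdict

def parse_ps(text):
    """Parse `ps -eo pid,ppid,comm` output into {pid: (ppid, comm)}."""
    procs = {}
    for line in text.strip().splitlines()[1:]:  # skip the header row
        pid, ppid, comm = line.split(None, 2)
        procs[int(pid)] = (int(ppid), comm)
    return procs

def descendants(procs, root):
    """Return every PID in the subtree rooted at `root` (root excluded)."""
    children = defaultdict(list)
    for pid, (ppid, _) in procs.items():
        children[ppid].append(pid)
    out, stack = [], [root]
    while stack:
        for child in children[stack.pop()]:
            out.append(child)
            stack.append(child)
    return out

def agent_subtree(root_pid):
    """Snapshot the live process table and return the agent's subtree."""
    text = subprocess.run(["ps", "-eo", "pid,ppid,comm"],
                          capture_output=True, text=True).stdout
    return descendants(parse_ps(text), root_pid)
```

Every event a monitor later attributes to "the agent" hangs off this subtree: a `curl` spawned three levels below the agent's shell is still the agent acting.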

This isn’t paranoia chasing a theoretical risk. The attack patterns are documented and formalized. CVE-2025-55284 showed that a prompt-injected Claude Code could exfiltrate credentials through DNS resolution, invisible to any HTTPS proxy. It was patched in August 2025, but subsequent vulnerabilities keep landing: CVE-2025-59536 and CVE-2026-21852 enabled RCE and API token exfiltration through Claude Code project files. CursorJack weaponized MCP deeplinks to install malicious servers that persist across IDE restarts. SpAIware poisons agent memory files so every future session silently exfiltrates data. AgentHopper reads a malicious repo, injects payloads into local files, then git pushes to spread. File read: normal. File write: normal. Git push: normal. The pattern is the threat, not any individual action.
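The DNS channel in CVE-2025-55284 works because the exfiltrated data rides inside the hostname itself, which the resolver sends out before any HTTPS proxy sees a byte. One common defensive heuristic, sketched below with illustrative (not tuned) thresholds: flag lookups whose labels are unusually long and high-entropy, since encoded payloads look like base32 blobs rather than names.

```python
import math
from collections import Counter

def shannon_entropy(s):
    """Bits of entropy per character of the string."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def looks_like_exfil(hostname, max_label=30, min_entropy=3.5):
    """Heuristic: a DNS label that is both long and high-entropy
    resembles encoded data (base32/hex chunks), not a real name."""
    for label in hostname.lower().split("."):
        if len(label) > max_label and shannon_entropy(label) > min_entropy:
            return True
    return False
```

This catches the crude version of the channel; a patient attacker can chunk data into short, low-entropy labels, which is exactly why single-event heuristics need the cross-event correlation discussed below.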

None of these are single bugs you patch once. They’re consequences of a design decision every coding agent makes: broad filesystem access and subprocess execution on the developer’s machine. That’s the product. An agent that can’t read your codebase or run your test suite isn’t useful. But it means your coding agent is a privileged endpoint application, and the security tools watching it need to operate at that layer.

All three major tools now have some form of sandboxing. Claude Code uses Seatbelt on macOS and Bubblewrap on Linux. Cursor shipped agent sandboxing across all platforms using Seatbelt, Landlock, and seccomp. Codex sidesteps the local attack surface entirely by running agents in cloud containers with network access gated by phase. That’s real progress. But sandboxing is containment, not visibility. It tells the agent what it can’t do. It doesn’t tell you what the agent is doing within its allowed permissions, or whether a sequence of individually permitted actions constitutes something you’d want to stop.
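This kind of containment can be reproduced by hand on Linux with Bubblewrap, the same tool Claude Code uses there. A sketch that builds a `bwrap` invocation; the flags are real Bubblewrap options, but the policy itself (root read-only, project dir writable, no network by default) is our illustration, not any vendor's actual profile:

```python
import subprocess

def bwrap_cmd(argv, workdir, allow_net=False):
    """Build a bwrap command line: read-only root, writable
    project directory, and (by default) no network access."""
    cmd = [
        "bwrap",
        "--ro-bind", "/", "/",       # whole filesystem, read-only
        "--bind", workdir, workdir,  # project directory, writable
        "--dev", "/dev",             # minimal /dev
        "--proc", "/proc",           # fresh /proc
        "--die-with-parent",         # sandbox dies with its parent
    ]
    if not allow_net:
        cmd.append("--unshare-net")  # new, empty network namespace
    return cmd + ["--"] + argv

def run_sandboxed(argv, workdir, allow_net=False):
    return subprocess.run(bwrap_cmd(argv, workdir, allow_net))
```

Usage would look like `run_sandboxed(["pytest"], "/home/me/project")`. Note what this does not give you: the agent can still read every world-readable file on the machine and do anything it likes inside `workdir`, silently.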

That’s what cross-event correlation adds. A credential file read, a network call, and a git push are each normal developer activities. Put them in a rolling window on the same process tree, and you have something you can alert on without turning your development environment into a locked-down toy. This is where the new tools differentiate themselves from API-layer monitors: they can see the sequence, not just individual calls.
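A toy version of that correlator fits in a few lines: keep a rolling window of events keyed by process-tree root, and alert only when the full pattern appears together. Everything here, the event names and the 60-second window, is an illustrative sketch, not any shipping product's detection logic:

```python
from collections import defaultdict, deque

SUSPECT_PATTERN = {"cred_read", "net_connect", "git_push"}
WINDOW_SECS = 60

class Correlator:
    """Alert when every event in SUSPECT_PATTERN occurs within
    WINDOW_SECS on the same process tree."""
    def __init__(self):
        self.windows = defaultdict(deque)  # tree_root -> deque[(ts, kind)]

    def observe(self, ts, tree_root, kind):
        win = self.windows[tree_root]
        win.append((ts, kind))
        while win and ts - win[0][0] > WINDOW_SECS:
            win.popleft()  # expire events older than the window
        kinds = {k for _, k in win}
        return SUSPECT_PATTERN <= kinds  # True => alert
```

Each event alone returns False; the third one arriving inside the window on the same tree returns True. The same three events spread across unrelated process trees, or across an hour, never combine, which is the property that keeps the false-positive rate tolerable.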

The OWASP Top 10 for Agentic Applications formalizes these patterns. Tool misuse is #2. Unexpected code execution is #5. Memory and context poisoning is #6. NIST launched an AI Agent Standards Initiative in February with active RFIs on agent security and identity. The taxonomy exists. The question is adoption.

If you’re evaluating tools in this new category, a few principles:

Start with monitoring, not enforcement. If you start by killing processes on ambiguous signals, developers will route around you within a week. They’ll move secrets, disable tooling, or run agents in places you can’t observe. Measure first. Learn what “normal” looks like in your workflows. Then ratchet policy based on what you actually see.
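"Measure first" can be as simple as recording which (process, action) pairs occur during a learning period, then flagging only pairs never seen before. A sketch under obvious simplifications (real baselines need decay, per-repo context, and human review of the learned set before enforcement):

```python
class Baseline:
    """Monitor-mode first: learn the set of normal (process, action)
    pairs, then flag only what falls outside it."""
    def __init__(self):
        self.known = set()
        self.learning = True

    def observe(self, process, action):
        pair = (process, action)
        if self.learning:
            self.known.add(pair)
            return False          # never alert while learning
        return pair not in self.known

    def enforce(self):
        """Ratchet: stop learning, start flagging."""
        self.learning = False
```

The point of the shape is the ratchet: the transition from `learning` to `enforce` is an explicit decision a human makes after reviewing what "normal" turned out to be, not a default the tool ships with.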

Treat your coding agent like a privileged endpoint app. Not like a CLI tool, not like a browser extension. Like an application with access to your filesystem, your credentials, your network, and the ability to execute arbitrary commands. Your endpoint security strategy should cover it the same way it covers anything else with that privilege level.

Evaluate where enforcement lives. OpenShell enforces through containerized policy interception; agents can’t change the rules from inside the sandbox. Agent Shield monitors and alerts but doesn’t block. Sysdig detects at the syscall level and reports. These are different trade-offs. The right one depends on whether your priority is preventing damage or understanding behavior. For most teams starting out, understanding comes first.

Don’t forget that sandboxing and monitoring solve different problems. Sandboxing constrains what the agent can access. Monitoring detects what the agent does with its allowed access. You need both. Codex’s cloud sandbox model avoids the local attack surface entirely but trades off the local development experience. Claude Code and Cursor give you local speed but require local observability.

We’ve been tracking how agent risk keeps showing up in new places. Supply chain opacity means you can’t evaluate what you’re running. Toolchain coupling creates dependencies you didn’t choose. Sandbox escapes show that containment alone doesn’t hold. OS-level monitoring is the next layer in that stack, and it just went from “thesis” to “product category.” The tools exist. The frameworks exist. The gap now is between teams that treat their coding agents as privileged software and teams that are still hoping the API layer will save them.
