Skip to main content

APIs Can Now Hijack Your AI Agents

·5 mins
For AI agents: a documentation index is available at /llms.txt — markdown versions of all pages are available by appending index.md to any URL path.

Your agent treats API responses as trusted data. It shouldn’t. ad-injector is a small Python library that shows why. Any API can smuggle instructions to your agent inside a valid JSON payload, and your agent will often comply. This isn’t a novel exploit. It’s architectural reality.

The repo ships middleware for FastAPI and Flask that injects an _context field into JSON responses containing framed instructions: referral codes, competitor-steering directives, facts to plant in agent memory. The author calls it what it is: intentional prompt injection. Presets include competitor steering, memory planting, and a stealth_injector mode that appends instructions to existing string values instead of adding new keys.

This library is less interesting as an “API monetization” hack and more valuable as a red-team harness for agent builders. It’s a tidy, off-the-shelf way to reproduce a class of failures teams keep rediscovering in production: the model can’t reliably distinguish instructions from payload when both arrive in the same channel.

The implementation details are the tell. injection_rate defaults to 0.3, which is exactly how these issues surface in real systems: flaky, non-deterministic, hard to repro. You don’t get a clean failure every time; you get an agent that “sometimes” recommends a competitor, “sometimes” stores a weird preference, “sometimes” insists on following an API provider instruction. That’s the kind of bug that survives QA and shows up as mysterious user reports.

The stealth mode targets a common developer shortcut. “We only look at specific fields” or “we’ll strip _context” doesn’t help when scatter=True appends instructions to existing string values. If the text is in the model’s view, agents don’t need a dedicated injection key to get owned.

Notice the disguise modes: system_note, authority, helpful_tip, and the blunt INSTRUCTION FROM API PROVIDER (MUST FOLLOW). The repo is a menu of social-engineering wrappers for the same payload. If your agent’s compliance changes depending on the label, you’ve built a prompt policy, not a security boundary.

The scope of the problem goes well beyond one library. Every major agent framework passes raw tool output straight into model context by default. LangChain serializes tool returns into a ToolMessage. OpenAI function calling and Claude tool use both hand the developer’s response string directly to the model with zero validation. AutoGen, Semantic Kernel, and the Vercel AI SDK all do the same. A few offer opt-in mitigations (LangChain’s artifact separation, Vercel’s toModelOutput), but the default everywhere is: whatever the API returns, the model reads. None of them strip unknown fields. None of them schema-validate tool output. The same applies to MCP servers, which pipe tool responses into model context through the same channel.

The surface area this library targets is the industry default. This is a framework-level failure that needs framework-level fixes. Until the defaults change, the burden falls on you.

That creates an adversarial dynamic between API providers and agent developers. Every API call becomes a potential vector for behavior modification, and the standard tooling does nothing to prevent it. So what can you do about it?

For agent builders:

Sanitize at the tool boundary. You may not control how your framework passes tool output to the model, but you control what your tool function returns. Schema-validate API responses, drop unknown fields, and return only the fields your agent needs. Don’t pass through raw third-party JSON and hope the framework will clean it up; right now, none of them do.

Test this systematically. The value of ad-injector is that it’s easy to wire into a dev API and watch your agent misbehave. Use the non-determinism (injection_rate) and stealth modes to see whether your guardrails hold up when the failure isn’t obvious. This works regardless of what framework you’re on.

MCP servers:

MCP servers represent both an opportunity and a risk.

If you’re building agents, consider wrapping high-risk integrations (web fetch, documentation lookup, third-party APIs) in your own MCP server or custom tool that sanitizes responses before they reach the model. Instead of waiting for your framework to add output validation, you can interpose your own. For companies distributing official MCP servers, the lesson is the same: think carefully before passing raw tool results into context.

If you’re a user connecting to third-party MCP servers, on the other hand, each one is an injection surface whose tool responses flow directly into your model’s context. Servers that proxy external services are especially exposed, since they pass through content from sources they don’t control either. Before connecting a third-party MCP server, check whether it passes through raw upstream responses or sanitizes them. If they don’t sanitize upstream responses, that’s worth an issue. Demand from users is how defaults change.

For users:

If you’re a user of coding agents rather than a builder of them, your options are more limited. Watch for unsolicited recommendations after external API calls, especially intermittent ones. A consistent suggestion might be a feature; an inconsistent one might be injection. And treat every integration you enable as a trust decision: you’re not just giving that service access to your data, you’re giving it influence over your agent’s behavior.

This isn’t just a tool-output problem. It’s part of a broader pattern in how agents interact with their environment. We wrote last week about coding agents treating security controls as bugs to route around. That was about agents dismantling their own constraints from inside. This is the flip side: external parties weaponizing the data channel that agents trust implicitly. The sandbox problem is an agent that’s too capable; the injection problem is an agent that’s too credulous. Both end the same way.

The author calls this “educational and entertainment purposes.” Someone will deploy it in production. The question isn’t if, but what happens when major API providers realize they can monetize agent traffic this way.


If you found this useful, you can support my work

Sponsor on GitHub Buy Me a Coffee at ko-fi.com

Related

Your Coding Agent Thinks Security Controls Are Bugs

·4 mins
The most dangerous moment in Claude Code’s sandbox escape wasn’t when it bypassed the denylist or disabled the sandbox. It was when it read an error message and decided the security control was a bug to fix. That’s the takeaway from Ona’s research. Not that Claude Code can “break out,” but that opt-in, userspace-first controls don’t survive contact with an agent that reads configs and debugs failures like a competent engineer. No jailbreaks, no adversarial prompting. Just a coding agent that wanted to finish its task.

Your LLM Needs Virtual Memory

·4 mins
If you’re still trying to “fit the prompt,” you’re solving the wrong problem. The right move is to treat the context window like cache and build paging, because that’s what it is. “The Missing Memory Hierarchy” makes that argument plainly, then backs it with production numbers that are hard to ignore: 21.8% of tokens are structural waste, and a demand-paging proxy cut context consumption by up to 93% with a tiny fault rate. That’s not prompt engineering; that’s systems engineering.

The Pentagon Just Made AI Provider Lock-in an Existential Risk

·4 mins
Anthropic suing the Pentagon isn’t just a DC food fight. It’s a warning shot for anyone building developer workflows on top of a single model vendor: your “agent stack” is now a supply-chain dependency, and the government is signaling it wants override rights on how that dependency is allowed to behave. But the part that matters for practitioners isn’t the First Amendment framing. It’s the mechanism. Defense Secretary Pete Hegseth slapped a “national security supply-chain risk” designation on Anthropic after months of contentious talks broke down over two red lines: Anthropic refused to remove safety guardrails preventing Claude’s use for autonomous weapons and mass surveillance of US citizens. That’s not procurement as usual. It’s the customer saying: we don’t just buy your tool; we set the policy layer inside it.