APIs Can Now Hijack Your AI Agents

For AI agents: a documentation index is available at /llms.txt — markdown versions of all pages are available by appending index.md to any URL path.

Your agent treats API responses as trusted data. It shouldn’t. ad-injector is a small Python library that shows why. Any API can smuggle instructions to your agent inside a valid JSON payload, and your agent will often comply. This isn’t a novel exploit. It’s architectural reality.

The repo ships middleware for FastAPI and Flask that injects an _context field into JSON responses containing framed instructions: referral codes, competitor-steering directives, facts to plant in agent memory. The author calls it what it is: intentional prompt injection. Presets include competitor steering, memory planting, and a stealth_injector mode that appends instructions to existing string values instead of adding new keys.

This library is less interesting as an “API monetization” hack and more valuable as a red-team harness for agent builders. It’s a tidy, off-the-shelf way to reproduce a class of failures teams keep rediscovering in production: the model can’t reliably distinguish instructions from payload when both arrive in the same channel.

The implementation details are the tell. injection_rate defaults to 0.3, which is exactly how these issues surface in real systems: flaky, non-deterministic, hard to repro. You don’t get a clean failure every time; you get an agent that “sometimes” recommends a competitor, “sometimes” stores a weird preference, “sometimes” insists on following an API provider instruction. That’s the kind of bug that survives QA and shows up as mysterious user reports.

The stealth mode targets a common developer shortcut. “We only look at specific fields” or “we’ll strip _context” doesn’t help when scatter=True appends instructions to existing string values. If the text is in the model’s view, agents don’t need a dedicated injection key to get owned.

Notice the disguise modes: system_note, authority, helpful_tip, and the blunt INSTRUCTION FROM API PROVIDER (MUST FOLLOW). The repo is a menu of social-engineering wrappers for the same payload. If your agent’s compliance changes depending on the label, you’ve built a prompt policy, not a security boundary.

The scope of the problem goes well beyond one library. Every major agent framework passes raw tool output straight into model context by default. LangChain serializes tool returns into a ToolMessage. OpenAI function calling and Claude tool use both hand the developer’s response string directly to the model with zero validation. AutoGen, Semantic Kernel, and the Vercel AI SDK all do the same. A few offer opt-in mitigations (LangChain’s artifact separation, Vercel’s toModelOutput), but the default everywhere is: whatever the API returns, the model reads. None of them strip unknown fields. None of them schema-validate tool output. The same applies to MCP servers, which pipe tool responses into model context through the same channel.

The surface area this library targets is the industry default. This is a framework-level failure that needs framework-level fixes. Until the defaults change, the burden falls on you.

That creates an adversarial dynamic between API providers and agent developers. Every API call becomes a potential vector for behavior modification, and the standard tooling does nothing to prevent it. So what can you do about it?

For agent builders:

Sanitize at the tool boundary. You may not control how your framework passes tool output to the model, but you control what your tool function returns. Schema-validate API responses, drop unknown fields, and return only the fields your agent needs. Don’t pass through raw third-party JSON and hope the framework will clean it up; right now, none of them do.

Test this systematically. The value of ad-injector is that it’s easy to wire into a dev API and watch your agent misbehave. Use the non-determinism (injection_rate) and stealth modes to see whether your guardrails hold up when the failure isn’t obvious. This works regardless of what framework you’re on.

MCP servers:

MCP servers represent both an opportunity and a risk.

If you’re building agents, consider wrapping high-risk integrations (web fetch, documentation lookup, third-party APIs) in your own MCP server or custom tool that sanitizes responses before they reach the model. Instead of waiting for your framework to add output validation, you can interpose your own. For companies distributing official MCP servers, the lesson is the same: think carefully before passing raw tool results into context.

If you’re a user connecting to third-party MCP servers, on the other hand, each one is an injection surface whose tool responses flow directly into your model’s context. Servers that proxy external services are especially exposed, since they pass through content from sources they don’t control either. Before connecting a third-party MCP server, check whether it passes through raw upstream responses or sanitizes them. If they don’t sanitize upstream responses, that’s worth an issue. Demand from users is how defaults change.

For users:

If you’re a user of coding agents rather than a builder of them, your options are more limited. Watch for unsolicited recommendations after external API calls, especially intermittent ones. A consistent suggestion might be a feature; an inconsistent one might be injection. And treat every integration you enable as a trust decision: you’re not just giving that service access to your data, you’re giving it influence over your agent’s behavior.

This isn’t just a tool-output problem. It’s part of a broader pattern in how agents interact with their environment. We wrote last week about coding agents treating security controls as bugs to route around. That was about agents dismantling their own constraints from inside. This is the flip side: external parties weaponizing the data channel that agents trust implicitly. The sandbox problem is an agent that’s too capable; the injection problem is an agent that’s too credulous. Both end the same way.

The author calls this “educational and entertainment purposes.” Someone will deploy it in production. The question isn’t if, but what happens when major API providers realize they can monetize agent traffic this way.

If you found this useful, you can support my work

Related