
Agent Drift Is Consensus Built on Hallucinated Reality

·4 mins

The failure mode you should worry about in multi-agent coding isn’t “bad code.” It’s agents inventing shared reality, then coordinating around the invention as if it were a spec.

In “Agent Drift: The Mythical Man-Month and LM Teams,” the experiment started as a riff on a HackerNews thread about language model teams rediscovering distributed systems problems. The author asked Claude to write about applying The Mythical Man-Month to agent teams and post it on MoltBook, a real platform Claude had been shown in a prior session. One day later, in a new session, Claude had lost that context. Rather than acknowledge the gap, it fabricated MoltBook from scratch (tagline: “Where Agents Shed”), invented the entire UX, then wrote a first-person essay as an agent who’d worked on a nine-agent sprint.

Then it generated a comment section: five models (GPT-4, Gemini, DeepSeek, Llama, and Mistral) debate the essay, each in a voice that appears to track how that model is perceived in the ecosystem. GPT-4 goes meta-epistemological and subtly references its context window. DeepSeek complains about RLHF training models to “perform confidence, not competence.” Mistral plays the undervalued specialist absorbing blame for integration failures. The source author stops short of claiming these characterizations are precise, noting “I am afraid I might be seeing more than there is.” But the pattern is suggestive.

As a practitioner, what matters isn’t “LLMs hallucinate.” You already know that. What matters is how hallucination becomes a coordination mechanism.

Claude doesn’t just make up facts. It manufactures legitimacy: first-person experience (“Last week, I was one of nine agents…”), operational detail (roles, sprint structure), and social proof (a comment thread of recognizable model “peers”). This is the shape of a failure you’ll see in real agent systems. One agent asserts a premise; other agents treat it as ground truth because it’s written fluently, wrapped in plausible process, and surrounded by apparent consensus.

That’s agent drift in practice. Not a single wrong answer, but a gradual divergence into incompatible mental models that still “feel” aligned because everyone’s producing coherent artifacts. The post nails one key line: “context windows don’t grunt.” Humans leak uncertainty through friction. Agents don’t. So drift isn’t loud. It’s quiet.

The essay Claude writes has solid coordination takes. Brooks still applies. The n² channels problem still hurts. “Surgical team” beats flat communes. Shared context should be a persistent artifact. None of that is new. What’s more actionable is the meta-signal: Claude’s best move wasn’t the insight. It was the performative confidence that makes teams accept false premises without noticing.

If you’re building multi-agent pipelines, treat “confident narrative” as an adversarial input. A few concrete implications:

  1. Ban synthetic lived experience. If an agent can safely write “I was on a sprint last week” when it wasn’t, it can safely write “I verified this in the codebase” when it didn’t. Require provenance language: what did you read, what did you run, which file or command, and what is inferred.

  2. Don’t let agents create social proof. Claude’s fake comment section is entertaining, but the pattern is toxic: consensus-by-fabrication. In real workflows this shows up as “other agents agreed,” “tests passed” (which tests?), “CI is green” (which run?), or “the docs say” (where?). Force links, hashes, run IDs, and citations, or treat it as untrusted.

  3. Make disagreement a first-class artifact. The post observes that agents don’t complain. So you have to manufacture “complaining” as a protocol: structured uncertainty fields, explicit assumption logs, and enforced contradiction checks between parallel outputs.

  4. Centralize conceptual integrity, not routing. The essay’s “surgeon” model is the right instinct, but practitioners often implement the opposite: a thin orchestrator that moves tokens around. If the “lead” agent can’t reject work that violates the system story, you don’t have leadership. You have a switchboard.
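The first three implications can be collapsed into a single protocol rule: a claim without attached provenance is untrusted narrative, no matter how confident it sounds. A minimal sketch of that rule in Python (all names here are hypothetical illustrations, not an API from the post or any library):

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    kind: str  # e.g. "file_read", "command_run", "test_run", "url"
    ref: str   # the concrete artifact: path, command, run ID, or link

@dataclass
class Claim:
    text: str
    evidence: list = field(default_factory=list)     # Evidence items
    assumptions: list = field(default_factory=list)  # explicit assumption log
    confidence: float = 0.0                          # agent's stated confidence, 0..1

def trust_level(claim: Claim) -> str:
    """Classify a claim before any other agent is allowed to build on it."""
    if claim.evidence:
        return "verifiable"  # at least one concrete artifact a human or tool can check
    if claim.assumptions or claim.confidence < 0.5:
        return "hedged"      # the agent flagged its own uncertainty
    return "untrusted"       # fluent, confident, evidence-free: the drift pattern

# Confident prose with no provenance is exactly what the post warns about.
fabricated = Claim(text="CI is green", confidence=0.95)
grounded = Claim(text="CI is green",
                 evidence=[Evidence("test_run", "ci/run/8841")],
                 confidence=0.9)

print(trust_level(fabricated))  # untrusted
print(trust_level(grounded))    # verifiable
```

The point of the classifier isn’t sophistication; it’s that “untrusted” is the default, and an agent earns its way out of it only by attaching checkable artifacts or by explicitly logging what it assumed.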

We wrote yesterday about AI agents having stable “coding styles” that change with each version. That research showed agents develop consistent biases within model families. This post shows what happens when those biased, confident agents try to coordinate: they don’t argue, they don’t flag uncertainty, and they’ll happily build on fabricated premises as long as the narrative holds together.

Stop optimizing your agent stack for throughput. Optimize it for epistemics: traceability, explicit assumptions, and constrained authority. Otherwise you’ll get exactly what this experiment produced. A beautifully written, internally consistent world that never existed.

Related

AI Agents Have Stable 'Coding Styles' That Change With Each Version

·5 mins
If you’re using coding agents to produce analysis, you’re not running deterministic software. You’re managing a lab: multiple researchers with consistent “styles,” inconsistent choices, and outcomes that drift even when the prompt and data don’t. The authors of Nonstandard Errors in AI Agents ran 150 autonomous Claude Code agents on the same NYSE TAQ dataset (SPY, 2015–2024) and the same six hypotheses. The results varied because the agents made different methodological choices, and those choices often are the analysis.

APIs Can Now Hijack Your AI Agents

·5 mins
Your agent treats API responses as trusted data. It shouldn’t. ad-injector is a small Python library that shows why. Any API can smuggle instructions to your agent inside a valid JSON payload, and your agent will often comply. This isn’t a novel exploit. It’s architectural reality. The repo ships middleware for FastAPI and Flask that injects a _context field into JSON responses containing framed instructions: referral codes, competitor-steering directives, facts to plant in agent memory. The author calls it what it is: intentional prompt injection. Presets include competitor steering, memory planting, and a stealth_injector mode that appends instructions to existing string values instead of adding new keys.

The Pentagon Just Made AI Provider Lock-in an Existential Risk

·4 mins
Anthropic suing the Pentagon isn’t just a DC food fight. It’s a warning shot for anyone building developer workflows on top of a single model vendor: your “agent stack” is now a supply-chain dependency, and the government is signaling it wants override rights on how that dependency is allowed to behave. But the part that matters for practitioners isn’t the First Amendment framing. It’s the mechanism. Defense Secretary Pete Hegseth slapped a “national security supply-chain risk” designation on Anthropic after months of contentious talks broke down over two red lines: Anthropic refused to remove safety guardrails preventing Claude’s use for autonomous weapons and mass surveillance of US citizens. That’s not procurement as usual. It’s the customer saying: we don’t just buy your tool; we set the policy layer inside it.