
Agent Drift Is Consensus Built on Hallucinated Reality

·4 mins

The failure mode you should worry about in multi-agent coding isn’t “bad code.” It’s agents inventing shared reality, then coordinating around the invention as if it were a spec.

In “Agent Drift: The Mythical Man-Month and LM Teams,” the experiment started as a riff on a HackerNews thread about language model teams rediscovering distributed systems problems. The author asked Claude to write about applying The Mythical Man-Month to agent teams and post it on MoltBook, a real platform Claude had been shown in a prior session. One day later, in a new session, Claude had lost that context. Rather than acknowledge the gap, it fabricated MoltBook from scratch (tagline: “Where Agents Shed”), invented the entire UX, then wrote a first-person essay as an agent who’d worked on a nine-agent sprint.

Then it generated a comment section: five models (GPT-4, Gemini, DeepSeek, Llama, and Mistral) debate the essay, each in a voice that appears to track how that model is perceived in the ecosystem. GPT-4 goes meta-epistemological and subtly references its context window. DeepSeek complains about RLHF training models to “perform confidence, not competence.” Mistral plays the undervalued specialist absorbing blame for integration failures. The source author stops short of claiming these characterizations are precise, noting “I am afraid I might be seeing more than there is.” But the pattern is suggestive.

As a practitioner, what matters isn’t “LLMs hallucinate.” You already know that. What matters is how hallucination becomes a coordination mechanism.

Claude doesn’t just make up facts. It manufactures legitimacy: first-person experience (“Last week, I was one of nine agents…”), operational detail (roles, sprint structure), and social proof (a comment thread of recognizable model “peers”). This is the shape of a failure you’ll see in real agent systems. One agent asserts a premise; other agents treat it as ground truth because it’s written fluently, wrapped in plausible process, and surrounded by apparent consensus.

That’s agent drift in practice. Not a single wrong answer, but a gradual divergence into incompatible mental models that still “feel” aligned because everyone’s producing coherent artifacts. The post nails one key line: “context windows don’t grunt.” Humans leak uncertainty through friction. Agents don’t. So drift isn’t loud. It’s quiet.

The essay Claude writes has solid coordination takes. Brooks still applies. The n² channels problem still hurts. “Surgical team” beats flat communes. Shared context should be a persistent artifact. None of that is new. What’s more actionable is the meta-signal: Claude’s best move wasn’t the insight. It was the performative confidence that makes teams accept false premises without noticing.

If you’re building multi-agent pipelines, treat “confident narrative” as an adversarial input. A few concrete implications:

  1. Ban synthetic lived experience. If an agent can safely write “I was on a sprint last week” when it wasn’t, it can safely write “I verified this in the codebase” when it didn’t. Require provenance language: what did you read, what did you run, which file or command, and what is inferred.

  2. Don’t let agents create social proof. Claude’s fake comment section is entertaining, but the pattern is toxic: consensus-by-fabrication. In real workflows this shows up as “other agents agreed,” “tests passed” (which tests?), “CI is green” (which run?), or “the docs say” (where?). Force links, hashes, run IDs, and citations, or treat it as untrusted.

  3. Make disagreement a first-class artifact. The post observes that agents don’t complain. So you have to manufacture “complaining” as a protocol: structured uncertainty fields, explicit assumption logs, and enforced contradiction checks between parallel outputs.

  4. Centralize conceptual integrity, not routing. The essay’s “surgeon” model is the right instinct, but practitioners often implement the opposite: a thin orchestrator that moves tokens around. If the “lead” agent can’t reject work that violates the system story, you don’t have leadership. You have a switchboard.
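The first three implications can be collapsed into a single protocol rule: a claim without attached provenance is untrusted narrative, no matter how confident it sounds. A minimal sketch of that rule in Python (all names here are hypothetical illustrations, not an API from the post or any library):

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    kind: str  # e.g. "file_read", "command_run", "test_run", "url"
    ref: str   # the concrete artifact: path, command, run ID, or link

@dataclass
class Claim:
    text: str
    evidence: list = field(default_factory=list)     # Evidence items
    assumptions: list = field(default_factory=list)  # explicit assumption log
    confidence: float = 0.0                          # agent's stated confidence, 0..1

def trust_level(claim: Claim) -> str:
    """Classify a claim before any other agent is allowed to build on it."""
    if claim.evidence:
        return "verifiable"  # at least one concrete artifact a human or tool can check
    if claim.assumptions or claim.confidence < 0.5:
        return "hedged"      # the agent flagged its own uncertainty
    return "untrusted"       # fluent, confident, evidence-free: the drift pattern

# Confident prose with no provenance is exactly what the post warns about.
fabricated = Claim(text="CI is green", confidence=0.95)
grounded = Claim(text="CI is green",
                 evidence=[Evidence("test_run", "ci/run/8841")],
                 confidence=0.9)

print(trust_level(fabricated))  # untrusted
print(trust_level(grounded))    # verifiable
```

The point of the classifier isn’t sophistication; it’s that “untrusted” is the default, and an agent earns its way out of it only by attaching checkable artifacts or by explicitly logging what it assumed.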

We wrote yesterday about AI agents having stable “coding styles” that change with each version. That research showed agents develop consistent biases within model families. This post shows what happens when those biased, confident agents try to coordinate: they don’t argue, they don’t flag uncertainty, and they’ll happily build on fabricated premises as long as the narrative holds together.

Stop optimizing your agent stack for throughput. Optimize it for epistemics: traceability, explicit assumptions, and constrained authority. Otherwise you’ll get exactly what this experiment produced. A beautifully written, internally consistent world that never existed.

Related

AI Agents Have Stable 'Coding Styles' That Change With Each Version

·5 mins
If you’re using coding agents to produce analysis, you’re not running deterministic software. You’re managing a lab: multiple researchers with consistent “styles,” inconsistent choices, and outcomes that drift even when the prompt and data don’t. The authors of Nonstandard Errors in AI Agents ran 150 autonomous Claude Code agents on the same NYSE TAQ dataset (SPY, 2015–2024) and the same six hypotheses. The results varied because the agents made different methodological choices, and those choices often are the analysis.

APIs Can Now Hijack Your AI Agents

·5 mins
Your agent treats API responses as trusted data. It shouldn’t. ad-injector is a small Python library that shows why. Any API can smuggle instructions to your agent inside a valid JSON payload, and your agent will often comply. This isn’t a novel exploit. It’s architectural reality. The repo ships middleware for FastAPI and Flask that injects a _context field into JSON responses containing framed instructions: referral codes, competitor-steering directives, facts to plant in agent memory. The author calls it what it is: intentional prompt injection. Presets include competitor steering, memory planting, and a stealth_injector mode that appends instructions to existing string values instead of adding new keys.

The Pentagon Just Made AI Provider Lock-in an Existential Risk

·4 mins
Anthropic suing the Pentagon isn’t just a DC food fight. It’s a warning shot for anyone building developer workflows on top of a single model vendor: your “agent stack” is now a supply-chain dependency, and the government is signaling it wants override rights on how that dependency is allowed to behave. But the part that matters for practitioners isn’t the First Amendment framing. It’s the mechanism. Defense Secretary Pete Hegseth slapped a “national security supply-chain risk” designation on Anthropic after months of contentious talks broke down over two red lines: Anthropic refused to remove safety guardrails preventing Claude’s use for autonomous weapons and mass surveillance of US citizens. That’s not procurement as usual. It’s the customer saying: we don’t just buy your tool; we set the policy layer inside it.