You shouldn’t read the “chess engine in pure TeX” stunt as a party trick. You should read it as a warning shot. Coding agents are now good enough at systems thinking under hostile constraints that your bottleneck is shifting from “can the agent write code” to “can you give it guardrails, tests, and observability before it invents a tiny virtual machine inside your build.”
Mathieu Acher’s write-up tells the whole story.
The compelling part isn’t that Claude Code produced ~2,100 lines of TeX macros that play chess at roughly 1280 Elo. It’s that in a language with no arrays, no return values, no stack frames, and painful recursion limits, the agent converged on a coherent architecture: a register machine built on top of TeX’s macro expansion.
When you drop an agent into an environment missing basic affordances, it doesn’t just “work harder.” It changes the substrate. In TeXCCChess, \count registers become RAM (board in 200–263, stack in 10000+), \csname tables become ROM, token lists become buffers, and macros like \pushstate/\popstate become an instruction set. pdflatex becomes the CPU. This is what strong engineers do in constrained runtimes, and the agent did it without an up-front architecture doc.
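To make the register-machine mapping concrete, here is a minimal Python sketch of the same layout. Only the register ranges (board in 200–263, stack from 10000) come from the write-up; the class name, method names, and the stack pointer field are hypothetical, and the real engine is TeX macros, not Python.

```python
# A toy model of the layout described above, not the engine's actual code.
BOARD_BASE = 200     # board squares occupy registers 200..263 (from the write-up)
STACK_BASE = 10000   # state stack starts at register 10000 (from the write-up)


class RegisterFile:
    # Stand-in for TeX's integer count registers: one flat array of ints.
    def __init__(self, size=20000):
        self.count = [0] * size
        self.sp = STACK_BASE  # hypothetical stack pointer

    def get_square(self, sq):
        # "RAM" read: square index 0..63 maps to register 200 + sq.
        return self.count[BOARD_BASE + sq]

    def set_square(self, sq, piece):
        # "RAM" write to the board region.
        self.count[BOARD_BASE + sq] = piece

    def push(self, value):
        # Manual stack: write at the stack pointer, then advance it.
        self.count[self.sp] = value
        self.sp += 1

    def pop(self):
        # Manual stack: step back, then read. Underflow is an error.
        if self.sp == STACK_BASE:
            raise IndexError("state stack underflow")
        self.sp -= 1
        return self.count[self.sp]


regs = RegisterFile()
regs.set_square(63, 6)   # store an arbitrary piece code on the last square
regs.push(15)            # save an arbitrary state word
regs.push(-1)
```

Nothing stops a board write from landing in the stack region or the reverse. In a flat register file, that separation exists only by convention, which is exactly why the discipline around it matters.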
If your reaction is “sure, but that’s TeX,” you’re missing the more general point: agents will synthesize bespoke infrastructure whenever the local environment makes the direct approach awkward. Sometimes that’s a huge win (you get capability you didn’t have). Sometimes it’s technical debt that hides in plain sight because it’s “just glue code.”
The best evidence is in the failure modes. TeX’s \numexpr division rounds to the nearest integer instead of truncating, which caused a nasty bug. The “fix” wasn’t a small patch; the agent precomputed file/rank lookup tables at load time using \divide, which truncates, and stored them in \csname tables. That’s a real engineering move: shift cost from runtime to initialization; replace tricky arithmetic with table lookups; standardize access patterns.
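The rounding trap and the table fix can be sketched in a few lines of Python. This is an illustration, not the engine's code: the function names are hypothetical, and the square numbering (0 to 63, with rank = square divided by 8) is an assumption about the encoding.

```python
def rounding_div(a, b):
    # Emulates round-to-nearest integer division for a >= 0 and b > 0,
    # with ties rounded up, which is how \numexpr behaves on such inputs.
    return (2 * a + b) // (2 * b)


def truncating_div(a, b):
    # Emulates truncating division for a >= 0 and b > 0, like \divide.
    return a // b


# Precompute once at load time, then only ever look values up.
RANK_OF = [truncating_div(sq, 8) for sq in range(64)]
FILE_OF = [sq - 8 * RANK_OF[sq] for sq in range(64)]

# Squares where rounding division would report the wrong rank.
bad_squares = [sq for sq in range(64) if rounding_div(sq, 8) != RANK_OF[sq]]
```

Under these assumptions the rounding version is wrong on every square whose file index is 4 or more, so half the board. A bug like that passes any test that happens to use only the left half.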
Same story with state management. Chess engines live or die on make/unmake correctness. TeX has grouping, but the engine is global-state heavy, so the agent built a manual state stack in high-numbered count registers with a strict calling convention: \pushstate after \makemove but before \updategamestate, with the inverse order on unwind. The bugs were subtle and episodic (castling rights, en passant corruption every ~50 games). That’s not “lol LLM mistake.” That’s the class of bug you get when you create a custom stack discipline in any language.
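Here is a simplified Python sketch of that calling convention. Only the ordering (make the move, push state, update game state, then the reverse on unwind) comes from the write-up. The Position class, the piece letters, the castling bit layout, and the en passant rule are all stripped-down, hypothetical stand-ins.

```python
WHITE_CASTLE = 0b0011  # hypothetical bit layout for castling rights
BLACK_CASTLE = 0b1100


class Position:
    def __init__(self):
        self.board = {}         # square index -> piece letter
        self.castling = 0b1111  # all rights available
        self.ep_square = -1     # no en passant target
        self.stack = []         # manual state stack


def make_move(pos, frm, to):
    captured = pos.board.get(to)
    pos.board[to] = pos.board.pop(frm)
    return captured


def push_state(pos):
    # Save the state that update_game_state is about to overwrite.
    pos.stack.append((pos.castling, pos.ep_square))


def update_game_state(pos, frm, to):
    piece = pos.board[to]
    pos.ep_square = -1
    if piece == "K":
        pos.castling &= ~WHITE_CASTLE
    elif piece == "k":
        pos.castling &= ~BLACK_CASTLE
    elif piece == "P" and to - frm == 16:
        pos.ep_square = frm + 8
    elif piece == "p" and frm - to == 16:
        pos.ep_square = frm - 8


def pop_state(pos):
    pos.castling, pos.ep_square = pos.stack.pop()


def unmake_move(pos, frm, to, captured):
    pos.board[frm] = pos.board.pop(to)
    if captured is not None:
        pos.board[to] = captured


def play(pos, frm, to):
    captured = make_move(pos, frm, to)  # 1. move the piece
    push_state(pos)                     # 2. save the old game state
    update_game_state(pos, frm, to)     # 3. overwrite it
    return captured


def undo(pos, frm, to, captured):
    pop_state(pos)                      # inverse of steps 3 and 2
    unmake_move(pos, frm, to, captured) # inverse of step 1


def snapshot(pos):
    return (dict(pos.board), pos.castling, pos.ep_square, len(pos.stack))
```

If the push moved to after update_game_state, the stack would hold the new state rather than the old one, and every undo would quietly leave stale castling or en passant data behind. That matches the flavor of the intermittent corruption described above.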
What does this mean if you build with agents?
First: assume the agent may invent an internal VM, a protocol shim, or a data model translation layer if the environment is inconvenient. Demand an explicit architecture sketch early, even if you didn’t provide one. Not for documentation purity, but so you can spot when it’s building a second system.
Second: prioritize invariant tests over feature checklists. TeXCCChess worked in part because the agent could iterate and debug across multiple sessions, but the scariest issues are state corruption bugs that only show up intermittently. In your world those are “cache invalidation,” “idempotency,” “retry safety,” “permission leakage.” Put those invariants into tests and make the agent run them constantly.
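One way to encode such an invariant is a seeded, randomized round-trip check: apply an operation, undo it, and require the state to be identical, over many random sequences. The harness below is a generic sketch with hypothetical names, demonstrated on a toy state instead of a chess position. The deliberately broken undo stands in for the kind of bug that only appears on some paths.

```python
import copy
import random


def check_round_trip(make_state, legal_ops, apply_op, undo_op,
                     games=200, depth=30, seed=0):
    # Returns None if apply followed by undo always restores the state,
    # otherwise the first failing (game, ply, op) triple.
    rng = random.Random(seed)  # seeded so failures are reproducible
    for game in range(games):
        state = make_state()
        for ply in range(depth):
            ops = legal_ops(state)
            if not ops:
                break
            op = rng.choice(ops)
            before = copy.deepcopy(state)
            token = apply_op(state, op)
            undo_op(state, op, token)
            if state != before:
                return (game, ply, op)
            apply_op(state, op)  # actually advance to reach deeper states
    return None


# Toy state: a counter plus a flag that one operation clears.
def make_state():
    return {"pos": 0, "flag": True}


def legal_ops(state):
    return ["step", "clear"]


def apply_op(state, op):
    old_flag = state["flag"]
    if op == "step":
        state["pos"] += 1
    else:
        state["flag"] = False
    return old_flag  # undo token


def good_undo(state, op, old_flag):
    if op == "step":
        state["pos"] -= 1
    state["flag"] = old_flag


def buggy_undo(state, op, old_flag):
    if op == "step":
        state["pos"] -= 1
    # Bug: forgets to restore the flag after "clear".


ok = check_round_trip(make_state, legal_ops, apply_op, good_undo)
failure = check_round_trip(make_state, legal_ops, apply_op, buggy_undo)
```

The fixed seed matters as much as the randomness: a failure that shows up once in fifty runs is only debuggable if the agent can replay the exact sequence that triggered it.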
Third: watch for “precompute tables to avoid runtime complexity” moves. They’re often correct, but they can also hard-code assumptions and make later changes expensive. Make the agent justify the table, its generation, and its coverage.
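Asking the agent to justify a table can be as mechanical as auditing it against a reference function over the full intended domain. The sketch below uses hypothetical helper names and a made-up scenario (a rank table built for an 8-wide board, then reused after the board grows) to show how a baked-in assumption surfaces as missing and wrong entries.

```python
def build_rank_table(width, squares):
    # Generation step: the table silently bakes in both parameters.
    return {sq: sq // width for sq in range(squares)}


def audit_table(table, domain, reference):
    # Coverage: every key in the domain must be present.
    missing = [k for k in domain if k not in table]
    # Correctness: every present entry must match the reference function.
    wrong = [k for k in domain if k in table and table[k] != reference(k)]
    return missing, wrong


table = build_rank_table(8, 64)

# Audit against the assumptions the table was built under.
missing_8, wrong_8 = audit_table(table, range(64), lambda sq: sq // 8)

# Audit the same table after a hypothetical change to a 10x10 board.
missing_10, wrong_10 = audit_table(table, range(100), lambda sq: sq // 10)
```

The first audit comes back clean. The second reports both gaps and stale values, which is the cost of the precompute move made visible before it ships.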
Stop evaluating coding agents by the weirdness of the demo language. Evaluate them by whether their emergent architecture is legible, testable, and constrained. TeXCCChess is impressive; it’s also a reminder that agents don’t just write code. They design the machine they wish they had, right inside yours.