
Your Coding Agent Has a Supply Chain Problem


The problem isn’t that Cursor built on Kimi. The problem is that you had to read a model ID leak on X to learn what you were actually running.

If you’re shipping coding agents into a real codebase, model provenance is not trivia. It’s a dependency. And dependencies need changelogs, constraints, and clear ownership.

Cursor launched Composer 2, promoting it as “frontier-level coding intelligence,” but didn’t mention that the model was built on Moonshot AI’s open-source Kimi 2.5. An X user noticed identifiers pointing to Kimi in the code. Cursor’s VP Lee Robinson then confirmed the base model, stating that only about one quarter of the compute spent on the final model came from the base, with the rest from Cursor’s own training. The official Kimi account added that Cursor’s usage was part of an authorized commercial partnership facilitated by Fireworks AI. Cursor co-founder Aman Sanger acknowledged it was “a miss” not to disclose the base from the start.

That admission matters for practitioners: you can’t evaluate reliability, risk, or fit if you don’t know what the system is.

“Built on top of” is doing a lot of work here. Cursor is asking teams to accept two claims at once:

  1. The base doesn’t matter much (only a quarter of the compute; benchmarks are “very different”).
  2. The base matters enough to start there (otherwise, why do it?).

Both can be true. But if they’re true, you disclose the base model anyway, because the base affects the shape of failures: multilingual behavior, refusal patterns, memorization risk, and the long tail of weirdness that shows up only after a tool is embedded in CI and developer workflow.

TechCrunch flags the geopolitics of a U.S. company building on a Chinese model. But the practitioner concern is more immediate: enterprise review and supply-chain policy. If your internal approval process includes vendor questionnaires, data-handling requirements, or country-of-origin scrutiny for critical components, “we’ll fix that for the next model” is not an answer you can pass to procurement.

Cursor says its Kimi usage aligns with the license and notes the commercial partnership via Fireworks AI. Licensing compliance is table stakes. The issue is operational transparency: when the foundation is undisclosed, you can’t map where policy decisions need to happen. Is your security team evaluating Cursor, Fireworks, Moonshot, or all three? What changes if the base model changes? What if a future release swaps the foundation entirely and you only learn about it from another X post?

Think of it as the model equivalent of a software bill of materials. Teams should start demanding a basic standard from coding-agent vendors:

  • Name the base model at launch. Don’t bury it in a retroactive tweet.
  • Publish a model lineage note per major release: base, training deltas, and what materially changed.
  • Document what’s contractually stable, including model availability, region, and deprecation timelines.
  • Provide a consistent evaluation mode so benchmark comparisons across releases mean something.

Cursor says they’ll “fix that for the next model.” They should fix it for this one too.

We’ve been tracking how the agent supply chain keeps surprising teams in new ways. Two weeks ago, the Pentagon’s blacklisting of Claude showed that government action can vaporize model access overnight. Last week, OpenAI’s acquisition of Astral showed how toolchain coupling creates lock-in through soft mechanisms rather than licensing changes. Today’s episode is a third facet: you can’t manage what you can’t see. Political risk and toolchain coupling are hard enough when you know what you’re running. When the foundation is undisclosed, you’re not even playing the right game.

Coding agents are becoming production infrastructure. Provenance is part of the reliability story. If a vendor won’t tell you what it’s built on, you’re not buying intelligence. You’re buying surprise.
