When the agent is the software

AI agent represented as a digital entity

This is the fourth post in a series about agentic coding. The first was about two problems blocking autonomous AI coding assistants. The second about the asymptotic promise. The third about the freedom-risk curve.

I think I’ve been circling around something in those posts without naming it. When we talk about AI agents, there are actually two very different things going on.

Mode 1: The agent writes software. You ask it to implement a feature, write a script, build a component. It produces code. You run the code. The code is deterministic — same input, same output, every time. The problems I wrote about before still apply: you need to verify the code and understand what changed. But once the code works, it works.

Mode 2: The agent is the software. You don’t ask it to write code. You ask it to do something. Rename all files matching a pattern. Extract data from these 200 documents. Restructure this folder hierarchy. The agent doesn’t produce an artifact — it performs the operation directly.

Most people don’t think about which mode they’re in.

The mode confusion

Here’s what I keep seeing. Someone has a data-heavy task — restructure a bunch of files, extract information from a pile of documents, transform data across a complex directory tree. Instead of asking the agent to write a script that does this, they ask the agent to just do it.

It works. Mostly. And when it works, it feels like magic.

But they’ve just switched from mode 1 to mode 2 without noticing. And mode 2 has a problem that mode 1 doesn’t.

When the agent writes a script and the script works, you have a deterministic artifact. You can run it again. You can inspect it. You can test it on a small sample before running it on everything. You can read the code and understand the logic. If something goes wrong, you can debug it.

When the agent is the software, none of that is true. There’s no artifact. There’s no way to rerun the exact same operation. If something went wrong on file 147 out of 200, you might not even notice. And if you do notice, good luck figuring out what happened.

The non-determinism I wrote about in my first post? In mode 1, it’s bounded. The agent might write different code each time, but each version is itself deterministic. In mode 2, the non-determinism is the operation itself.

When it matters

For a quick one-off where the stakes are low, mode 2 is fine. Rename these five files. Sure, let the agent just do it.

But the moment the task involves more than you can verify by glancing at the result — more files, more complexity, more consequences — you should probably be in mode 1. Make the agent write the script. Then run the script.

It’s a small shift in how you prompt. Instead of “restructure these files,” you say “write me a script that restructures these files.” But the difference in reliability is enormous.

The middle ground

There’s an obvious counterargument: “just ask the agent to log what it does.” And it’s not wrong. Some agents already show you every step — Claude Code prints every file it touches and every command it runs.

But a log of what happened is not the same as a script you can run again. You can read the log and see that the agent renamed 200 files. You can verify the result after the fact. What you can’t do is test it on a small sample first. Or hand it to a colleague and say “run this on your data too.”

The interesting space is agents that do both — perform the operation and produce a replayable artifact. Do the thing, but also emit a script that does the same thing deterministically. That would give you the convenience of mode 2 with the safety of mode 1.

We’re not there yet. But if I had to bet on what closes the gap between mode 1 and mode 2, it’s this middle ground.

The future is mode 2

Here’s the thing that messes with my head. The future — not the near future, but eventually — is all mode 2.

At some point, AI gets good enough that we don’t need it to write an e-commerce website. We just ask it to sell our products. We don’t need it to build a dashboard. We just ask it questions about our data, and it answers.

In that world, there is no software. The agent is the software. The code disappears entirely.

And that world might actually solve the non-determinism problem, or at least reframe it. When you ask a human store clerk a question, you don’t expect a deterministic response. You expect a reasonable one. If the agent is the software — if it’s the interface itself — maybe non-determinism stops being a bug and starts being a feature. That’s how conversations work.

The tipping point

So we’re in this weird in-between. The technology isn’t good enough yet for mode 2 to be reliable for complex tasks and we still need to solve the problem of non-determinism.

The skill right now — the actual practical skill — is recognizing which mode you should be in. When should you let the agent be the software, and when should you make it write the software?

My rough heuristic: if you need the result to be repeatable, verifiable, or auditable, you want mode 1. Make it write the script. If it’s truly ephemeral and the blast radius is small, mode 2 is fine.

But I think about this tipping point a lot. Because it’s moving. What’s too complex for mode 2 today might be perfectly fine for mode 2 next month. And at some point — maybe — everything tips to mode 2. And then we stop writing software entirely.

The mode confusion#

When it matters#

The middle ground#

The future is mode 2#

The tipping point#

The mode confusion

When it matters

The middle ground

The future is mode 2

The tipping point