The pitch goes like this: point an agent at the codebase, wait a weekend, ship a rewrite Monday. I have watched smart teams believe this and then spend the next quarter explaining outages.
What actually changed
AI agents changed the cost of writing code. That is real. A task that used to be a Friday is now a coffee. Boilerplate is free. Translation between languages is close to free. The tax on typing has effectively been refunded.
So if the bottleneck in modernizing a legacy app was the typing, we should be done. Reader, the bottleneck was not the typing.
What legacy systems are actually made of
Legacy systems are not hard because the syntax is old or the code is long. They are hard because business behavior is implicit. The system encodes years of decisions that nobody wrote down:
- The discount logic that exists because of a 2017 lawsuit.
- The retry loop that exists because a vendor’s API used to flap on Tuesdays.
- The nullable column that one downstream report quietly depends on.
- The cron job that no one owns but everyone needs.
None of that is in the code in a form an agent can see. It lives in tickets, in Slack threads, in the heads of three people, one of whom left. The code is the artifact; the behavior is the contract. Rewriting the artifact without preserving the contract is how you ship a working program that breaks the business.
The new failure mode
Cheap code generation does not remove this risk. It accelerates it.
Give an agent a legacy module and ask for a “modern” version and you will get one. It will compile. It will pass the tests that exist. It will also quietly drop the seventeen behaviors that were never tested because everyone assumed they were obvious. You won’t notice on Tuesday. You’ll notice at month-end close.
Faster code generation creates faster architectural damage. The blast radius of a confident wrong decision used to be limited by how fast a human could type it. That limiter is gone.
Where AI agents genuinely help
I am not anti-agent. I use them every day. They are excellent at a specific shape of work — bounded, mechanical, verifiable:
- Test scaffolding. Generating characterization tests against existing behavior. Pinning the current truth so you can refactor underneath it.
- Documentation. Reading code and producing a first draft of what it does. Imperfect, but a 70% draft beats a blank page.
- Repetitive refactors. Renames, signature changes, framework upgrades across hundreds of files. The kind of work that is error-prone for humans precisely because it is boring.
- Dependency discovery. “Find every caller of this function across the monorepo, including dynamic dispatch.” Faster and more thorough than grep.
- First-pass migration code. Translating a service from Python to Go, or Express to Fastify. Treat the output as a starting point, not a destination.
Notice the pattern: in each case, the correctness criterion is local and checkable. You can look at the diff and know whether it’s right.
Where humans still matter
Other work is not like that. The criterion for “right” is global, contested, or only visible in production:
- System boundaries. What is a service, what is a library, what crosses a network. Agents don’t have an opinion on this and shouldn’t.
- Data ownership. Which service owns which table. Who is allowed to write to it. What happens when two services disagree. These are organizational decisions wearing a technical hat.
- Risk sequencing. What order to change things in so that you can roll back at every step. This requires understanding which failures are recoverable and which end up on the news.
- Production safety. Feature flags, dual writes, shadow reads, gradual rollouts. The choreography of changing a running system without stopping it.
- Business tradeoffs. “We could be strictly correct here, but it would break the report Finance has been reconciling against for six years.” Pick.
None of these are typing problems. They are judgment problems. They require sitting with the system long enough to know what it actually does, and sitting with the business long enough to know what it actually needs.
A workflow that works
Here is the loop I run on legacy modernization, with agents in it and not driving it:
- Map the system. Before any code changes, build a real picture: services, data stores, jobs, queues, external dependencies, and the flows between them. Agents help by tracing call graphs. Humans decide what the boxes mean.
- Extract the rules. Walk the code and the stakeholders. What does this thing actually do, and why? Write it down. Half the value of modernization is just this step — most teams have never written it down.
- Add characterization tests. Pin current behavior, not desired behavior. If the system returns a weird value in a weird case, your test asserts the weird value. You are not fixing it yet; you are protecting it. Agents are great at generating these.
- Define boundaries. Decide which seams you will cut along. Where will the new system meet the old? Who owns what data on which side? This is the architecture decision and it is on you.
- Constrain the AI tasks. Once you have boundaries and characterization tests, agents can safely work inside a box. The box is small, the contract is explicit, the tests catch regressions. This is where you get the speedup.
- Review against the architecture decisions, not just the diff. “Does the code work” is the easy question. “Does this change respect the boundaries we drew” is the one that matters. An agent will happily generate a clever shortcut that violates the very seam you were trying to protect.
The agent gets the typing. You keep the judgment. The system gets modernized without becoming a different system by accident.
The thesis
Cheap code did not eliminate the hard parts of software. It revealed that the hard parts were never the code. Legacy modernization is, and has always been, an exercise in making implicit knowledge explicit, drawing boundaries you can defend, and changing a running system without breaking the business it serves.
Agents are a power tool. Power tools do not replace knowing where the load-bearing walls are. They just let you cut through them faster.
I help teams modernize legacy systems with AI-assisted development guardrails that keep the agents productive and the production safe. fitzpatricksoftware.com
I help teams modernize legacy systems with AI-assisted development guardrails that keep the agents productive and the production safe. fitzpatricksoftware.com
