Emilio Carrión
The seven unwritten engineering laws AI is making more expensive
This week I watched a teammate ship nine PRs in a single day. Solid work, AI-assisted. Speed is what we celebrate, but the unwritten rules of engineering, the ones you only learn by breaking them, now charge you sooner, and they charge you more. The seven laws, and how AI changes the math on every one of them.
I've had a weird feeling for a few weeks now, and I don't think I'm the only one.
This week, looking at what a teammate was doing, I noticed they had shipped nine pull requests in one day. Nine. Each one tied to a deployment version, because we run continuous deployment. And it wasn't a chaotic morning of hotfixes. It was normal work, well done, AI-assisted.
That's what most of us celebrate: speed. But there's another side. The unwritten rules of software engineering, the ones you learn by breaking them and promising it won't happen again, now charge you sooner, and they charge you more. The error surface is larger. And most of us haven't updated our habits at the rate we're generating code.
A few days ago Anton Zaides published a piece on manager.dev with seven unwritten laws every engineer has broken at least once. In this week's video I covered three in detail. Here are all seven, with my take on each one and why I think AI changes the math on all of them.
1. If something happens in production, it's almost always related to your change. Rollback first, debug after
This is the one that took me the longest to internalize over the years. You ship something, an alert fires ten minutes later, and your brain immediately builds the case for why your change "can't be it". It's another area. It touched a different part of the system. It has nothing to do with it.
It almost always has something to do with it. And even if it isn't the root cause, the cost of reverting and stabilizing is way lower than the cost of having the whole team investigate for an hour with production on fire.
I learned this the hard way. Defending my change became a reflex. And along the way, several incidents dragged on longer than they needed to.
With AI this is even more relevant. You're shipping more changes, faster, and many of them you reviewed in "approving by default" mode because the PR looked fine. The probability that an incident is related to one of the last ten deploys is high, almost by construction.
The rule is simple: when an incident hits, the first move isn't to debug. It's to ask "what got deployed in the last hour?" and revert. Stabilize. From there, you investigate calmly.
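A minimal sketch of that first move, assuming your deploys map to merge commits on main (adjust to however your pipeline tags releases): list what shipped in the last hour and print the revert commands, newest first.

```python
# triage.py - first move during an incident: what shipped recently, and how to undo it.
# Assumes deploys correspond to merge commits on main; adapt to your pipeline's tagging.
import subprocess

def recent_deploys(since: str = "1 hour ago") -> list[str]:
    """Return merge commits on main within the window, newest first."""
    out = subprocess.run(
        ["git", "log", "origin/main", "--merges", f"--since={since}",
         "--pretty=format:%h %ad %s", "--date=format:%H:%M"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

if __name__ == "__main__":
    deploys = recent_deploys()
    if not deploys:
        print("Nothing shipped in the last hour; widen the window before digging deeper.")
    for line in deploys:
        sha = line.split()[0]
        print(f"{line}\n  candidate revert: git revert -m 1 {sha}\n")
```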
2. Your backups don't exist until you've restored from them
This is the one that produces the nastiest surprises, and the one that's most embarrassing to admit. Having backups configured is not the same as having backups that work.
Honest questions for you:
- Have you ever restored from your current backup, not from one from two years ago?
- Do you know how many minutes of data you can lose in the worst case?
- Do you know who has permissions to do it, and whether that person will be available on a Sunday at three in the morning?
- Do you know how long the restore takes today, with the data volume you have today, not the volume from when the plan was designed?
If you hesitated on any of those, you have homework this week.
I'm not going to pretend that at Mercadona Tech this is perfect across every system. What we are clear about is that confidence in a backup is directly proportional to the last time you restored from it. And that's something that's probably underrepresented in your calendar.
How does AI affect this? Indirectly but importantly: we're generating more systems, faster, more corners of the architecture with critical data. And the person who generated the migration with Claude Code probably didn't stop to think about the restore plan. Myself included, sometimes.
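A restore drill doesn't have to be sophisticated to deserve a slot in that calendar. A minimal sketch, assuming a Postgres custom-format dump and a scratch database you can throw away; the backup path and the table name are placeholders.

```python
# restore_drill.py - prove the latest backup restores, and measure how long it takes today.
# Assumes a Postgres custom-format dump (pg_dump -Fc) and a disposable scratch database.
import subprocess, time

DUMP_FILE = "/backups/orders_latest.dump"   # placeholder: point at your newest backup
SCRATCH_DB = "restore_drill"

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run("dropdb", "--if-exists", SCRATCH_DB)
    run("createdb", SCRATCH_DB)

    started = time.monotonic()
    run("pg_restore", "--no-owner", "-d", SCRATCH_DB, DUMP_FILE)
    minutes = (time.monotonic() - started) / 60

    # Cheap sanity check: does a table you care about have a plausible number of rows?
    run("psql", "-d", SCRATCH_DB, "-c", "SELECT count(*) FROM orders;")
    print(f"Restore took {minutes:.1f} minutes with today's data volume.")
```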
3. You're always going to hate yourself for how you wrote the logs
This has happened to me so many times I almost laugh now. You start a project with the best intentions: structured logs, request IDs, well-defined levels. Six months go by, an alert fires, and you discover that the exact field you need to understand what happened isn't being logged, or it's being logged as a badly formatted string, or the search dashboard finds nothing because the JSON is broken.
And here AI has brought a new problem. The old problem was missing logs. Now the problem, pretty often, is too many logs. Cursor and Claude Code generate extremely verbose logs, with details no one is ever going to read, until one day you have to find something specific buried under a hundred lines per request. And it's harder to find the signal than when there was half as much.
The balance isn't easy:
- All the information you need to reconstruct the flow
- With a shared identifier across services for the same request
- Without burying what matters under layers of info that only served while the agent was "iterating"
My suggestion: when an agent generates new code for you, make the logs the first thing you review after the logic. Not for aesthetics. Because if that code fails in production six months from now, the only thing you'll have to understand it is what got written down today.
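As a concrete baseline for that balance, a minimal sketch: structured JSON lines with a request identifier carried on every line via a contextvar, so the same request can be followed across services. The field names are illustrative, not a standard.

```python
# Structured logs with a per-request identifier, so one request can be followed end to end.
# A minimal sketch; adapt the field names to whatever your log pipeline expects.
import json, logging, uuid
from contextvars import ContextVar

request_id: ContextVar[str] = ContextVar("request_id", default="-")

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "request_id": request_id.get(),
            "msg": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def handle_request(payload: dict) -> None:
    # Reuse the upstream id if the caller sent one; otherwise mint a new one.
    request_id.set(payload.get("x_request_id", str(uuid.uuid4())))
    logger.info("order received")      # enough to reconstruct the flow
    # ... business logic ...
    logger.info("order confirmed")     # not fifty lines of agent-era debug noise

handle_request({"x_request_id": "req-42"})
```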
4. Always, always, have a rollback plan
I covered this one in the video. The idea is simple: any change that touches data, configuration, or external systems needs a clear plan to roll back. It doesn't have to be automatic. It has to be known and, most importantly, tested.
I want to add a real case here that I cut short in the video. A while back we did a schema change on a heavily used table. The migration went up clean. Validation passed. And two hours later, a nightly job started failing because it depended on column order (something that shouldn't happen, but did). The rollback plan existed. The thing was, no one had ever run the reverse migration against a replica with real data. And the first time we ran it, in production, with the nerves of a live incident, we discovered it took twice as long as we thought because of an index nobody had considered.
It went well. But it went well by luck and because the team knew what they were doing. Not because the plan was solid.
With AI and current generation speed, migrations get written and approved in minutes. The temptation to skip testing the rollback is enormous, because "it's small, it's simple, what could go wrong". What can go wrong is exactly the thing you didn't rehearse.
Rule I try to apply: if the migration was generated by an agent, so was the rollback, and both run at least once against a replica before touching production. No exceptions worth listening to.
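What that rule can look like in practice, as a minimal sketch: run the migration forward and then backward against a replica, and time both so the rollback duration isn't news during an incident. It assumes a Django-style project; the settings module and the migration names are placeholders.

```python
# rehearse_rollback.py - run the migration forward and backward against a replica, timing both.
# Assumes a Django-style project; "settings.replica" and the migration names are placeholders.
import subprocess, time

REPLICA_SETTINGS = "settings.replica"   # hypothetical settings module pointing at a replica
APP, NEW, PREVIOUS = "orders", "0043_add_status", "0042_initial"  # placeholder migrations

def migrate(target: str) -> float:
    started = time.monotonic()
    subprocess.run(
        ["python", "manage.py", "migrate", APP, target, "--settings", REPLICA_SETTINGS],
        check=True,
    )
    return time.monotonic() - started

if __name__ == "__main__":
    forward = migrate(NEW)        # the change itself
    backward = migrate(PREVIOUS)  # the rollback you hope never to need
    print(f"forward: {forward:.0f}s, rollback: {backward:.0f}s (with real data volume)")
```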
5. Every external dependency is going to fail
This is the one we underestimate the most when we integrate something new. Today it's trivial to add an external API: you sign up, copy the API key, two hours and it's running. But the question almost nobody asks in that first meeting is: "what happens to our system when this goes down?".
And it will go down. It's not pessimism, it's statistics. A provider with a 99.9% SLA sounds fantastic until you stack five providers with the same SLA and discover your effective availability is closer to 99.5%, which is more than forty hours of potential downtime a year instead of nine.
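The arithmetic fits in five lines: availabilities of dependencies you need at the same time multiply, they don't average.

```python
# Serial dependencies multiply: five "three nines" providers are nowhere near three nines combined.
sla = 0.999
combined = sla ** 5                        # all five must be up at the same time
hours_down = (1 - combined) * 24 * 365
print(f"combined availability: {combined:.3%}")                          # ~99.501%
print(f"expected downtime: {hours_down:.0f} hours/year vs ~9 for one provider alone")
```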
Questions any design should ask before integrating a critical dependency:
- What breaks in the product when this doesn't respond? One feature or everything?
- Have we tested the "API down" scenario in pre-production, or only the "API slow" scenario?
- Do we have cache, queue, or some kind of graceful degradation? (see the sketch after this list)
- Do we know what we tell the user when it happens?
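On the graceful degradation question, the simplest useful version is often just a tight timeout plus a stale-cache fallback. A minimal sketch, assuming an external pricing API where serving slightly stale data beats serving an error; the names are illustrative.

```python
# Graceful degradation in its simplest form: tight timeout, fall back to the last known good value.
# Assumes slightly stale data beats an error page; names are illustrative.
import time, urllib.request, json

_cache: dict[str, tuple[float, dict]] = {}   # url -> (fetched_at, payload)
STALE_OK_SECONDS = 600                       # how old a cached answer we're willing to serve

def get_prices(url: str) -> dict:
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:   # fail fast, don't hang the request
            payload = json.load(resp)
            _cache[url] = (time.monotonic(), payload)
            return payload
    except Exception:
        cached = _cache.get(url)
        if cached and time.monotonic() - cached[0] < STALE_OK_SECONDS:
            return cached[1]                 # degraded but working: stale prices, no error page
        raise                                # nothing to fall back to: now it's a real outage
```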
AI doesn't change this law. It changes it indirectly, because adding external dependencies is now even easier. An agent can integrate a new SDK in an afternoon. And the pressure toward "more services, more connected, faster" makes these questions get asked less, not more.
6. If there's any risk, four-eyes rule
Also in the video. When you're about to touch something non-reversible (production configuration, database commands, infrastructure deployment), always two pairs of eyes. And listen, this applies even, or especially, if you've been in the industry for fifteen years.
I told this in the video, but I'm repeating it here because I think it matters. A few months ago I shipped an AI-assisted PR, I reviewed it myself, and a test that had nothing to do with the change slipped through. It wasn't harmful. It just shouldn't have been there. The team's tech lead caught it in review. And I, who consider myself careful when reviewing, had let it through.
Why? Because my mind, while reviewing, was operating with the bias of "I wrote this". And technically yes, I had written it, in the sense that I'd accepted the suggestions. But I hadn't written it in the sense of "having every line in my head". And that nuance makes your review more superficial, even if you don't realize it.
The conclusion I'm getting to: when an agent generates the code, you are not "the first review". You are "the first filter". The first review still applies, and it should be another person. The four-eyes rule is actually six eyes now: the agent, you, and one more human.
7. Nothing is more permanent than a temporary fix
Everyone's favorite, because everyone has broken it. That phrase "we'll fix it later" is statistically equivalent to "I'm never going to fix it". And we all know it when we say it.
Here, though, I think AI is changing something fundamental, and for the first time in a good direction.
Before, doing a fix right from the first moment cost time. Time you often didn't have, or that the business didn't want to give you. And the "temporary fix" was a reasonable balance between speed and assumed debt.
Now, that math is changing. The cost of doing it right the first time has gone down. So has the cost of going back and fixing it later. And, most interesting of all, that "going back" is something you can start to delegate.
Let me tell you something I'm trying in my day-to-day. I'm experimenting with continuous integration runners that fire off agents to fix pending things. You define the task, give it context, and overnight, while you sleep, the agent opens a PR with the fix. In the morning you review, comment if needed, and merge or iterate. It's not magic. It fails plenty. But it's starting to work for certain types of tasks.
What matters here isn't the specific tool. It's that the marginal cost of "going back to fix it" has dropped so much that the classic "I don't have time" excuse is starting to be less defensible. And that's good news for projects.
That said, there are still temporary fixes that deserve to stay temporary. The difference between "this is simple and limited" and "this is held together with duct tape" still matters. What changes is that the duct tape is now easier to remove.
The pattern underneath
If you look closely, the seven laws point to the same place: the discipline of verification. Checking the change is safe before making it (4, 6), checking it can be undone (1, 4), checking you have the information to investigate later (3), checking your plan B works (2), checking your system survives when someone else's goes down (5), and checking that the temporary actually ends up being temporary (7).
I've been chewing on this for months from another angle, in what I've started calling operating blind. The idea is that the less code we write by hand, the more important the verification layer becomes. Because generation is cheap and fast. Verification is not.
These seven laws are, deep down, concrete forms of that more general idea. And what I'm noticing in the teams I talk to is that most are still measuring productivity by generation speed. When, in reality, real productivity is increasingly in the quality and speed of verification.
Sound familiar? I'd guess so.
And to close with the honesty this calls for: I don't know exactly how this is going to evolve over the next twelve months. What I am clear about is that the teams that take these seven laws seriously, and adapt them to the pace of AI instead of ignoring them because "now we go faster", are going to have way fewer incidents than the rest. And way fewer Sunday nights investigating what failed.
Question for you
Which of the seven do you break the most? Mine, honestly, is the tested rollback. I say it without pride. Tell me yours by replying to this email.
PS: This week's video covers three of these laws in short form (rollback, four eyes, temporary fix). This newsletter completes the picture with the four remaining ones and the nuances the video didn't fit.
