
Emilio Carrión

Discipline Doesn't Scale. Verification Needs Infrastructure.

Individual discipline as a quality system is a fragile design. Tests scaled because they became infrastructure. Verification needs to do the same.

Tags: ai engineering, verification, leadership, architecture

Last week, a non-technical person in my circle built a working tool using AI. Not a prototype or a demo: something that did what it was supposed to do, with passing tests and a clean interface.

I took a look and my first reaction was "this is surprisingly good." The second was "I have no idea if this should go to production." Because "it works" and "it's correct" are not the same thing. And the distance between the two is exactly where the problems live.

So I did something different from what I would have done a year ago. Instead of reading the code line by line looking for whatever "felt off" (which is how we've verified things our entire careers), I wrote a verification contract before reviewing anything. Three questions, one per dimension: does it do what it should? Is it well made? Does it fit in the real system?

What caught me off guard was what I would have missed without the contract. On the functional dimension everything was green: the AI had done a clean job. On craft, two design decisions were inconsistent with the project's conventions. Nothing serious, but the kind of thing that accumulates. And on the contextual dimension (the one that always worries me most) I discovered a dependency that under real load would behave differently from what the tests showed. Without the contract, I would have reviewed with my intuition, probably would have caught the craft issue, and almost certainly would have missed the contextual one.

Intuition doesn't scale. Discipline doesn't scale. It never has.

The pattern that keeps repeating

We know we should document decisions. We don't because the sprint closes on Friday. We know we should do pair programming. We don't because it's faster to do it alone. We know code reviews should verify concrete criteria. In practice, we look at the diff, see the tests pass, and hit LGTM.

Individual discipline as a quality system is a fragile design. It depends on someone remembering, having time, having context, not being tired. In a world where generation moved at human pace, it was enough. In a world where AI generates 10x faster, it doesn't cut it.

You know what has scaled? Tests. Not because developers are more disciplined than before, but because tests became infrastructure. CI/CD doesn't scale because people remember to deploy correctly, it scales because the process doesn't depend on anyone remembering. Linters don't scale through discipline, they scale because they're embedded in the flow.


In my previous article I argued that verification is the new core work of engineering and described three dimensions: functional (does it do what it should?), craft (is it well made?), and contextual (does it fit in the real system?). What I didn't say is how to scale that beyond individual willpower.

Now I believe the answer is the same as with tests: verification needs to become infrastructure.

What I built

So I built it. It's called Plumbline. It's verification infrastructure for AI-assisted work. Two phases: one before building, one after.

In the first phase you generate a verification contract. Plumbline analyzes your project (documentation, code, conventions) and produces a document with concrete criteria for what "done" means for that task. Not code. Criteria. Each one classified across the three dimensions and tagged as [auto] (executable by agents) or [manual] (requires human judgment).

A real contract looks like this:

Functional:
- "POST /auth/login returns 200 with valid credentials" [auto]
- "Rate limiting activates after 5 attempts" [auto]

Craft:
- "Auth logic lives in the service layer, not in the route handler" [auto]
- "Naming follows codebase conventions" [manual]

Contextual:
- "All existing tests still pass" [auto]
- "Latency impact of the auth middleware on the hot path has been evaluated" [manual]

In the second phase, after building, Plumbline executes the automated checks, guides you through the manual ones, and produces a report with evidence for each criterion.

All state is Markdown. No database, no server, no configuration. If verification isn't transparent, it's not verification, it's another black box.
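To make the "Markdown as state" idea concrete, here is a minimal sketch of how a tool could parse a contract file into criteria and separate the [auto] checks from the [manual] ones. The file format and function names below are assumptions for illustration, not Plumbline's actual format or implementation.

```python
# Hypothetical sketch: parse a Markdown verification contract into criteria
# and split automated checks from those needing human judgment.
# The contract format below is an assumption, not Plumbline's actual format.
import re

CONTRACT = """\
## Functional
- [auto] POST /auth/login returns 200 with valid credentials
- [auto] Rate limiting activates after 5 attempts

## Craft
- [auto] Auth logic lives in the service layer, not in the route handler
- [manual] Naming follows codebase conventions

## Contextual
- [auto] All existing tests still pass
- [manual] Latency impact of the auth middleware has been evaluated
"""

def parse_contract(text):
    """Return a list of (dimension, mode, criterion) tuples."""
    criteria, dimension = [], None
    for line in text.splitlines():
        if line.startswith("## "):
            # A heading names the current dimension.
            dimension = line[3:].strip().lower()
            continue
        m = re.match(r"- \[(auto|manual)\] (.+)", line)
        if m and dimension:
            criteria.append((dimension, m.group(1), m.group(2).strip()))
    return criteria

criteria = parse_contract(CONTRACT)
auto = [c for c in criteria if c[1] == "auto"]
manual = [c for c in criteria if c[1] == "manual"]
print(f"{len(auto)} automated checks, {len(manual)} manual checks")
```

Because the contract is plain text, the automated half can be handed to an agent or a CI step while the manual half becomes a guided checklist, and the whole thing stays readable in a diff.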

Why code review isn't enough

Code review, as we practice it, wasn't designed for the world that's coming.

Code review assumes the reviewer understands the code being reviewed. Assumes the reviewer has context about the system. Assumes the reviewer has time to review in depth. And assumes the code was written by a human mind with recognizable intentions.

None of those assumptions hold when AI generates the code. The reviewer doesn't always understand what the AI generated. System context lives in the heads of three seniors who aren't always available. Review time hasn't scaled while the volume of generated code has multiplied. And the "intentions" behind AI-generated code are statistical patterns, not design decisions.

I'm not saying code review dies. I'm saying it needs to evolve from "I review what I see" to "I verify against agreed criteria." From subjective opinion to verifiable contract. From a practice that depends on the reviewer's discipline to infrastructure that works even when the reviewer is tired, in a rush, or unfamiliar with that corner of the system.

That's what a verification contract does: it separates "what to verify" from "who verifies." You define the criteria when you have time and context. You execute the verification later, against those criteria, with discipline embedded in the process and not in the person.

What I didn't expect

Back to the story from the beginning. The contract made me a better verifier. I didn't expect that.

Without the contract, I would have reviewed with my intuition. I would have looked at what my experience tells me to look at. And I would have left unreviewed what my experience doesn't cover, which is more than I like to admit.

With the contract, I verified all three dimensions systematically. The functional one was fast and automated. Craft forced me to compare against conventions I've internalized but never written down. And contextual made me think about the complete system before approving a piece.

The invisible heuristics of seniors are valuable. But relying exclusively on them is the same as relying on discipline: it works until it doesn't. What we need is to convert the ones we can into explicit criteria, codify them as verifiable contracts, and reserve human judgment for the third dimension, the one that requires context no system can capture yet.

Craft needs infrastructure

I've spent five articles arguing that in a world where AI generates code, what matters is the judgment of whoever verifies it. That seniors' heuristics are the most valuable infrastructure a team has. That generation is commodity and verification is craft.

Plumbline is an attempt to turn part of that craft into infrastructure. Every criterion you pull from a senior's head and turn into a verifiable contract is one fewer single point of failure.

It doesn't solve everything. The third dimension still needs humans with context. The deepest heuristics remain tacit. But if your quality criteria live only in the heads of three seniors, you don't have a verification system. You have three single points of failure. And in a world where generation accelerates every quarter, those three points are the bottleneck that determines whether your team scales or drowns.

Generation is commodity. Verification is craft. And craft needs infrastructure.

Question for you: If you had to write a verification contract for your team's next feature, what would go in the contextual dimension? The thing only you know, that isn't in any test, and that no agent can discover.

This content was first sent to my newsletter.

About the author

Emilio Carrión

Staff Engineer at Mercadona Tech. I help engineers think about product and build systems that scale. Obsessed with evolutionary architecture and high-performance teams.