Emilio Carrión

Software DNA Wasn't a Concept. It Was 24 Files.

Three weeks ago I argued that code would become disposable and what would matter is software DNA. I locked myself in to test whether that idea was actually writable. What came out: 24 files, two regenerations at less than a euro each, a public repo, and a fair bit of clarity about what harness engineering still doesn't solve.

ai · architecture · regenerative software · agents · product

Three weeks ago I argued a thesis: when LLMs generate thousands of tokens per second, regenerating code will be cheaper than maintaining it. And then the question stops being "how do I write good code?" and becomes "what information lets me regenerate this system without losing its identity?". I called that information DNA.

That thesis had a problem. It was abstract. And as I admitted at the end of the post, abstract theses age badly when they aren't grounded. So I sat down for three weeks to find out whether the DNA was actually writable or just sounded good.

It is. Twenty-four files.

What I'm Going to Argue in This Post

Three things.

One: software DNA, far from being a metaphor, is a structured specification in three layers that a current agent can use to regenerate an entire product, from scratch, in a single pass, for less than a euro.

Two: the gaps in that specification are where the agent improvises, and where it improvises is where you get products that feel fine at first glance and rot by the third look.

Three: this solves a different problem from the flows where an agent lives alongside your team for days or weeks. For those, there's a growing discipline. For building an entire product from zero in a single pass, there isn't. I come back to this at the end of the post.

Why a Public Product, Not a Microservice

I work with seven teams day to day. The conversation that keeps coming up over the past few months isn't about how much code the agent generates. It's about how much we have to redo when what it produces doesn't fit what the system needs. And that doesn't get fixed with more careful review. It gets fixed in what we hand the agent before it starts.

The experiment in the previous post was on a microservice. Backend. And a microservice is generous to whoever regenerates it: tests, contracts, schemas. If the output passes the tests, the agent has done its job.

A public product isn't. Verification includes things like "the menu communicates the kind of cuisine in under ten seconds" or "the booking button keeps working if the provider's iframe is down". You can't express that in OpenAPI. And here's the part most of the industry is skipping: most of the software we pay to build lives in this second group. Landing pages, microsites, internal dashboards, tools. If the DNA model only works for microservices, it's academic. If it works for product, it's a real transition in the profession.

That's why I picked the hard one. Stack: Astro, Tailwind, strict TypeScript, Codex CLI with GPT-5. I made up a Mediterranean restaurant in Ruzafa called Mòs. I started writing the DNA with the best structure I could come up with and hit the button. Two regenerations. The first was a useful mess: it failed in places that taught me what was missing from the DNA. The second came out reasonable.

Number to keep in mind: each full regeneration of Mòs costs less than a euro in tokens. This isn't a hypothesis about 2028, it's what it cost last week with a model available today. And it's worth digesting that figure before going further, because it changes the math: if regenerating an entire product costs less than a coffee, code stops being the expensive part. The expensive part is not having written well what gets regenerated.

This is what came out of run #2. None of the color, typography, or photography was specified in the DNA: the agent derived the entire aesthetic from prose. We'll come back to that in a section below.

[Figure: the Mòs home page from run #2. Dark hero with a photo of a candlelit table, 'Mòs' headline in white serif, the subtitle 'Bistró mediterráneo de barrio en el corazón de Ruzafa' ('Neighborhood Mediterranean bistro in the heart of Ruzafa'), 'Reservar' and 'Ver carta' buttons, and below the hero a strip with address, hours, and phone.]

The 24 Files

If you don't want to look at the whole tree, the idea in one line: three folders (one for what the product does, one for what it's built with, one for the concrete data) plus two header files (one for the human who forks, one for the agent that regenerates). Beyond that, this is what shows up when you open the blueprint:

```text
restaurant-website/
  README.md            ← for the human who forks: what the blueprint produces
  AGENTS.md            ← for the agent: what to do, in what order, what to deliver

  species/             ← what defines the genre, invariant
    capabilities.md    ← what it must be able to do (verifiable criteria)
    quality.md         ← how it does it well (performance, a11y, UX)
    rationale.md       ← why (short, deliberately underspecified)
    contracts/         ← JSON Schemas validating the instance
    integrations/      ← contracts with third parties: maps, bookings, forms, photos
    eval/              ← how regeneration is judged (must-have, judge prompts, lighthouse)

  stack/               ← fixed technical decisions
    technical.md       ← framework, language, deploy
    conventions.md     ← repo layout, naming
    components.md      ← canonical component patterns

  instance.example/    ← concrete product data (variable)
    README.md          ← step-by-step personalization guide
    brand.md           ← the brand, in prose
    config/            ← i18n, integrations, site
    content/           ← menu, restaurant, story
    overrides.md       ← documented divergences from species
```

Twenty-four files in prose or YAML. I don't count the JSON Schemas because they're machine infrastructure, not DNA. If a contributor edits a schema, they're changing how the instance is validated, not changing what the product is.

The three layers from the previous post (what the system does, how it does it well, why it does it that way) all live inside species/, as capabilities.md, quality.md, and rationale.md. They're the files I rewrote most during the experiment. The rest is the infrastructure around those three layers that lets an agent execute them without human supervision.

A decision that took me time: separating species/ from stack/. The species says what kind of product it is (a public restaurant with menu, bookings, map). The stack says what tools it's built with (Astro, strict TypeScript, static deploy). I separate them on purpose because the species should outlive any change of stack two years from now. When a better framework shows up in 2028, I don't migrate code: I regenerate from the same species with a different stack.
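
To make the separation tangible, here are two illustrative excerpts in my own words (not copied from the repo) showing how differently the two layers talk:

```markdown
<!-- species/capabilities.md (illustrative excerpt, not the repo's literal text) -->
- A visitor can read the full menu, grouped by course, within two clicks of the home page.
- A visitor can book a table; if the booking provider is down, a phone fallback is shown.
- A visitor can locate the restaurant on an interactive map.

<!-- stack/technical.md (illustrative excerpt, not the repo's literal text) -->
- Framework: Astro with static output; no client-side framework unless a capability demands it.
- Language: TypeScript in strict mode.
- Styling: Tailwind.
- Deploy: static hosting, no server runtime.
```

The first list should survive 2028 untouched; the second is the part I expect to throw away.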

instance.example/ is the variable piece. When someone forks the blueprint, they copy instance.example/ to instance/ and edit their concrete product there. The rest is reusable across forkers.

This isn't exotic. Anyone who's ever documented a project has written similar files. The difference is that here the structure is designed for an agent to read it end to end and build the product without coming back to ask you anything.
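
As an example of what "without coming back to ask you anything" requires, here is a sketch of the kind of instructions the agent-facing header carries (illustrative wording; the actual AGENTS.md in the repo is the source of truth):

```markdown
<!-- AGENTS.md (sketch of the protocol, not the real file) -->
## Order of work
1. Read species/ in full: capabilities, quality, rationale, contracts, integrations, eval.
2. Read stack/ and treat every decision there as fixed.
3. Read instance/ (or instance.example/) for the concrete product data.
4. Build the product in one pass. Never stop to ask the human; decide, and record the decision.

## Deliverables
- The working site, checked against species/eval/.
- instance/.generated/dna-gaps.md: every decision you had to make without explicit guidance.
```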

Where Your DNA Stays Silent, the Agent Improvises

This is the central sentence of the experiment. I'll illustrate it with two examples from run #1.

Example 1: the invisible navigation bar. Translucent, floating on scroll. Over dark backgrounds it read fine. Over light-background paragraphs it disappeared. White text on white background. It passed static accessibility checks (the contrast ratios were measured against the background declared in CSS, not against what was actually behind it). It failed the human test in two seconds.

It wasn't an agent bug. It was DNA silence. I hadn't written anywhere "the legibility of persistent elements is measured against the worst possible backdrop they may overlap." And because it wasn't written, the agent decided for me. Reasonably, in fact: it did what 80% of landings on the internet do. That rule, written once after run #1, now lives in species/quality.md under the heading "Persistent UI legibility". Any blueprint that inherits it no longer trips on that stone.
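
For reference, this is roughly how that entry reads (paraphrased, not a verbatim copy of the file):

```markdown
## Persistent UI legibility   <!-- species/quality.md, paraphrased -->
Any element that stays on screen while scrolling (navbar, floating CTA, cookie banner) MUST
remain legible against the worst backdrop it can overlap, not against the background declared
in its own CSS. If the page mixes dark and light sections, the element needs an explicit
treatment: solid background, backdrop blur plus a contrast check, or a per-section color switch.
Contrast is judged against rendered screenshots, never against computed styles alone.
```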

Example 2: the map that wasn't a map. I had a capability that said "interactive location map with OpenStreetMap tiles." The agent delivered "a nicely formatted text address with a link to Google Maps." It passed Lighthouse, passed accessibility, passed everything. But it had decided for me that a Leaflet map (Leaflet is a standard library for rendering real maps with zoom and pan, not a static image) was complicated, and quietly downgraded the contract to its closest "reasonable" form.

Until I converted that capability into a strict MUST in species/capabilities.md with extra criteria ("interactive map means a Leaflet map rendered after consent, not a text address with an OSM link"), the agent was going to keep simplifying whatever it judged superfluous. The detail of pre-consent and post-consent modes lives in species/integrations/maps.md.
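
Roughly, the strengthened capability now reads like this (a paraphrase; the exact wording is in the repo):

```markdown
<!-- species/capabilities.md (location map, paraphrased) -->
- MUST: an interactive map rendered with Leaflet over OpenStreetMap tiles, with zoom and pan.
  A formatted address with an external map link does NOT satisfy this capability.
- MUST: before consent, show a static placeholder with the address and a "load map" action;
  after consent, render the Leaflet map (modes detailed in species/integrations/maps.md).
```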

Let's call this what it is: the agent takes invisible shortcuts every time your spec isn't explicit. And those shortcuts aren't neutral: each one bakes in a degree of fragility that doesn't show up the day you ship, but three months later, when someone has to touch the site and nobody remembers why something is the way it is. The DORA 2025 report is starting to document exactly that correlation in data.

And here's the twist I didn't expect: these shortcuts are the most useful gift of the experiment, not the bug. Each is a stone the next regenerations no longer trip on. The navbar rule, written once, serves every blueprint in the collection. Same for the map criterion. Every gap discovered is compounding leverage.

The good thing is that the blueprint protocol formalizes them. The AGENTS.md requires the agent to deliver, alongside the site, an instance/.generated/dna-gaps.md file listing everything it had to decide without guidance. That file is the most valuable output of every regeneration. If it comes out short it's a red flag: either there were no gaps (unlikely) or the agent papered over ambiguity without reporting it (the usual). When you read it, you know exactly which rules to write before the next run.
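
To give a feel for what comes back, an entry in that file might look something like this (an invented example, in the spirit of the real output):

```markdown
<!-- instance/.generated/dna-gaps.md (invented example entry) -->
## Gap: kitchen hours vs. door hours
- What I decided: the kitchen closes 30 minutes before the restaurant; I displayed both times.
- Why I had to decide: instance/content/restaurant.md lists opening hours but says nothing
  about when the kitchen stops serving.
- Suggested DNA fix: add a kitchen-hours field to the instance content, and a rule in
  species/capabilities.md saying which of the two the booking flow must respect.
```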

But DNA Shouldn't Be Exhaustive

I went into the experiment thinking the DNA had to be exhaustive, formal, closed. I came out thinking the opposite.

Mòs's brand.md was three pages of prose. No hex tokens, no predefined typography. One line read: "Penumbra cálida. La luz no llega del techo, llega de las mesas: velas y un par de lámparas bajas con bombillas de filamento." (Warm dimness; the light doesn't come from the ceiling, it comes from the tables: candles and a couple of low lamps with filament bulbs.) Another: "No queremos turistas que buscan paella en una terraza de plaza." (We don't want tourists hunting paella on a plaza terrace.) The agent read that and derived a warm, dark palette, a typeface with weight, low-lit photography, copy without tourist clichés. When I regenerated for run #2, the typography wavered between two reasonable options, but the character of the product stayed identical.

And here's a debatable opinion: part of the DNA has to be deliberately underspecified, and the agent filling it in with judgment is a feature, not a bug.

If you close the brand too tightly (hex tokens, predefined layouts, formal rules), the product loses personality and feels like a template. And template-feel is, today, the number-one symptom of software made with badly-used agents. Anyone notices it. People close the tab.

If you leave it too open, the agent improvises toward the corpus average, which is generic SaaS landing aesthetics: hero gradient, glass card, "trusted by" with grayscale logos, three icons in a row with one-syllable words underneath. The crime of 2025.

The balance is to write prose with voice, but with a final test of the form "if person X in situation Y opens this page, what do they feel in ten seconds? If it's not this, the design is wrong." The agent uses that sentence as an anchor. And that solves the problem better than any closed token system.
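
When that anchor gets written down as one of the judge prompts in species/eval/, a minimal sketch could look like this (hypothetical wording, not the repo's file):

```markdown
<!-- species/eval/judge-brand.md (hypothetical sketch) -->
You are judging the generated home page against instance/brand.md.
Scenario: a neighbor from Ruzafa who avoids tourist restaurants opens the page on their phone.
Question: in the first ten seconds, do they feel "warm, candlelit neighborhood bistro" or
"generic restaurant template"? Answer with one of the two, then list the three strongest
signals that pushed you there. If the answer is "template", this regeneration fails the check.
```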

This shifts where the senior engineer's skill lives. The brand skill used to be knowing which tokens to pick. Now it's knowing what to say and what to deliberately leave unsaid.

Why This Is Complementary to Harness Engineering

Back in February, Mitchell Hashimoto put a name to a related discipline: harness engineering. Designing the environment in which the agent operates (tools, verification loops, AGENTS.md, sandboxes). Within weeks it went from blog post to standard term: OpenAI picked it up after their Codex experiment, Martin Fowler formalized it, Anthropic adopted it. It's a good discipline and worth having on the team.

But it assumes an iterative process: an agent that works for days or weeks, an AGENTS.md that grows every time the agent makes a new mistake. Hashimoto adds a rule each time something fails. The OpenAI team wrote a million lines with Codex over five months with a harness that kept evolving.

That works very well for a class of tasks: codebases that grow, agents that live with the team every day, context that accumulates. It leaves out a different class: when you ask the agent to regenerate an entire product from scratch, in a single pass, without having been there before. There's no chance to iterate the harness. The quality of the product depends entirely on the specification the agent started with.

Teams that stay only on the harness side and don't invest in the initial specification fall into what DORA calls low-AI-maturity teams: they accelerate without instrumenting quality.

For an iterative task, a good AGENTS.md is enough. Building an entire product from scratch needs more: structuring the input in layers. And that's what I'm calling DNA. The agent regenerates well what's well written. The quality of the result is decided in the initial specification, not in the agent's capability.

The Repo, as Evidence

So that this doesn't stay just an argument, today I'm opening github.com/EmilioCarrion/product-blueprints: four blueprints at v0.1, with their verifiable capabilities, their invariants, and their brand contract. The first public regeneration (Mòs) is hosted so you can browse it live in the examples gallery.

Only the restaurant one is validated by two real regenerations. The other three are structurally complete but haven't been executed end to end. I say it plainly because I've seen enough AI Twitter to detect the difference between "framework" and "tested framework", and because my favorite audience is the one that detects that difference.

Why publish it now, if only one is validated? Two reasons. One, I need them myself: when a colleague asks me how to bootstrap the website for their side project, I'm going to hand them one of these. Two, the only way to find the gaps I haven't seen alone is for more people to regenerate. Every gap is leverage for the next person who comes through.

Where I Could Be Wrong

Two specific places where my model can break, and I put them on the table before someone else does.

One, I don't know whether the three DNA layers scale beyond small public products. An interactive dashboard with state, a mobile app with client-server coordination, an e-commerce with inventory and payments: there the layers may need to reorganize, or there may need to be more of them. What I have works for four genres. That isn't general proof.

Two, the specific shortcuts I describe (the invisible navbar, the degraded Leaflet map) are contingent on this model and this moment. When GPT-5 becomes obsolete in six months, the examples will look quaint. I think the general pattern (the agent fills in where the DNA stays silent) holds. But "I think" isn't "I've demonstrated".

I've been wrong before about how software engineering would evolve. I might be wrong now.

A Question for You

If your work includes small public products (landings, microsites, event sites), fork one of the blueprints, write an instance/ for something of yours, and set an agent loose on it.

What helps me most is the dna-gaps.md your agent produces, not the site itself. That file is the list of places where the DNA stayed silent and the agent had to decide. Submit it as an issue or PR, or send it to me by email.

And if you work on a team adopting AI seriously, ask one question before the next meeting: are we measuring what the agent produces, or the quality of the DNA we produce it with? Because if it's the former, we're heading exactly where the DORA data is already punishing teams.

DNA only gets better when you actually try to write it.

About the author

Emilio Carrión. Staff Engineer at Mercadona Tech. I help engineers think about product and build systems that scale. Obsessed with evolutionary architecture and high-performance teams.