Vibe Coding Is Not the Problem
Source asciidoc: `docs/article/2026-03-29—vibe-coding-is-not-the-problem.adoc`

The current debate around AI-generated software is still too binary.
One side says AI should only be used for tiny, isolated tasks: a helper for boilerplate, a faster autocomplete, a utility for refactoring, nothing more.
The other side says the opposite: that modern coding models and agents have already made traditional software engineering optional; that a product can now be assembled by prompt accumulation, and that the remaining gap is merely better prompting.
Both positions are too shallow.
The real problem is not whether AI can write code. It can. The real problem is whether that code is produced inside a disciplined engineering system or inside a fog of shifting prompts, mixed abstractions, blurry product assumptions, and undefined business logic.
AI does not remove chaos. It accelerates it. But it can also accelerate order.
That is why the useful boundary is not AI coding versus no AI coding. The real boundary is this: AI-assisted engineering inside a canonized system versus AI generation without canon, boundaries, and delivery discipline.
The false binary around vibe coding
Much of the criticism of vibe coding comes from real pain. Teams have seen AI-generated projects collapse into duplicated logic, overlapping service layers, unused endpoints, dead abstractions, contradictory patterns, and rising change costs. In that sense, the critics are not imagining the failure mode; they are describing it accurately.
But many of them draw the wrong conclusion.
They see entropy generated by uncontrolled AI use and conclude that AI is suitable only for microscopic help. That conclusion is too narrow because it confuses two different things:
- AI working inside a prepared engineering system.
- AI being asked to create the engineering system while also writing the code.
Those are not the same activity.
The first can work remarkably well. The second usually produces statistically plausible fragments rather than a coherent product.
Martin Fowler captured this distinction well when he contrasted the original notion of vibe coding—coding while paying no attention to the code at all—with a more disciplined, professional mode of agentic engineering, where software engineers use coding agents to amplify existing expertise rather than replace it. See Martin Fowler, “Recent Changes” and Exploring Generative AI.
What the market is actually saying now
The most useful recent publications are no longer asking the naive question, “Can AI write code?” That is settled. The sharper question is now: Under what conditions does AI-generated code remain reliable, maintainable, and economically sane over time?
The most consistent answer across serious sources is not “prompt harder.” It is: prepare the environment better.
DORA’s 2025 report describes AI’s primary role in software delivery as an amplifier. It magnifies both the strengths and the weaknesses of the organization using it. The report emphasizes that the best returns come not from the tool alone, but from improving the surrounding system: platforms, workflows, testing, review loops, and operational foundations. See DORA, State of AI-assisted Software Development 2025 and Balancing AI tensions.
GitHub’s own guidance around Copilot moved in the same direction throughout 2025. Its recommendations for custom instructions explicitly emphasize project overview, tech stack, coding guidelines, project structure, and relevant resources. The message is simple: the tool performs better when the repository provides explicit context instead of expecting the model to infer it from probability. See GitHub, “5 tips for writing better custom instructions for Copilot”.
That line became even clearer in GitHub’s article on agents.md, based on analysis of more than 2,500 repositories. The lesson was not that the model magically understands the codebase. The lesson was that better outcomes come from clearly declaring role, workflow, stack, commands, constraints, and expected output. See GitHub, “How to write a great agents.md”.
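To make this concrete, here is a minimal sketch of the kind of repository-level instruction file that guidance describes. The project, stack, commands, and constraints below are hypothetical, invented purely for illustration; a real agents.md would declare the actual canon of its repository:

```markdown
# agents.md — hypothetical example

## Role
You are contributing to a Spring Boot payment service. Follow the
existing conventions; do not introduce new frameworks or patterns.

## Stack
- Java 21, Spring Boot 3.x, Maven
- JUnit 5 for unit tests; Testcontainers for integration tests

## Commands
- Build and verify: `mvn -q verify`
- Run tests only: `mvn -q test`

## Constraints
- Business logic lives in the `domain` package; controllers only orchestrate.
- No new dependencies without an accompanying decision record.

## Expected output
- A focused diff, updated tests, and a short summary of the change.
```

The point is not the specific entries but that role, workflow, stack, commands, constraints, and expected output are declared rather than left for the model to infer.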
Thoughtworks’ 2025 publications reinforced the same shift. Instead of treating AI coding as a pure prompting exercise, they increasingly framed it as context engineering, curated shared instructions, and disciplined human stewardship over code quality. See Thoughtworks, “From vibe coding to context engineering”, Thoughtworks, “AI assistance is a misunderstood revolution in software engineering”, and Thoughtworks, “In the age of AI coding, code quality still matters”.
The market signal is becoming hard to ignore: the center of gravity is moving away from ad hoc prompting and toward explicit engineering scaffolding.
Productivity is real, but it is not universal
This is another place where low-quality discussion has distorted the topic.
It is true that AI can produce measurable productivity gains in some settings. Microsoft Research’s well-known study on GitHub Copilot found that developers in the treatment group completed a controlled programming task 55.8% faster than the control group. See Microsoft Research, “The Impact of AI on Developer Productivity”.
That result matters, but it does not settle the question for all environments.
METR’s 2025 study of experienced open-source developers working on issues in their own repositories found something more complicated: the use of frontier AI tools, in that setup, was associated with developers taking 19% longer on average, despite their own expectations that AI would make them faster. See METR, “Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity”. In a later 2026 update, METR reported that subsequent data showed more mixed results, including subsets where the slowdown estimate weakened or reversed, but the broader lesson remained: the effect of AI is highly dependent on task shape, environment, and operator discipline. See METR, “We are Changing our Developer Productivity Experiment”.
This is exactly why serious teams should stop speaking about AI coding as if it had one universal outcome. Different contexts produce different results. Bounded tasks in a controlled environment can benefit significantly. Deeply contextual work in mature systems can behave very differently.
The right takeaway is not “AI always speeds engineers up” and not “AI always slows them down.”
The right takeaway is that AI amplifies what already exists:
- clear architecture or blurred architecture,
- disciplined delivery or chaotic delivery,
- healthy review loops or weak review loops,
- explicit rules or silent assumptions.
Why experienced engineers become more valuable, not less
The most expensive misunderstanding in the current hype cycle is the idea that technical experience is becoming optional.
In reality, AI shifts the value of engineering upward.
The experienced developer is no longer valuable only because they can handcraft a class or manually write a controller faster. They are valuable because they can determine the operating envelope in which generated code is allowed to exist safely.
Someone still has to decide:
- which stack is canonical,
- which framework conventions are mandatory,
- which patterns are allowed,
- which patterns are prohibited,
- where business logic belongs,
- which layer owns orchestration,
- what counts as infrastructure,
- how validation works,
- how deliveries are sliced,
- how outputs are reviewed,
- how drift is detected,
- and how the system remains coherent after fifty iterations instead of five.
Without that operator, AI has no reason to maintain a single engineering line. It will suggest statistically plausible fragments drawn from different schools of thought.
In one area, it may generate code that feels conventionally MVC.
In another, it may drift toward a hexagonal vocabulary because it has seen enough examples of that approach in similar technical discussions.
In a third place, it may overproduce abstract object-oriented structures because abstraction is statistically common in training data.
Elsewhere, it may reach for a direct, procedural, or functional shortcut that conflicts with the surrounding design.
Every fragment may look locally plausible. But local plausibility is not the same thing as system coherence.
This is why technical experience is not being erased. It is being re-leveraged. The critical skill is increasingly not just writing lines of code, but constraining what kinds of code are allowed to appear in the system at all.
Thoughtworks makes this point directly: in the age of AI coding, code quality still matters, and strong human stewardship remains central. See Thoughtworks, “In the age of AI coding, code quality still matters”.
The difference between valid-looking code and a valid engineering system
This is the mistake that sits underneath a great deal of current AI hype.
A model can produce code that compiles.
It can produce code that runs.
It can produce code that demos beautifully.
None of those things prove that the product is a sound engineering system.
A sound engineering system must survive change. It must remain understandable after multiple delivery waves. It must accommodate evolving business rules, new integrations, new team members, refactoring, incidents, scaling pressures, and partial rewrites.
That requires more than code generation. It requires sustained coherence.
The future cost of change matters at least as much as the initial speed of implementation.
This is one reason DORA’s work is so useful in this conversation. It repeatedly emphasizes that AI performance cannot be evaluated in isolation from the surrounding software delivery system. When AI increases throughput without sufficiently strong testing, review, and platform support, instability can rise alongside productivity. See DORA, State of AI-assisted Software Development 2025.
The economic implication is straightforward: ugly code can sometimes remain cheap enough to tolerate if its logic is still predictable. Incoherent code becomes expensive much faster because diagnosis, onboarding, change analysis, and regression control all degrade simultaneously.
Canon first, generation second
This is why the order of operations matters so much.
In mature AI-assisted engineering, generation is not the first step. It is a later step.
First comes canonization.
At minimum, that includes four dimensions.
Stack canon
The repository must make the stack explicit. Not just the language, but the ecosystem decisions that matter for consistency.
What framework is canonical? What testing model is canonical? What is the accepted dependency pattern? What is the serialization strategy? Where do configuration boundaries sit? What is the quality gate story?
Without these answers, the model samples solutions from adjacent but incompatible worlds.
Architectural canon
The project must define allowed structure.
Where does business logic live? What is orchestration and what is not? Which dependencies are forbidden? What counts as a layer violation? What directory trees are canonical? Which naming conventions are binding?
Without architectural canon, the model will keep proposing locally plausible structures that collectively erode the shape of the system.
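One way to make architectural canon checkable rather than aspirational is to encode the allowed layer dependencies as data and lint against them. The sketch below assumes a hypothetical four-layer layout; the layer names and rules are illustrative, and a real project would derive each module's layer from its package path:

```python
# A machine-checkable architectural canon: an allow-list of layer
# dependencies, plus a checker that flags violations.
# Layer names and rules here are hypothetical.

ALLOWED_DEPENDENCIES = {
    "api": {"application"},        # controllers may call application services
    "application": {"domain"},     # services may call domain logic
    "domain": set(),               # domain depends on nothing above it
    "infrastructure": {"domain"},  # adapters implement domain ports
}

def layer_violations(imports: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (from_layer, to_layer) pairs that break the canon.

    `imports` maps each layer to the set of layers it imports from.
    """
    violations = []
    for source, targets in imports.items():
        allowed = ALLOWED_DEPENDENCIES.get(source, set())
        for target in targets:
            if target != source and target not in allowed:
                violations.append((source, target))
    return violations

# Example: a domain module reaching into infrastructure is drift the
# checker makes visible instead of leaving it to review-time intuition.
observed = {
    "api": {"application"},
    "domain": {"infrastructure"},
}
print(layer_violations(observed))  # [('domain', 'infrastructure')]
```

Run as a quality gate, a check like this turns "the model drifted toward a different architecture" from a reviewer's impression into a failing build.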
Business-logic canon
This is the most underestimated part.
AI can generate syntactically sound code for completely wrong business behavior. If roles, permissions, statuses, transitions, commission rules, exception paths, and source-of-truth semantics are not formalized, the resulting code can look polished while encoding false product behavior.
That is often a more expensive failure than ugly code.
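Formalizing business rules does not require heavy machinery; even declaring allowed state transitions as data gives generated code something to be checked against. The statuses and transitions below are hypothetical, standing in for whatever the product actually defines:

```python
# Business-logic canon as data: order statuses and the transitions
# between them declared explicitly, so generated code cannot quietly
# invent a shortcut the product never defined.
# Statuses and transitions here are hypothetical.

ALLOWED_TRANSITIONS = {
    "draft": {"submitted"},
    "submitted": {"approved", "rejected"},
    "approved": {"shipped"},
    "rejected": set(),
    "shipped": set(),
}

def transition(status: str, new_status: str) -> str:
    """Apply a status change, rejecting anything outside the canon."""
    if new_status not in ALLOWED_TRANSITIONS.get(status, set()):
        raise ValueError(f"illegal transition: {status} -> {new_status}")
    return new_status

status = transition("draft", "submitted")  # fine
status = transition(status, "approved")    # fine
# transition("draft", "shipped") would raise ValueError: the shortcut
# looks plausible but encodes false product behavior.
```

Polished-looking code that skips "submitted" still fails, which is exactly the failure mode prose-only specifications let through.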
Delivery canon
The team must define how work is broken down and how output is accepted.
This includes the size of task buckets, the expected artifacts, the validation path, the review path, and the rollback path.
Without delivery discipline, AI coding becomes narrative improvisation: a sequence of plausible expansions that feel productive until the project becomes too expensive to understand.
Small buckets beat heroic prompts
One of the easiest ways to degrade AI-generated output is to ask for too much at once.
The fantasy version of vibe coding imagines that a whole product can be coherently assembled through increasingly ambitious prompts. But products are not built from enthusiasm alone. They are built from bounded changes that can be reasoned about, validated, and integrated.
That is why small buckets matter.
Good buckets include:
- one bug fix,
- one refactoring wave,
- one validator or mapper,
- one endpoint,
- one vertical use case,
- one infrastructure concern,
- one review pass,
- one cleanup pass,
- one delivery package.
These buckets are not small because smallness is inherently virtuous. They are small because bounded work keeps validation tractable and keeps architectural drift visible.
The larger and more mixed the prompt, the more likely the model is to blur priorities, merge incompatible assumptions, and hide defects behind surface fluency.
In practice, many failures blamed on “AI quality” are really failures of decomposition.
AI coding is increasingly a management problem
As coding agents become more capable, the operator’s role starts to look less like “person who prompts” and more like “person who manages a bounded engineering process.”
Addy Osmani has argued this point clearly: as AI coding scales, it stops being merely a prompting problem and becomes a management problem. The relevant skills become clarity, delegation, sequencing, review loops, and operational control. See Addy Osmani, “Your AI coding agents need a manager” and My LLM coding workflow going into 2026.
GitHub’s own documentation reinforces the same reality. Pull requests created by Copilot coding agent are meant to be reviewed, and custom instructions affect review behavior in concrete ways. The tool is not documenting a world where professional review disappears; it is documenting a world where review becomes even more central. See GitHub Docs, “Copilot coding agent” and GitHub Docs, “Using Copilot code review”.
This is not a minor operational detail. It reveals what serious vendors themselves believe: the future of AI coding is not autonomous engineering without accountability. It is assisted engineering with stronger scaffolding.
What production-grade AI-assisted engineering actually looks like
The healthiest version of AI coding today does not try to replace engineering. It tries to compress the implementation path inside engineering.
A production-grade flow looks more like this:
- Human defines the product rule or system concern.
- Human fixes the stack and architectural boundaries.
- Human decomposes the work into a bounded delivery unit.
- Repository-level instructions, examples, and constraints are made explicit.
- AI generates within the allowed envelope.
- Automated checks run.
- Human reviews for business correctness, architectural fit, and long-term maintainability.
- The change is either accepted, corrected, or rejected.
Notice what does not happen here: the model is not treated as the source of system design truth.
That role still belongs to human engineering judgment.
The uncomfortable economic truth
A great many AI-generated software problems are not primarily aesthetic problems. They are economic problems.
A team can survive ugly code longer than it can survive incoherent code. The reason is simple: ugly code may still remain predictable. Incoherent code destroys predictability.
Once predictability is gone, the cost of every future change rises:
- diagnosis takes longer,
- onboarding slows down,
- regressions multiply,
- duplicated logic hides defects,
- business-rule changes become risky,
- and architectural debt compounds nonlinearly.
This is why a flashy early demo is such a weak signal.
The real test of a product is not whether the model can generate a compelling first wave of functionality. The real test is whether the system remains understandable and economically changeable after repeated iterations.
That is where canon, bounded delivery, and experienced oversight stop being “nice to have” and become the foundation.
Conclusion
Vibe coding is not inherently the enemy.
The enemy is uncontrolled generation without canon, without boundaries, without clear business rules, without delivery discipline, and without an operator who understands the technical consequences of the choices being made.
AI can absolutely accelerate software development. It can improve throughput, compress boilerplate, speed up prototyping, and assist with refactoring, exploration, and implementation.
But none of that eliminates the need for architectural judgment.
If anything, the current generation of tools increases the value of engineering discipline. The bottleneck is no longer the ability to produce lines of code. The bottleneck is the ability to ensure that those lines belong to one coherent system.
The teams that will benefit most from AI will not be the ones that generate the most code the fastest.
They will be the teams that define the best operating environment for AI to generate the right code, in the right shape, inside the right system.
That is not the death of engineering.
It is the return of engineering to its real job.
References
- Martin Fowler — Exploring Generative AI
- DORA — State of AI-assisted Software Development 2025
- Microsoft Research — The Impact of AI on Developer Productivity
- METR — Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity
- METR — We are Changing our Developer Productivity Experiment
- GitHub — 5 tips for writing better custom instructions for Copilot
- GitHub — How to write a great agents.md
- Thoughtworks — From vibe coding to context engineering
- Thoughtworks — AI assistance is a misunderstood revolution in software engineering
- Thoughtworks — In the age of AI coding, code quality still matters
- Addy Osmani — Your AI coding agents need a manager