AI Doesn’t Hallucinate. It Amplifies the Drift You Already Had.
Most companies blame the model. The model is just showing you what your system forgot about itself.
The Failure Mode Nobody Named
Here is what the failure actually looks like.
The system compiles. The tests pass. The dashboards are green. Features ship on schedule. And underneath all of that, the architecture is dissolving. Not dramatically, not visibly, but structurally. Invariants that used to hold are bending. Boundaries that used to mean something are blurring. Logic added three sprints ago is quietly contradicting logic added three quarters ago. Nobody catches it because nobody is looking at the global picture. Every team is looking at its own context. The system looks coherent from every local vantage point and is incoherent from the only vantage point that matters.
Senior engineers recognize this immediately. They feel it before they can name it. A change that should be trivial suddenly requires touching six services. A refactor that looks contained keeps pulling in dependencies nobody expected. A bug that gets fixed keeps returning in a different form. The system is not broken. It has forgotten itself.
This failure mode has been present in software for decades. It has always been manageable for one reason: humans compensated for it. Senior engineers carried the architecture in their heads. Teams enforced invariants through habit. Organizations maintained coherence through cultural transmission of rules that were never written down because they never needed to be. The humans were the memory layer. The system borrowed continuity from the people who built it, free of charge, invisibly, for as long as those people stayed.
AI removes the human buffer. Not gradually. Immediately.
A model does not inherit the unwritten rules. It does not absorb context through repetition. It does not carry yesterday’s architectural decision into today’s implementation. It sees the artifacts — the code, the tests, the documents — and it extends them. Pattern by pattern. Inference by inference. With no access to the reasoning that gave those artifacts their shape. The model does not introduce chaos. It reveals the chaos that was already present, and then amplifies it at the speed of inference.
The industry keeps calling this hallucination. That framing is wrong, and the wrongness matters. Hallucination implies invention. The model is not inventing contradictions. It is extending the ones already present in the artifacts it was given. The problem is not the model. The problem is the room it was placed in.
Amplification is only dangerous when there is nothing stable to amplify.
The Stack Has No Memory
Every company building software today has the same stack. Git. Jira. Confluence. A logging platform. A CI/CD pipeline. Architecture diagrams in various states of staleness. The appearance of documentation is convincing. What the stack actually contains is not.
Git stores text, not decisions. A commit history tells you what changed. It does not tell you why the boundary being modified exists, what breaks if it moves, or which services upstream depend on the assumption being altered. Jira stores tasks, not truth. A ticket tells you what was requested. It does not encode the constraint that shaped the implementation or the invariant that implementation was designed to preserve. Confluence stores documents, not architecture. Even when current, it describes structure without preserving the reasoning that makes the structure load-bearing rather than accidental. Logs store events, not intent. An event stream tells you what happened. It is silent on whether what happened was consistent with what was supposed to remain permanently true.
The entire stack is a collection of artifacts with no mechanism to bind them into a coherent world model.
Humans have always bridged this gap through reconstruction. A Slack thread from six months ago. A code comment that survived three refactors. A half-remembered conversation from a planning session nobody documented. Engineers with five years of context on a system can reconstruct its constraints from fragments, quickly, reliably, without being conscious they are doing it. This reconstruction is invisible inside human engineering teams. It is completely unavailable to AI.
A model cannot reconstruct what was never encoded. It treats every artifact as equally authoritative. It cannot distinguish the constraint that is load-bearing from the pattern that is historical debris. It cannot tell the difference between a boundary that must hold and a boundary that happened to hold until last Thursday.
Cognition’s Devin launched in March 2024, billed as the first fully autonomous AI software engineer. Cognition reported a 13.86% success rate on SWE-bench. Answer.AI’s independent testing in January 2025 found it completed three of twenty real-world tasks. Not because the model was weak. Because the tasks required holding architectural context across files, sessions, and decisions, and the substrate had no mechanism to maintain that context. GitHub Copilot Workspace shows the same failure mode at larger scale: impressive in isolation, brittle the moment the work requires system-level coherence. LangChain, AutoGPT, and CrewAI have each collapsed under production load for the same structural reason. They multiply the surfaces of drift without providing any mechanism for coherence.
These are not model failures. They are substrate failures.
The more complex the system becomes, the less it knows about itself. The moment AI enters the loop, that ignorance becomes visible at a speed and scale that no human review process was built to absorb.
A system that cannot remember cannot cohere.
The Drift Economy
Drift is the default state of modern software. It does not arrive as a crisis. It accumulates in the gaps between teams, in the shortcuts taken to hit a sprint deadline, in the decisions nobody documented because everyone assumed they were obvious at the time.
The incentives make this structural, not accidental. Companies reward motion, not memory. Dashboards measure output: features shipped, velocity sustained, tickets closed. Leadership celebrates architectural integrity in the abstract and deprioritizes it in practice the moment a roadmap deadline arrives. Every team optimizes for its own context because that is what the organization rewards. The result is predictable. The system grows and its identity dissolves simultaneously. The codebase becomes a record of what was done, with no durable record of why.
AI accelerates this in a specific way. A model generates solutions that are locally correct and globally incompatible. It fills gaps that were never meant to be filled. It extends patterns that were never meant to be extended. It reinforces assumptions that were never meant to be permanent. DeepMind’s 2024 analysis of code quality in AI-assisted repositories found statistically significant increases in duplication, dead code, and boundary violations compared to human-only codebases of equivalent age. Not because the model generated bad code locally. Because it had no mechanism to maintain global consistency.
The loop is self-reinforcing. Drift creates ambiguity. Ambiguity creates more drift. Every shortcut creates the need for another shortcut. Every inconsistency requires another patch. Every patch requires another abstraction. The system grows more complex and less coherent at the same time.
This is the moment organizations describe as “AI hallucination.” That framing misidentifies the cause and therefore misidentifies the fix. The model is not hallucinating. It is multiplying the state of a codebase that already forgot itself.
AI does not create drift. It compounds it.
The Invariant Collapse
Invariants are the parts of a system that are supposed to survive everything else. Not preferences. Not conventions. The rules that define what the system actually is. A user belongs to exactly one account. An order is charged once. A boundary between services encodes a specific judgment about ownership and coupling. These are not implementation details. They are the physics of the system. When they hold, the architecture behaves like a single organism. When they fail, the system becomes a collection of disconnected behaviors that only look coherent because nobody has zoomed out far enough to see the contradictions accumulating.
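To make the distinction between a local check and a global invariant concrete, here is a minimal Python sketch of the two examples above, written as predicates over the whole state rather than over a single change. The `Membership` and `Charge` types and both function names are hypothetical, invented for illustration; they are not taken from any real system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Membership:  # hypothetical record: one user-to-account link
    user_id: str
    account_id: str


@dataclass(frozen=True)
class Charge:  # hypothetical record: one charge against an order
    order_id: str
    amount_cents: int


def users_with_multiple_accounts(memberships):
    """Users that violate 'a user belongs to exactly one account'.

    Users with no membership at all never appear in the input, so this
    sketch only detects the 'more than one' side of the rule.
    """
    accounts_by_user = {}
    for m in memberships:
        accounts_by_user.setdefault(m.user_id, set()).add(m.account_id)
    return sorted(u for u, accts in accounts_by_user.items() if len(accts) > 1)


def orders_charged_more_than_once(charges):
    """Orders that violate 'an order is charged once'."""
    counts = Counter(c.order_id for c in charges)
    return sorted(order for order, n in counts.items() if n > 1)


memberships = [
    Membership("u1", "a1"),
    Membership("u2", "a1"),
    Membership("u2", "a2"),  # written by a second, locally correct code path
]
charges = [Charge("o1", 500), Charge("o2", 700), Charge("o1", 500)]

print(users_with_multiple_accounts(memberships))
print(orders_charged_more_than_once(charges))
```

Each individual write in this example could pass its own unit test. The violation only shows up when the check runs over the combined state, which is the vantage point the paragraph above describes.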
Invariants erode. They erode because the system has no persistent record of why they exist. They erode because the rationale lives in people’s heads and those people change teams, leave companies, or simply forget. They erode because every team sees its local context, not the global constraint. Human engineers compensate through intuition: an internalized sense of what is safe to change and what is load-bearing. AI has no such intuition. It sees a pattern, not a principle. It extends the pattern even when the extension violates the principle, because the principle was never encoded in a form the model can consult.
I built a production platform over eight to ten months. 300,000 lines of code, dozens of microservices, event-driven architecture, GDPR compliance, Terraform-managed infrastructure, 2,256 automated tests with 15,998 assertions. The system held together not because the models were exceptional but because I built an explicit constraint system around them: living architecture documents, invariant catalogues, surgical-change rules, escalation gates for contradictions, mandatory context reconstruction at the start of every session. Without that structure, the models reliably broke invariants they had no memory of establishing. Not occasionally. Consistently. Every session that began without explicit context reconstruction produced drift. Every change that wasn’t constrained to a surgical scope produced violations. The constraint system was not a process preference. It was the only thing standing between a coherent system and an expanding collection of locally reasonable, globally incompatible changes.
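A surgical-change rule with an escalation gate can be sketched in a few lines of Python. This is an illustration under assumptions, not the constraint system described above: the `ChangeRequest` type, the `gate` function, and the example paths are all hypothetical. The idea is that a change declares its scope up front, and anything touching files outside that scope is escalated to a human instead of being applied.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ChangeRequest:  # hypothetical shape of a proposed change
    description: str
    allowed_prefixes: tuple  # directories the change declared it would touch
    touched_files: tuple     # files the change actually touches


def out_of_scope(change):
    """Return the touched files that fall outside the declared scope."""
    def in_scope(path):
        p = PurePosixPath(path)
        # Component-wise match, so 'services/billing_v2' does not count
        # as being inside 'services/billing'.
        return any(p.is_relative_to(prefix) for prefix in change.allowed_prefixes)

    return [f for f in change.touched_files if not in_scope(f)]


def gate(change):
    """'proceed' if the change stayed in scope, otherwise 'escalate'."""
    return "escalate" if out_of_scope(change) else "proceed"


change = ChangeRequest(
    description="fix rounding in charge calculation",
    allowed_prefixes=("services/billing",),
    touched_files=("services/billing/charge.py", "services/accounts/model.py"),
)

print(gate(change), out_of_scope(change))
```

The gate does not judge whether the out-of-scope edit is correct. It only refuses to let a locally reasonable change widen silently, which is the failure pattern described above.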
The system still compiles when invariants break. The unit tests still pass. The dashboards still show green. The violation is invisible at the local level where the change was made. It is visible only at the global level, in a system whose load-bearing constraints have been undermined by a thousand small, individually justified changes. By the time the symptoms appear — impossible states, contradictory logic, data that cannot be reconciled — the underlying structure has already given way.
If your team is using AI in any part of your engineering workflow and you cannot point to a persistent, machine-readable record of your system’s invariants — not a document, not a convention, not a test suite, but a formal record of what must remain permanently true and why — then invariant erosion is not a future risk. It is a current state. The question is only how far it has progressed.
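One possible shape for such a record, sketched in Python under stated assumptions: each invariant carries its statement, its rationale, and the paths it guards, and the whole catalogue round-trips through JSON so that a tool or a model can consult it. The `InvariantRecord` fields, the function names, the paths, and the rationale text are hypothetical, invented for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InvariantRecord:  # hypothetical schema, not a standard format
    id: str
    statement: str        # what must remain permanently true
    rationale: str        # why it must remain true
    guarded_paths: tuple  # path prefixes where the invariant is load-bearing


def dump_catalogue(records):
    """Serialize the catalogue to JSON text."""
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)


def load_catalogue(text):
    """Rebuild records from JSON text produced by dump_catalogue."""
    return [
        InvariantRecord(
            id=d["id"],
            statement=d["statement"],
            rationale=d["rationale"],
            guarded_paths=tuple(d["guarded_paths"]),
        )
        for d in json.loads(text)
    ]


def invariants_for(path, catalogue):
    """Invariants whose guarded paths cover the file being changed."""
    return [r for r in catalogue
            if any(path.startswith(prefix) for prefix in r.guarded_paths)]


catalogue = [
    InvariantRecord(
        id="INV-001",
        statement="An order is charged once.",
        rationale="Illustrative: a duplicate charge cannot be undone silently.",
        guarded_paths=("services/billing/",),
    ),
    InvariantRecord(
        id="INV-002",
        statement="A user belongs to exactly one account.",
        rationale="Illustrative: permissions are resolved through the account.",
        guarded_paths=("services/accounts/", "services/auth/"),
    ),
]

restored = load_catalogue(dump_catalogue(catalogue))
for record in invariants_for("services/billing/charge.py", restored):
    print(record.id, "-", record.statement, "-", record.rationale)
```

The point of the sketch is the `rationale` field and the lookup by path: before a change touches a file, the reasons that file is load-bearing can be placed in front of whoever, or whatever, is making the change.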
A system without invariants is a structure without physics. And once the physics disappear, collapse is a matter of time, not probability.
The Substrate Shift
Every major computing era follows the same pattern. Capability arrives first. Chaos follows. Then the realization that capability is not enough. That the missing piece is not more intelligence but a substrate that makes intelligence stable.
Personal computers needed operating systems not because CPUs were weak but because raw hardware was too chaotic to program against directly. The internet needed search not because information was scarce but because information without structure collapsed into noise. Cloud computing needed AWS not because servers were unavailable but because static provisioning could not track dynamic demand. In each case, the missing layer was not obvious until everything built without it had already demonstrated the failure. In each case, the entity that built the missing layer owned the architecture of everything above it.
AI is at that inflection now. The models are extraordinary. The systems built on top of them are fragile. The gap between the two is not intelligence. It is substrate.
The missing layer is not a smarter model. It is not a longer context window. It is not a better retrieval system or a more carefully engineered prompt. It is a persistent architectural substrate: the memory of what was decided, the invariants that must hold, the constraints that govern what is allowed to change, and the continuity that allows a system to evolve across time without dissolving into contradiction.
Goldman Sachs projects combined hyperscaler capex of $1.15 trillion from 2025 through 2027. The majority of that capital is funding the capability layer: models, compute, inference infrastructure. The substrate layer remains almost entirely unbuilt. The architectural memory, the invariant enforcement, the governed intelligence layer that sits between raw inference and reliable production systems — that layer does not exist yet as infrastructure. It exists only as manual process for the teams disciplined enough to build it by hand. That is not a sustainable position as AI becomes the dominant producer of engineering decisions.
The company that builds the substrate will not have the best model. It will have the layer that every model depends on. That is the same position AWS built in compute, Stripe built in payments, and Nvidia built in training hardware. The substrate becomes the platform. The platform owns the architecture of everything above it. The window between “models are capable enough” and “the platform has already formed” is narrow.
It is open now.
The substrate is not an enhancement. It is the missing floor. AI amplifies the state. The substrate defines it.
What This Means — For the People Who Have to Decide Right Now
For engineering leaders and CTOs: the drift you are experiencing in AI-assisted workflows is not a prompting problem. Better prompts do not fix a substrate problem. Every mitigation you have built (stricter review processes, constrained agent scopes, context-loading checklists, documentation systems maintained by hand) is a patch on a missing foundation. The correct question is not “how do we prompt more carefully?” It is “at what scale does our current approach collapse, and are we building toward that scale?” If your system grows, you are building toward that scale. The architectural memory layer is not a future concern. The cost of retrofitting it rises with every month of drift that compounds on top of the current substrate. The teams that recognize this now will not be rewriting their constraint systems under pressure in two years.
For founders building on AI: the agent frameworks and orchestration layers that feel like infrastructure today are not infrastructure. They are capability wrappers: useful at small scale, brittle under production load, and replaceable the moment the actual substrate layer arrives. The defensible position is not the wrapper. It is the domain knowledge, the compliance requirements, the invariants, and the architectural decisions that any serious substrate will need to encode. If you understand the specific physics of your domain (the invariants that must hold, the boundaries that are load-bearing, the constraints that distinguish your system from a generic one), you have the content that makes a substrate valuable. Build that understanding now, while the substrate is still being designed. Wait until the substrate exists and you will be populating someone else’s graph from scratch.
For investors: the current wave of AI tooling investment is largely funding scaffolding, not foundations. Scaffolding is valuable until the foundation arrives. At that point it becomes a dependency on the foundation or a casualty of it. The signal to look for is not capability demos. It is production deployments at scale, with failure modes documented and architectural constraints already encoded, led by founders who have felt the substrate collapse under real load and understand precisely which layer is missing. The companies that will define the next decade are the ones building the layer beneath the scaffolding: the ones whose competitive position compounds with every architectural decision encoded into their system rather than decaying with every model upgrade. The window is open. It will not stay open.
The structural takeaway: modern software systems carry no memory of their own decisions. The architecture, the invariants, the rationale behind load-bearing constraints all live in human heads, not in the system itself. For decades this worked because humans supplied the continuity the system lacked. AI removes the human buffer and exposes the actual state: a codebase that has forgotten itself. Cognition’s Devin completed 3 of 20 real-world tasks not because the model was weak but because the tasks required architectural continuity the substrate could not provide. LangChain, AutoGPT, and CrewAI have each collapsed under production load for the same structural reason. The missing layer is not a smarter model or a longer context window. It is a persistent architectural substrate that stores decisions, enforces invariants, and provides the continuity that stateless prediction engines structurally cannot. Every major computing era required this layer before it could scale from demos to infrastructure. AI is at that threshold now. The substrate is the missing floor. Without it, everything above it inherits the instability of the foundation it is standing on.



Amazing POV, this resonated a lot.
The gap you described in the current stack feels very real. Systems keep moving forward with less and less context attached to them. Over time the code still compiles, the tests still pass, features keep shipping but the system slowly loses its identity.
Interestingly, I think this problem exists in humans as well. Teams carry context informally: conversations, intuition, institutional memory. As people change teams or leave, that context fades. AI just exposes the fragility of this much faster.
Reading the article made me wonder about the shape of the “architectural substrate” you describe. Would it look like a very comprehensive, continuously maintained artifact, something like a living system document that captures architectural decisions, invariants, and boundaries across teams? Or could it evolve into something more dynamic, perhaps a layer of systems or agents that continuously observe the codebase, infer and enforce invariants, and communicate architectural constraints as the system evolves? Or something else?
Thoughts?