Last week, EY announced the global rollout of enterprise-scale agentic AI across 160,000 audit engagements in more than 150 countries. It's a remarkable achievement. The press release covers multi-agent frameworks, Microsoft partnerships, responsible AI principles, and workforce training programs.
It has nothing to say about what happens after deployment.
I don't say that to criticize EY. They're not alone in this silence. Across every regulated industry (audit, finance, healthcare, legal), organizations are deploying AI systems at a scale that would have been unimaginable five years ago. And almost none of them are asking the question that will define whether those deployments succeed or quietly degrade: what happens when it breaks, and who knows how to drive it when it does?
We got a glimpse of what that looks like in March, when a security firm's autonomous agent breached McKinsey's internal AI platform in two hours. The platform was used by over 40,000 employees for strategy work and client research. The vulnerability had been sitting there for two years, invisible to conventional scanners. The most alarming detail wasn't the breach itself. It was that the instructions controlling how the AI responded to thousands of consultants were stored in the same database. An attacker could have silently altered them. No deployment. No code change. No traditional trace. Nobody would have noticed.
That's not a security failure. That's a design philosophy failure. Nobody thought about the drivers.
They bought the engine. Then they bolted on whatever was lying around.
Stewart Brand's new book Maintenance: Of Everything opens a conversation the technology industry urgently needs to have. Brand has spent decades thinking about how things last, and his argument, applied to AI, lands with unexpected force.
Here's the honest diagnosis of where enterprise AI actually is right now: organizations didn't build anything. They licensed an extraordinary engine from someone else (powerful, capable, genuinely impressive) and bolted it into their existing processes. Then they added tires from one vendor, brakes from another, a gear shift that sort of works. The result moves. Sometimes impressively. But nobody in the organization fully understands the whole system, nobody designed it to be understood, and when something goes wrong there's no manual, no mechanic, and no one who knows the road.
The Rolls-Royce analogy gets invoked a lot in AI conversations, and it's not entirely wrong, but it implies the problem is fragility. The real problem is something different. The Rolls-Royces still running a century later aren't running because they never broke. They're running because they were built for a world where dedicated mechanics, proprietary parts, and manufacturer relationships were assumed. That world doesn't scale to 160,000 audit engagements. And it was never designed for the person who just needs to get to work.
Henry Ford understood something different. The Model T wasn't just cheaper. It was designed to be driven, understood, maintained, and adapted by the people who used it every day. The knowledge escaped the factory. Mechanics happened. Drivers happened. A whole vernacular ecosystem of expertise grew up around it, distributed and improvable by anyone. Not just the few who could afford dedicated support. Anyone who needed to get somewhere.
That's the gap enterprise AI has opened. Organizations bought the engine. They're missing the vehicle anyone can drive.
What I watched happen
I was part of a team at Google Research working on exactly this problem: building systems designed to learn from the people who used them, through structured feedback loops, expert communities, and what we called adaptation via feedback. The premise was bidirectional: models adapt toward usefulness, the people operating them develop genuine understanding of the system, and the loop between the two is where reliability actually lives.
Then the scaling bet took over: enough data and enough compute, the thinking went, and the human teaching signal becomes noise. The teams working on human-oriented adaptation got reorganized. The work didn't disappear, but the conviction that it was necessary did.
Here's what that bet gets wrong in enterprise contexts specifically. The feedback that trains general models is implicit, aggregate, and divorced from real work. What gets captured is clicks, session length, thumbs up or down. What doesn't get captured is the partner who rewrote the output in a meeting, the manager who caught the error three steps downstream, the auditor who knew the answer was wrong but couldn't explain why to the system. That's the expert signal. That's the tacit knowledge that makes the difference between a system that works in a demo and one that works in a real audit of a real company with a real set of problems nobody anticipated.
That signal is evaporating into the air of every enterprise deployment right now, uncaptured, every single day. The scaling bet assumes someone is collecting it. Nobody is.
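To make the missing capture concrete, here is a minimal sketch, in Python, of what a structured channel for that expert signal could look like: a record that ties an expert's rewrite or rejection, and the judgment behind it, back to the work product it corrects. Every name here (ExpertCorrection, CorrectionType, capture) is hypothetical, illustrating one possible shape rather than any vendor's actual schema.

```python
# Hypothetical sketch: capturing expert corrections as structured records,
# in contrast to the implicit signals (clicks, session length, thumbs)
# that general models train on. All names are illustrative.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class CorrectionType(Enum):
    REWRITE = "rewrite"              # partner rewrote the output in a meeting
    DOWNSTREAM_CATCH = "downstream"  # error caught steps after generation
    UNEXPLAINED_REJECT = "reject"    # expert knew it was wrong, couldn't say why


@dataclass
class ExpertCorrection:
    """One unit of expert signal, tied to the work product it corrects."""
    engagement_id: str
    model_output: str
    corrected_output: str | None     # None for a rejection without a rewrite
    correction_type: CorrectionType
    expert_role: str                 # e.g. "audit partner", "engagement manager"
    context_notes: str               # the free-text judgment aggregate metrics lose
    captured_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def capture(correction: ExpertCorrection, store: list[ExpertCorrection]) -> None:
    """Persist the correction so it can feed adaptation and regression suites
    instead of evaporating with the meeting it happened in."""
    store.append(correction)
```

None of this is hard to build. What's hard, and what almost nobody is doing, is putting it in the path of real work.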
I watched the gap open. Capability accelerated. The understanding required to operate, adapt, and trust these systems in real environments did not keep pace. That gap is now visible in every enterprise AI deployment that behaves differently in production than it did in the lab, and nobody quite knows why, and nobody quite knows what to do about it.
The loss you're already taking
The industry is still framing this as future risk. It isn't. It's a present-tense, ongoing loss.
Every AI system running in production without this infrastructure is degrading right now. Not dramatically. Quietly, through the accumulation of failures that aren't caught, aren't classified, aren't routed to anyone who could fix them. The system that worked beautifully in the demo drifts from its environment. The environment changes and the system doesn't. The people who need to drive it learn workarounds instead, because the real controls aren't available to them.
This is the modern car problem. Vehicles got sealed, software-dependent, proprietary. The right-to-repair movement exists because people recognized what was being lost: not just the ability to fix things, but the ability to understand what you own well enough to use it on your own terms. Enterprise AI is making the same mistake at much higher cost and much higher stakes.
There's a counterargument worth taking seriously: move fast, deploy, then tear out what doesn't work and optimize once you know what to keep. It's a philosophy that has worked in software before. But it has a hidden prerequisite: you need enough signal to know what to keep. Within a single development team working on a contained system, that signal is visible. Across 160,000 audit engagements in 150 countries, it isn't. The signal lives in the heads of the professionals using the system every day: the workarounds they've developed, the judgment calls they've made, the edge cases they've quietly routed around. It doesn't flow back to the central team automatically. It evaporates. You can only optimize what you can see. And right now, almost nobody is building the infrastructure to see it.
Complex does not mean only-the-builder-can-operate-it. That's a design choice, not an inevitability.
What the steering wheel looks like
There's a version of "solving this" that's actually just a different form of the same problem. It goes like this: don't worry about the drivers, because the car drives itself. We've built a self-driving system. Just tell it where you want to go.
That sounds reassuring until you think about what it assumes. Self-driving only works on a known road. Audit is not a known road. Every client is different. Every year is different. The financial system changes. Regulations change. Edge cases multiply. The entire value of having expert auditors is that they can navigate complexity and novelty that no fixed system anticipated. An AI that drives itself on a single road isn't a solution to that problem. It's a paternalistic restatement of it, dressed up as progress.
What's needed is not a self-driving car. What's needed is a vehicle that anyone who needs to use it can actually drive, understand, and adapt to wherever they need to go.
We're currently building monitoring and reliability infrastructure for one of the Big Four, a firm that made a different choice. They started by asking: how will we know when this is wrong? How will we fix it when it is? How do we make sure the people who need to drive this system every day understand it well enough to do that?
That's not a slower approach. That's the only approach that produces something durable.
It means monitoring infrastructure that tells you not just whether the system ran, but whether it did the right thing. Triage logic that classifies failures by severity. Repair processes that turn production failures into test cases. Verification before fixes ship. And crucially, knowledge that transfers to the people who operate the system, so that understanding doesn't stay locked in the builders. So that anyone who needs to drive it can.
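As a rough illustration, here is a minimal Python sketch of that loop: a monitor that checks correctness rather than uptime, severity triage, failures promoted to regression tests, and a verification gate a fix must pass before it ships. The names (Failure, monitor, failure_to_test, verify_before_ship) are hypothetical, a sketch of the shape of the loop rather than the infrastructure any particular firm runs.

```python
# Hypothetical sketch of the maintenance loop described above. Names and
# structure are illustrative, not a real system's API.
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Severity(Enum):
    CRITICAL = 1  # wrong answer reached a client deliverable
    MAJOR = 2     # wrong answer caught downstream
    MINOR = 3     # cosmetic or recoverable


@dataclass
class Failure:
    case_id: str
    prompt: str
    expected: str
    actual: str
    severity: Severity


def monitor(prompt: str, expected: str, actual: str,
            check: Callable[[str, str], bool]) -> Failure | None:
    """Did the system do the right thing, not merely run?"""
    if check(expected, actual):
        return None
    # Real triage logic would classify severity from context; MAJOR is a placeholder.
    return Failure(case_id=prompt[:32], prompt=prompt,
                   expected=expected, actual=actual, severity=Severity.MAJOR)


def failure_to_test(failure: Failure) -> Callable[[Callable[[str], str]], bool]:
    """Repair process: every production failure becomes a regression test."""
    def regression_test(system: Callable[[str], str]) -> bool:
        return system(failure.prompt) == failure.expected
    return regression_test


def verify_before_ship(candidate: Callable[[str], str],
                       tests: list[Callable[[Callable[[str], str]], bool]]) -> bool:
    """Verification: a fix ships only if it passes every accumulated test."""
    return all(test(candidate) for test in tests)
```

The point of the sketch is the shape, not the code: every failure leaves a permanent artifact, and every future fix has to clear the accumulated record before it reaches production.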
This is what Brand means by maintenance as a philosophy rather than a task. Buildings that last aren't the ones that never need repair. They're the ones designed so that repair is possible, understandable, and something their occupants can do themselves.
The AI systems that will still be running well in ten years won't be the most sophisticated at deployment. They'll be the ones whose operators learned to understand them, adapt them, and make them their own. Not because they had dedicated support on retainer. Because the system was designed so that anyone who needed to use it could.
They bought the engine. Now let's build something they can drive.
Dr. Marisa Ferrara Boston is Managing Partner of Reins AI, which builds evaluation and reliability infrastructure for enterprise AI systems. She previously contributed to AI research at Google and served as Lead AI and Automation Architect at KPMG.