The Intelligence Was in the Organization All Along

The unit of design is not the individual, and it is not the agentic system. It is the organization.

Enterprise AI is being deployed into expert organizations as if the work were a collection of tasks.

Extract the document. Summarize the memo. Review the contract. Test the control. Produce the output faster.

But the organizations that exercise audit, legal, regulatory, clinical, and financial judgment do not work because tasks are completed efficiently. They work because judgment moves through a structure: from the practitioner who notices something, to the reviewer who asks whether the judgment holds, to the director who sees a pattern across workstreams, to the principal who watches the health of the process itself.

The work product matters, of course. But the organization depends on something harder to see: the flow of observations, doubts, exceptions, corrections, and calibration between people.

And somehow, that is the part we keep managing to break.

Outsourcing broke it by separating the people doing the work from the people responsible for evaluating it. RPA broke it by automating routine steps without capturing the judgment that surrounded them. AI could break it again, at much larger scale, by producing cleaner outputs while stripping away the signals that expert hierarchies need to function.

I do not think that is the only possible path.

AI could help us redesign the organization itself: not by replacing individual professionals, and not by preserving every old layer of the hierarchy, but by improving how information, judgment, and correction move through the system.

The unit of design is not the individual, and it is not the agentic system. It is the organization.

Expert organizations are not factories

Organizations are the best mechanism humans have found for working on problems that no individual can fully hold alone.

No single person can see the whole of a financial system, a clinical trial, a regulatory regime, or a complex legal matter. But society as a whole is too diffuse to make the specific judgments these problems require. So we built something in between: organizations with shared goals, specialized roles, and structures for moving information between people.

That structure matters because expert work is not factory work.

In a factory, intelligence can be embedded in the process. Standardize the parts, standardize the handoffs, make the work repeatable, and the output becomes more reliable. The goal is to reduce variation.

Expert organizations work differently. Their hardest problems resist full standardization because the work depends on observing messy evidence, recognizing patterns, weighing exceptions, and making judgments that others will rely on. A good audit, clinical review, investigation, or legal analysis is not just the sum of completed tasks. It is the result of many partial observations being tested, challenged, corrected, and recomposed into a judgment that can hold.

That is why hierarchy exists.

It is easy to see hierarchy only as authority, especially because organizations often misuse it that way. But in expert organizations, hierarchy also serves a cognitive purpose. Herbert Simon treated organizations as systems for coping with bounded rationality: ways of distributing attention, information, and decision-making across people who could not individually hold the whole problem. In Elliott Jaques's terms, each level carries a different scope of judgment and operates over a different time horizon. The practitioner sees what is happening in the case. The reviewer asks whether the judgment holds. The director looks for patterns across workstreams. The principal watches the health of the process itself.

Expert organizations do not just divide work. They move judgment across levels of responsibility.

The intelligence is not located in any one person. It lives in the connections between them: what gets passed upward, what gets corrected, what gets escalated, what gets absorbed. The organization made complex judgment possible by breaking it apart and recomposing it through structure.

These organizations have never been perfect, and they have always been a compromise. No audit team, clinical team, legal team, or regulator can review every piece of evidence with equal depth, so they assess risk, direct attention, and decide where human judgment matters most. The structure works because information moves through it in ways that no individual could reproduce alone.

That is the part automation has a habit of missing.

The damage was never just automation

Once you see organizations this way, the past twenty years of efficiency programs look different.

The usual story is that firms moved routine work out of expensive expert teams because it looked easier to standardize: evidence gathering, document preparation, reconciliations, first-pass testing, standardized review steps.

But the separation was never as clean as it looked.

Take audit as an example. When foundational testing moved to shared service centers, the output often still arrived, sometimes faster. The spreadsheet was complete, the boxes were checked, the file had the expected artifacts.

What disappeared was harder to see: the questions, hesitations, anomalies, and small acts of judgment that used to surround the work. A reviewer could still inspect the final output, but they had less visibility into how the work had been done, what had seemed strange, where the practitioner had been uncertain, and what patterns were starting to emerge.

The problem was not simply that junior people lost practice, although they did. The deeper problem was that the organization lost signal.

The PCAOB's inspection results make that concern difficult to dismiss. Among the Big Four, audit deficiency rates rose from roughly 12% in 2020 to 26% in 2022. Mid-tier firms fared worse, with deficiency rates above 60%. Those numbers do not prove that outsourcing alone caused the problem, but they do show a profession whose quality model is under strain at exactly the moment more work has been standardized, distributed, and pushed away from the engagement team.

One audit partner put the learning problem plainly: "Our staff now will never see cash testing, as it is done offshore. We are going to see the impact of that when they are managers."

That is true. But it is only part of the story. The quality problem does not begin ten years later, when the apprentice fails to become an expert. It begins immediately, when the hierarchy receives output without the surrounding judgment.

RPA repeated the pattern in a different form. It automated steps inside the process, but the bottleneck was rarely the mechanical step itself. The bottleneck was the meaning around the step: why something was escalated, what exception was ignored, what uncertainty remained, what a human would have noticed but not written down.

The old review meetings, side conversations, and informal corrections were inefficient. They were also carrying information. When organizations replaced or bypassed those interactions without designing a new channel for the signal, they mistook the artifact for the work.

Automation has a habit of doing that.

AI is entering a weakened hierarchy

AI is now being introduced into organizations that have already spent years weakening the channels through which judgment used to travel.

That is what makes this moment different from outsourcing and RPA. AI is not only being applied to routine work at the bottom of the hierarchy. It is being added everywhere at once: to the practitioner drafting the memo, the reviewer checking the work, the director monitoring the portfolio, and the executive trying to understand whether the whole process is improving or deteriorating.

The risk is not only that AI will make mistakes. Of course it will. The larger risk is that AI will produce clean, confident, plausible output while further compressing the signals the hierarchy needs to function.

The risk is not only wrong output. It is a thinner handoff.

There is also a structural reason this moment may go differently. Outsourcing and RPA did not trigger the complementary investments that would have preserved judgment, because the outputs still looked right. The spreadsheet arrived, the checklist was complete, and the loss was invisible until quality degraded years later. AI is different because it enters every level simultaneously, it produces more convincing output at each level, and the signals it compresses are harder to recover after the fact. That combination makes the design problem more urgent, but also more visible, if organizations choose to look.

The evidence suggests most are not yet looking. Economist Enterprise's 2026 report, Making AI Deliver, surveyed 1,221 executives at large enterprises and found a striking contradiction: more than four in five said their AI programs were beating expectations, yet only about two in five firms formally required teams to track business impact. Nearly nine in ten CTOs said their AI rollout was ahead of schedule, while only about three in four senior vice-presidents and vice-presidents agreed.

That is not a success story with a minor measurement gap. It is the exact oversight gap the rest of this piece is about. Organizations are reporting that AI is working while lacking the instrumentation to know whether it is. The people closer to the work see the divergence first.

The same report found that about three in five firms review AI systems during development and before deployment, but fewer than two in five continue that oversight after a system goes live. One in eight firms reviews governance only when something goes wrong. For expert organizations, that is exactly backward: the real test is not whether a system looked acceptable before deployment, but whether it continues to support judgment once it is inside the work.

The danger is that we repeat the same pattern with a more powerful technology. We preserve the artifact, accelerate the handoff, and make the output cleaner, while losing even more of the judgment that was supposed to move through the organization.

Richer edges, not replacement nodes

The usual debate about AI in expert work gets stuck between two unsatisfying options: preserve the old apprenticeship model, or automate the work and move on.

Both miss the real design problem. The apprenticeship model did important work, but it was never especially efficient. Junior professionals spent hundreds of hours on repetitive tasks in order to encounter a small number of moments where judgment actually developed. Automation without design has the opposite problem. It removes the repetition, but often removes the surrounding judgment with it: the trail of uncertainty, the reason something was flagged, the alternative interpretation that was considered and rejected.

AI makes a different design possible, not by optimizing nodes, but by enriching the edges between them.

Consider what happens today when an audit practitioner hands work to a reviewer. The reviewer receives a workpaper: the completed testing, the conclusion, the sign-off. What they do not receive is the practitioner's reasoning. The anomaly they noticed but could not fully explain. The place where the population felt different from last year. The judgment call about which exceptions to investigate and which to set aside. The moment where AI-generated analysis looked plausible but did not match what the practitioner expected from experience.

Now consider what a redesigned handoff could carry. Not just the workpaper, but the evidence trail: what the practitioner observed, where they were uncertain, what the AI contributed and where the human overrode it, what changed after review, what was escalated and why, what correction was applied and whether similar corrections have recurred across engagements. The reviewer does not just see the artifact. They see the judgment that produced it.

That is what I mean by richer edges. Each handoff carries more of the reasoning, uncertainty, and correction that used to disappear between levels.
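As a rough illustration only, the redesigned handoff could be sketched as a small data structure. Everything here is an assumption made for the sake of the example: the names `HandoffRecord` and `AIContribution`, the fields, and the sample values are hypothetical and do not describe any existing system. The point is simply that the evidence trail travels with the workpaper instead of being lost at the handoff.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AIContribution:
    """One piece of AI-generated analysis and what the human did with it (hypothetical)."""
    summary: str
    overridden: bool = False
    override_reason: str = ""


@dataclass
class HandoffRecord:
    """Hypothetical practitioner-to-reviewer handoff: the artifact plus its evidence trail."""
    workpaper_id: str
    conclusion: str
    observations: List[str] = field(default_factory=list)
    uncertainties: List[str] = field(default_factory=list)
    ai_contributions: List[AIContribution] = field(default_factory=list)
    escalations: List[str] = field(default_factory=list)
    corrections: List[str] = field(default_factory=list)

    def overrides(self) -> List[AIContribution]:
        """AI contributions the practitioner chose to override."""
        return [c for c in self.ai_contributions if c.overridden]

    def review_prompts(self) -> List[str]:
        """Starting points for the reviewer, drawn from the trail rather than the artifact."""
        prompts = [f"Uncertain: {u}" for u in self.uncertainties]
        prompts += [
            f"AI overridden: {c.summary} ({c.override_reason})"
            for c in self.overrides()
        ]
        return prompts


# Illustrative values only
record = HandoffRecord(
    workpaper_id="WP-001",
    conclusion="No exceptions noted",
    observations=["Population feels different from last year"],
    uncertainties=["Three late-period entries could not be fully explained"],
    ai_contributions=[
        AIContribution("Flagged no anomalies in the sample"),
        AIContribution(
            "Suggested the variance was seasonal",
            overridden=True,
            override_reason="does not match prior-year pattern",
        ),
    ],
)

for prompt in record.review_prompts():
    print(prompt)
```

In this sketch the reviewer starts from the uncertainty and the override, not from the clean conclusion. Which fields actually matter would depend on the domain and would need to be designed with the people doing the work.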

Organizations are the scale at which individual cognition becomes collective agency.

At the director level, the same principle applies at a different scope. AI should make patterns visible across workstreams: recurring exceptions, inconsistent judgments, repeated overrides, areas where teams are compensating for process gaps. At the principal level, the question is whether the process itself is improving: whether uncertainty is being escalated, whether corrections are being absorbed, whether the organization is learning from its own work.

This is the organizational redesign problem that Erik Brynjolfsson, Daniel Rock, and Chad Syverson describe in their work on the productivity J-curve. General-purpose technologies do not produce their full gains when firms simply adopt them. The gains arrive when organizations make the complementary investments that change how work is actually done. For AI in expert organizations, one of those missing investments is the redesign of the judgment channels themselves.

A flatter hierarchy may eventually be possible, but not for the reason efficiency programs usually assume. The case for fewer layers should not be that people can be cut out. It should be that the information architecture has changed, that each role works with more context and each handoff carries more reasoning. That is a design question, not a headcount shortcut.

This is where work such as Centaurian AI's, on how individual specialists actually partner with AI tools, becomes important. Getting the specialist-plus-AI partnership right at the individual level is foundational. But the node alone is not enough. The design has to extend to what flows between nodes: what gets passed forward, what gets escalated, what gets corrected, and what the organization learns from the interaction.

The signals are already there

Claude Shannon's work on information theory gives us a useful way to see what hierarchy has always been doing. A hierarchy is, in part, a compression system for judgment. Each level reduces the work below into something the level above can process. That compression was necessarily lossy because human channels are limited: meetings are short, review notes are partial, and much of what people notice, question, correct, or worry about never gets written down.

AI changes the possible bandwidth of those channels.

Every interaction between a human and an AI system leaves traces: prompts, responses, edits, overrides, escalations, ignored suggestions, repeated corrections, moments of trust, moments of hesitation. In an expert organization, these are not just product analytics. They are evidence about how judgment is moving through the system.

Most organizations are not yet treating those traces as evidence about the work. They may monitor whether an AI system is accurate, fast, adopted, or cost-effective. Those measures matter. But they do not tell leaders whether the organization is becoming better at the work, whether uncertainty is being escalated, whether reviewers are correcting the right things, whether teams are learning from errors, or whether AI is quietly narrowing the information that reaches the next level.
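To make the contrast with product analytics concrete, here is a minimal sketch of reading interaction traces as signals about judgment. It assumes a hypothetical `TraceEvent` record and an invented event vocabulary ("suggestion", "override", "escalation"); a real deployment would define its own events and would need far more context than two numbers per workstream.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class TraceEvent:
    """Hypothetical record of one human-AI interaction event."""
    workstream: str
    kind: str  # assumed vocabulary: "suggestion", "override", "escalation"


def judgment_signals(events: Iterable[TraceEvent]) -> Dict[str, Dict[str, float]]:
    """Summarize, per workstream, how often AI suggestions were overridden
    and how many escalations were raised."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for event in events:
        counts[event.workstream][event.kind] += 1

    signals: Dict[str, Dict[str, float]] = {}
    for workstream, c in counts.items():
        suggestions = c["suggestion"]
        signals[workstream] = {
            # Share of AI suggestions that a human overrode
            "override_rate": c["override"] / suggestions if suggestions else 0.0,
            # Number of times uncertainty was passed upward
            "escalations": float(c["escalation"]),
        }
    return signals


# Illustrative data only
events = (
    [TraceEvent("revenue", "suggestion")] * 4
    + [TraceEvent("revenue", "override")] * 2
    + [TraceEvent("cash", "suggestion")] * 2
    + [TraceEvent("cash", "escalation")]
)

signals = judgment_signals(events)
print(signals["revenue"])  # {'override_rate': 0.5, 'escalations': 0.0}
print(signals["cash"])     # {'override_rate': 0.0, 'escalations': 1.0}
```

In this toy data, the revenue workstream overrides half of the AI's suggestions yet never escalates, which is the kind of pattern a director might want to ask about. The numbers do not answer the question; they only make it visible.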

This is what high reliability organizations have always understood. In Managing the Unexpected, Karl Weick and Kathleen Sutcliffe argue that reliable organizations do not achieve safety by pretending the world is stable. They achieve it by staying sensitive to operations, noticing weak signals, revising their understanding, and updating context before small anomalies become large failures.

AI should strengthen that capability: faster context updating. Not just faster output, but a richer picture of what is happening inside the work.

The humans working inside these organizations do not stand outside the AI system, merely reviewing its output. They are part of the system. Their questions, corrections, refusals, escalations, and workarounds are the mechanism by which the organization learns. For AI in expert work, monitoring cannot stop at model behavior. The goal is not only to detect when an AI system produces a bad answer, but to understand how the organization responds: whether practitioners notice, whether reviewers correct, whether directors see the pattern, whether the process itself is improving or degrading.

For the first time, organizations have a practical way to observe the judgment layer: not perfectly, not automatically, and not without careful design, but far more directly than before. The conversations that used to vanish in meetings, side channels, review notes, and individual memory can become part of the learning system itself.

That is the opportunity AI creates: not just faster or cheaper work, but an organization that can see more clearly how judgment is forming, where it is breaking down, and how it can improve.

AI changes the node. The larger opportunity is changing what flows between nodes.


Marisa Ferrara Boston holds a PhD in Cognitive Science from Cornell University and has spent her career building systems for expert augmentation and machine adaptation in hierarchical knowledge organizations. Through Reins AI, she works with organizations deploying AI in regulated expert domains to design evaluation, monitoring, and adaptation loops that preserve and strengthen human judgment.
