Engineering · June 2026 · 9 min

Most multi-agent orchestration deployments are solving coordination problems the architecture introduced.

Since January we have reviewed or inherited twenty-three production AI systems; eleven used a multi-agent orchestration framework as their primary architectural layer. Four have since been rebuilt as single-agent designs, with lower latency, lower debugging cost, and output quality the domain experts could not distinguish from the original. In nine of the eleven, the multi-agent structure was solving a coordination problem the architecture itself had created. Here are the failure modes, the cases where multi-agent design is genuinely necessary, and the single question we now require an answer to before approving any orchestration design.

By Sher Ghan, Principal AI Engineer

The pitch arrives in diagram form. Several labelled boxes — Planner, Researcher, Writer, Critic — connected by arrows showing context flowing between agents. The orchestrating agent receives a goal and delegates subtasks to specialists; each specialist returns its output; the orchestrator synthesises. The diagram looks like a mature architecture. In the majority of enterprise AI deployments that reach us in this shape, it is not.

Since the beginning of this year we have reviewed or inherited twenty-three production AI systems. Eleven of them used a multi-agent orchestration framework — LangGraph, AutoGen, CrewAI, or a bespoke variant built on similar principles — as their primary architectural layer. Of those eleven, four have been subsequently rebuilt to single-agent architectures with explicit tool interfaces. In every case the rebuild improved latency, reduced incident resolution time, and produced outputs the domain experts who evaluated both versions could not distinguish in quality. In nine of the eleven — before and after the rebuilds — the multi-agent structure had been solving coordination problems that the orchestration layer itself had created.

This is not an argument against multi-agent systems. We build them; we have three running in production today that we consider well-suited to their tasks. It is an argument that multi-agent orchestration has become the architectural default in enterprise AI — reached for before the problem has been understood well enough to establish whether it is the right tool — and that the cost of that default is being paid quietly in most of the production systems currently using it.

#02 When the diagram precedes the problem

The pattern I encounter most often goes like this. A team has a brief for an AI-assisted research and drafting workflow — gather information from multiple sources, analyse it, produce a structured output. The engineers, experienced and current with the field, reach for a multi-agent pattern because it maps neatly onto the task description. The Researcher gathers; the Analyser structures; the Writer drafts; the Critic reviews. The architecture is on the whiteboard before the team has established what the actual bottleneck in the task is.

In most enterprise research and drafting tasks, the bottleneck is not orchestration. It is the quality of the retrieval, and the quality of the instruction the model receives. A single agent with a well-designed retrieval tool, a structured output schema, and an explicit self-review step will handle the majority of these tasks at equivalent quality. We verified this directly in three engagements in the first quarter: in all three, a single-agent prototype produced outputs the domain experts could not distinguish from the multi-agent version, at 58 to 71 per cent of the wall-clock time.
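A minimal sketch of that single-agent shape, assuming Python and two hypothetical callables, `retrieve` and `draft`, that stand in for the retrieval tool and the model call. Neither is any framework's API, and `Brief` and `review` are illustrative names of our own. What it is meant to show is that retrieval, drafting, and self-review can share one context, with the review findings fed straight back into the next drafting pass.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Brief:
    """Structured output schema: the draft is expected to fill every field."""
    summary: str = ""
    findings: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)


def review(brief: Brief) -> list[str]:
    """Self-review step: return a list of problems, empty when the draft passes."""
    problems = []
    if not brief.summary:
        problems.append("summary is empty")
    if not brief.findings:
        problems.append("no findings")
    if len(brief.sources) < len(brief.findings):
        problems.append("some findings have no source")
    return problems


def run_single_agent(
    goal: str,
    retrieve: Callable[[str], list[str]],                  # hypothetical retrieval tool
    draft: Callable[[str, list[str], list[str]], Brief],   # hypothetical model call
    max_passes: int = 3,
) -> Brief:
    """One agent, one context: retrieve once, then draft and review until clean."""
    documents = retrieve(goal)
    problems: list[str] = []
    for _ in range(max_passes):
        # The drafting call sees the goal, the documents, and its own review notes.
        brief = draft(goal, documents, problems)
        problems = review(brief)
        if not problems:
            return brief
    raise RuntimeError(f"self-review still failing: {problems}")
```

The design choice worth noticing is that nothing is serialised between steps: the review findings go back to the same drafting call that already holds the goal and the retrieved documents.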

The reason the multi-agent diagram maps so readily onto the problem is that it mirrors the organisational structure of the task rather than its technical requirements. Research and drafting in a human team uses a researcher, an analyst, and a writer because human cognitive capacity is finite and specialisation compounds output quality. The model does not share those constraints. It can hold the full task in context, retrieve, reason, and draft in a single coherent pass. Decomposing that into specialised agents does not exploit a model capability; it works around a human limitation that does not apply.

#03 The coordination problems the architecture creates

Three failure modes appear consistently in multi-agent deployments where a single-agent architecture would have been sufficient. None of them appear in demos. All of them show up in production.

The first is context attenuation. When an orchestrator passes a task to a subagent, it must serialise its current understanding of the goal and state into a prompt — which means deciding, at serialisation time, what information is relevant enough to forward. Information the orchestrator considers peripheral is permanently unavailable to the subagent. In single-agent architectures, the model has access to everything it has processed in the current session. At every multi-agent boundary, context must be explicitly selected and threaded through. We identified this as the root cause of quality degradation in two of the four systems we rebuilt this year. In both cases the subagent producing the final output was working from a progressively impoverished representation of the original goal, and the outputs drifted in ways that were plausible within the subagent's context and wrong relative to the original intent.
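To make the attenuation concrete, here is a toy illustration in Python. The goal fields and the choice of which keys each agent forwards are invented for the example; a real handoff serialises prose rather than a dictionary, but the loss mechanism is the same.

```python
def hand_off(state: dict[str, str], forward: set[str]) -> dict[str, str]:
    """Serialise only the fields judged relevant; everything else is lost."""
    return {key: value for key, value in state.items() if key in forward}


def lost_fields(original: dict[str, str], received: dict[str, str]) -> list[str]:
    """Fields of the original goal that the receiving agent never sees."""
    return sorted(set(original) - set(received))


# Hypothetical orchestrator state for a drafting task.
goal = {
    "task": "summarise supplier contract",
    "audience": "procurement lead",
    "jurisdiction": "England and Wales",
    "exclusions": "ignore expired annexes",
}

# Boundary 1: orchestrator to researcher. Boundary 2: researcher to writer.
researcher_view = hand_off(goal, {"task", "jurisdiction", "exclusions"})
writer_view = hand_off(researcher_view, {"task", "jurisdiction"})

print(lost_fields(goal, researcher_view))  # ['audience']
print(lost_fields(goal, writer_view))      # ['audience', 'exclusions']
```

Each boundary can only forward what it received, so the set of lost fields never shrinks as the chain gets longer.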

The second failure mode is error absorption. When a subagent produces imperfect output and passes it to a downstream agent, the receiving agent typically works with what it has rather than surfacing the imperfection upstream. The architecture does not enforce hard stops at agent boundaries; it encourages graceful continuation. In a legal document analysis workflow we took over in February, the extraction subagent was producing structured output with systematic gaps — missing clauses in complex multi-part provisions — and the synthesis agent downstream was working around those gaps in a way that produced results plausible enough to pass human review. The pattern was invisible across four hundred documents before a domain expert identified it. In a single-agent design, the model would have seen its own extraction within the same context, noticed the gap, and returned to complete it.
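Where a boundary has to stay, one mitigation is to make it a hard stop. The sketch below is an assumption about how that might look, not a description of the system above: it presumes a list of expected clause identifiers is available from somewhere (a numbering pass over the document, say), and `BoundaryError` and `check_extraction` are hypothetical names.

```python
class BoundaryError(Exception):
    """Raised when a subagent's output is incomplete at a handoff."""


def check_extraction(expected_clauses: list[str], extracted: dict[str, str]) -> dict[str, str]:
    """Refuse to pass extraction output downstream if any expected clause is missing."""
    missing = [
        clause for clause in expected_clauses
        if not extracted.get(clause, "").strip()  # absent or blank both count as gaps
    ]
    if missing:
        # Surface the gap upstream instead of letting synthesis work around it.
        raise BoundaryError(f"extraction missing clauses: {missing}")
    return extracted
```

Called between the extraction and synthesis agents, a check like this turns a silent gap into a visible incident. It catches only the gaps it has been told to expect, which is part of why the single-agent design remains the simpler answer.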

The third problem is the debugging surface. A multi-agent trace is not one thing; it is n things, one per agent invocation, with context that was serialised at each boundary and is not fully recoverable from the log. When something goes wrong at step forty-seven of an agent chain, understanding why requires reconstructing what each agent held, what it considered, and what it discarded on forwarding. Across the eleven multi-agent deployments in our current managed estate, mean incident resolution time is 3.4 times higher than in single-agent deployments of comparable functional complexity. That ratio has been stable across every quarterly review since we began tracking it.

The architecture does not enforce hard stops at agent boundaries — it encourages graceful continuation. And a plausible-looking wrong answer is the hardest class of production failure to catch.

#04 When multi-agent genuinely justifies its coordination overhead

There are tasks for which a multi-agent architecture is not merely convenient but necessary, and conflating the failure mode I have described with a general argument against multi-agent design would be a different kind of error.

The clearest valid case is genuine parallelism. When a goal decomposes into truly independent subproblems that can execute simultaneously, and when the time saving from parallel execution is material and measurable against the coordination overhead, multi-agent architecture earns its complexity. A competitive intelligence brief requiring simultaneous research across twelve markets can be meaningfully accelerated by running twelve subagents in parallel. A single-document research and drafting task, by contrast, is not structured this way: gathering, analysis, and drafting each depend on the step before, so there is no independent work to fan out and parallelism produces no throughput gain.
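A rough way to put a number on that claim, sketched with Python's standard `asyncio`. The subagent here is a stand-in that sleeps rather than calling a model, and `fan_out`, `parallelism_pays`, and the timings are illustrative assumptions; the point is only that the saving should be measured and compared against the coordination overhead.

```python
import asyncio
import time


async def research_market(market: str, seconds: float) -> str:
    """Stand-in for one independent subagent; sleeps instead of calling a model."""
    await asyncio.sleep(seconds)
    return f"{market}: findings"


async def fan_out(markets: list[str], seconds: float) -> tuple[list[str], float]:
    """Run every market concurrently; return the results and the elapsed time."""
    started = time.perf_counter()
    results = await asyncio.gather(*(research_market(m, seconds) for m in markets))
    return list(results), time.perf_counter() - started


def parallelism_pays(sequential_s: float, parallel_s: float, overhead_s: float) -> bool:
    """True only when the measured saving exceeds the coordination overhead."""
    return (sequential_s - parallel_s) > overhead_s


if __name__ == "__main__":
    markets = [f"market-{n}" for n in range(1, 13)]
    per_market = 0.05  # simulated seconds per subagent
    results, parallel_s = asyncio.run(fan_out(markets, per_market))
    sequential_s = per_market * len(markets)  # the same work done in series
    print(len(results), parallelism_pays(sequential_s, parallel_s, overhead_s=0.1))
```

In a real evaluation the sequential figure would be timed against the single-agent prototype rather than computed, and the overhead term would include serialisation, retries, and the synthesis step.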

The second valid case is capability composition across genuinely distinct models. When parts of a workflow require models that differ by deployment — a code execution agent with a local interpreter, a vision agent processing images the orchestrating model cannot handle, a fine-tuned domain specialist running on dedicated on-premises hardware — multi-agent architecture is the mechanism by which those capabilities are composed into a single workflow. The coordination overhead is necessary because there is no other way to assemble the capability.

The third case is context budget management on tasks that are genuinely long-horizon: where a single agent's context would be exhausted before the work is complete, and where explicit state handoffs across agent boundaries allow the problem to be partitioned across that constraint. This is a real architectural requirement. It is not, in our experience, the reason multi-agent designs are proposed in the majority of the enterprise AI conversations we are part of.
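Checking that constraint against actual inputs can be as simple as the sketch below. The four-characters-per-token figure is a crude rule of thumb for English text and an assumption on our part; a real check would use the tokenizer of the model in question. The function names and limits are likewise illustrative.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate; swap in the model's own tokenizer for real use."""
    return math.ceil(len(text) / chars_per_token)


def budget_report(inputs: list[str], context_limit: int, reserved_for_output: int) -> dict:
    """Compare the tokens needed by real task inputs with the usable context."""
    used = sum(estimate_tokens(text) for text in inputs)
    available = context_limit - reserved_for_output
    return {"used": used, "available": available, "exhausted": used > available}


# Hypothetical inputs: two documents of 4,000 and 2,000 characters.
report = budget_report(["a" * 4000, "b" * 2000], context_limit=2000, reserved_for_output=600)
print(report)  # {'used': 1500, 'available': 1400, 'exhausted': True}
```

Run over the inputs the system sees today, a report like this separates a budget that is exhausted now from one that is only projected to be.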

#05 One question before approving an orchestration design

We have reduced our gate for approving multi-agent architecture to one question, asked of every team proposing an orchestration layer: what, specifically, is the single-agent equivalent failing to do, and have we verified that failure empirically rather than inferred it from the diagram?

In most cases, the answer to the first part is that it would be too complex as a single agent — a claim about prompt management or conceptual separation that rarely survives a working prototype. A single-agent design with structured output schemas, explicit branching in the tool interface, and a self-review step handles most of what multi-agent frameworks describe, without the context attenuation, the error absorption dynamics, or the extended incident resolution. Complexity in the prompt is debuggable. Complexity in the agent graph usually is not.

The multi-agent designs we have approved in the last six months share a common feature: the team had built a single-agent prototype first and could demonstrate, specifically, what it could not do. The parallelism was real and timed. The capability gap was specific to a deployment constraint that could be named. The context budget was genuinely exhausted against actual task inputs, not against a scaled-up estimate of what inputs might eventually look like. In those cases, multi-agent is the right design. When the answer to our question is that the diagram is cleaner with multiple agents, multi-agent is not the right design, and we say so before the build begins.

#06 A note on the frameworks

This is not a critique of LangGraph, AutoGen, or CrewAI as engineering tools. They are competently built frameworks, well-suited to the use cases they were designed around, and they lower the cost of building multi-agent patterns that genuinely require them. The problem is not in the frameworks. It is in how they have become the reflexive first tool when a task description evokes agent-like language — when 'the system should gather, analyse, and then synthesise' reads as a multi-agent requirement before anyone has established whether coordination is solving a real problem or introducing one.

The useful thing about orchestration frameworks is that they make multi-agent patterns accessible without requiring a team to build the orchestration layer from scratch. The unhelpful thing is that accessibility makes a pattern feel like a default. In eight of the eleven multi-agent systems we reviewed this year, the team's first architectural question was which framework to use, not whether the problem required a framework at all. The most reliable systems we have in production — multi-agent and single-agent alike — are the ones where every coordination layer was built in response to a demonstrated constraint, not to a diagram that felt complete.

The coordination overhead is a tax. It is a tax on latency, on debugging surface, on the legibility of production incidents. Before paying it, it is worth establishing what, exactly, it is buying.

Every piece in the Journal is written personally by a senior practitioner, drawing on the engagement that motivated it. No ghostwriters, no content team, no models. If a paragraph here resonates with a problem you are looking at, the author is the person to reply to — direct lines beat anonymous inboxes.