What is a multi-agent architecture?

A multi-agent architecture assigns different roles or capabilities to separate LLM-powered agents that coordinate to complete a task. Instead of one model handling everything, specialised agents handle subtasks: a planner agent breaks down the goal, a researcher agent gathers information, a coder agent writes code, and a reviewer agent checks quality. Coordination happens through shared state, message passing, or a central orchestrator.

The appeal is intuitive: specialisation improves quality, and parallelisation improves speed. The reality is more nuanced — coordination overhead, state-management complexity, and failure cascade risks can outweigh the benefits for many workloads.

When does multi-agent coordination add value?

| Scenario | Value Add | Why |
| --- | --- | --- |
| Tasks requiring diverse tools | High | Different agents can specialise in different tool APIs |
| Parallel subtasks | High | Independent subtasks execute concurrently |
| Tasks requiring self-review | Medium | A separate reviewer agent avoids self-assessment bias |
| Simple sequential tasks | Low | A single agent handles these with less overhead |
| Tasks with tight latency budgets | Low | Coordination adds 2–4× latency |

In our deployments, multi-agent architectures provide clear value in two scenarios: (1) complex research tasks where one agent searches, another synthesises, and a third verifies — each agent’s output quality benefits from role specialisation; and (2) code generation tasks where a planner, coder, and tester agent operate in a loop — the separation between generation and evaluation produces higher-quality code than a single-agent loop.

What coordination patterns exist?

Hierarchical: A manager agent delegates subtasks to worker agents and synthesises their results. The manager maintains the overall plan; workers execute without awareness of the full context. This is the most common pattern because it maps naturally to familiar management structures. The weakness: the manager becomes both a bottleneck and a single point of failure.
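The hierarchical pattern can be sketched in a few lines: a manager breaks the goal into role-tagged subtasks, dispatches them to workers, and synthesises the results. This is a minimal illustration under assumed interfaces — `call_llm`, the role prompts, and the hard-coded plan are all hypothetical stand-ins for a real model client and a real planning step.

```python
# Minimal sketch of the hierarchical pattern: a manager delegates
# subtasks to role-specialised workers and synthesises their results.
# `call_llm` is a hypothetical stand-in for a real LLM API client.

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for an LLM API call."""
    return f"[{system_prompt}] response to: {user_prompt}"

WORKER_ROLES = {
    "researcher": "You gather and summarise relevant information.",
    "coder": "You write code for the given subtask.",
}

def manager(goal: str) -> str:
    # The manager breaks the goal into (role, subtask) pairs.
    # In a real system this plan would itself come from an LLM call.
    plan = [
        ("researcher", f"Find prior art for: {goal}"),
        ("coder", f"Implement: {goal}"),
    ]

    # Workers execute their subtask without awareness of the full plan.
    results = [call_llm(WORKER_ROLES[role], subtask) for role, subtask in plan]

    # The manager synthesises worker outputs into a final answer —
    # making it both the bottleneck and the single point of failure.
    return call_llm("You synthesise worker outputs into one answer.",
                    "\n".join(results))
```

Note that the workers here could run concurrently; only the synthesis step requires all results, which is where the manager-as-bottleneck weakness shows up.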
Peer-to-peer: Agents communicate directly, passing messages or sharing state. Each agent decides when to act and what to communicate. This pattern enables emergent behaviour but is difficult to debug — the interaction between agents can produce unexpected outcomes that no single agent’s behaviour predicts.

Pipeline: Agents are arranged in a fixed sequence, each processing the output of the previous agent. This is the simplest pattern and the easiest to debug, but it does not support iterative refinement or parallel execution.

For the design patterns that govern individual agent behaviour within a multi-agent system, our guide to agent design patterns covers ReAct, Plan-and-Execute, and Reflection loops.

Where do multi-agent systems break?

Beyond individual agent failures, multi-agent systems have failure modes of their own:

State divergence: Agents operating on stale or inconsistent state make conflicting decisions. If the researcher agent finds information that invalidates the planner’s original plan, but the planner has already dispatched tasks based on the old plan, the system produces inconsistent results.

Cost explosion: Every agent interaction involves LLM API calls. A system with 4 agents making 5 calls each per task makes 20 calls — up to 20× the API cost of a single-agent approach that resolves the task in one call. Without per-agent and per-task cost budgets, multi-agent systems can generate unexpectedly large API bills.

Accountability gaps: When the system produces an incorrect result, identifying which agent caused the error requires tracing through the full interaction log. We implement per-agent output validation — checking each agent’s output against format and content constraints before passing it to the next agent — which catches errors at their source rather than propagating them through the system.

How do you decide between single-agent and multi-agent?
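Per-agent validation combines naturally with the pipeline pattern: each stage's output is checked against its constraints before the next stage runs, so a bad output fails loudly at its source. The sketch below assumes hypothetical names throughout (`call_llm`, the `non_empty` validator, the stage prompts); it is not a specific framework's API.

```python
# Sketch of a pipeline with validation gates between agents: each
# agent's output is checked before being passed on, so errors surface
# at the stage that produced them rather than three stages later.
# All names here (call_llm, non_empty, the prompts) are illustrative.

class ValidationError(Exception):
    pass

def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for an LLM API call."""
    return f"response({user_prompt})"

def non_empty(output: str) -> None:
    """A trivial content constraint; real gates check format too."""
    if not output.strip():
        raise ValidationError("empty output")

def run_pipeline(task: str, stages):
    """stages: list of (system_prompt, validator) tuples."""
    current = task
    for i, (system_prompt, validator) in enumerate(stages):
        current = call_llm(system_prompt, current)
        try:
            validator(current)  # gate: fail fast at the source agent
        except ValidationError as e:
            raise RuntimeError(f"stage {i} failed validation: {e}") from e
    return current

stages = [
    ("You plan the work.", non_empty),
    ("You write the code.", non_empty),
    ("You review the code.", non_empty),
]
result = run_pipeline("build a rate limiter", stages)
```

The same gate structure also narrows accountability gaps: a `RuntimeError` naming stage `i` identifies the offending agent without tracing the full interaction log.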
The decision framework we use evaluates four factors:

Task decomposability: Can the task be split into subtasks that benefit from independent processing? If the task is inherently sequential (each step depends on the full output of the previous step), a multi-agent architecture adds coordination overhead without enabling parallelism. If subtasks are independent (research one topic while generating code for another), multi-agent enables concurrent execution.

Role specialisation value: Does assigning different system prompts, tools, or models to different subtasks improve quality? If the task benefits from a single consistent context (writing a document with a unified voice), a single agent is preferable. If the task benefits from specialised perspectives (a researcher who prioritises comprehensiveness and a writer who prioritises clarity), multi-agent adds value.

Error isolation need: Does the application require that errors in one component not cascade to others? Multi-agent architectures with validation gates between agents provide natural error boundaries. A reviewer agent that checks the coder agent’s output catches errors before they propagate to the deployment agent.

Budget constraints: Multi-agent architectures consume 3–20× more tokens than single-agent approaches for the same task. For cost-sensitive applications or high-volume workloads, this multiplier may be prohibitive.

Our default recommendation: start with a single agent and a well-structured prompt. If quality issues emerge that are attributable to role confusion (the agent tries to research, plan, and execute simultaneously and does all three poorly), consider splitting into a planner and an executor. Add further agents only when measurement demonstrates that the added complexity improves output quality by more than the coordination overhead degrades it.
For production systems, we implement multi-agent architectures with comprehensive logging at every agent boundary — recording inputs, outputs, token usage, latency, and any validation results. This observability infrastructure is essential for diagnosing the coordination failures that inevitably emerge in multi-agent systems. Without it, debugging a multi-agent failure requires reproducing the full agent interaction, which is non-deterministic due to LLM sampling and may not reproduce on retry.
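Boundary logging of this kind amounts to wrapping every agent call and recording a structured record. The sketch below is a minimal version under assumed interfaces — the token count is a crude word-count proxy and `logged_call`'s signature is hypothetical, not a specific SDK's API.

```python
# Sketch of logging at an agent boundary: wrap each agent call and
# record input, output, latency, a token-usage proxy, and a validation
# result. The interfaces here are assumptions, not a specific SDK.
import json
import time

def logged_call(agent_name: str, agent_fn, payload: str, log: list) -> str:
    start = time.monotonic()
    output = agent_fn(payload)
    log.append({
        "agent": agent_name,
        "input": payload,
        "output": output,
        "latency_s": round(time.monotonic() - start, 3),
        # Crude proxy; a real client reports token usage in its response.
        "tokens": len(payload.split()) + len(output.split()),
        "valid": bool(output.strip()),
    })
    return output

log = []
out = logged_call("planner", lambda p: f"plan for {p}", "ship feature X", log)
print(json.dumps(log[0], indent=2))
```

Because every boundary record is captured at call time, a coordination failure can be diagnosed from the log alone rather than by re-running the non-deterministic agent interaction.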