Why Multi-Agent Systems Must Be Sequential — And How It's Built in Agent Smith
Most multi-agent demos look impressive. Five agents discussing a problem, contradicting each other, refining ideas, eventually converging on a solution. It feels like watching a real engineering team at work.
But it isn’t.
It’s a room where everyone talks at the same time, nobody takes minutes, and at the end, a decision appears — with no way to explain how it was reached. No compliance officer would accept that. No auditor. No enterprise customer.
Yet this is exactly how many agent architectures are designed: parallel reasoning, implicit aggregation, unclear responsibility.
Atomic tasks can run in parallel. Decisions can’t.
This article explains why I chose a fundamentally different architecture for Agent Smith, my open-source AI coding agent — and what that architecture looks like under the hood.
The Problem With Parallel Agent Systems
When multiple agents reason about the same context in parallel, three things break down.
Accountability disappears. If three agents contribute to a plan simultaneously, who owns the final decision? When the implementation fails, the trace points everywhere and nowhere. In an enterprise context, “the agents discussed it” is not an acceptable answer.
Reproducibility dies. Run the same parallel discussion twice and you get different results. Different timing, different context windows, different conclusions. That’s fine for a demo. It’s a dealbreaker for production systems that need deterministic behavior.
Governance becomes impossible. Auditing a parallel discussion means reconstructing a web of overlapping reasoning. There’s no clear sequence, no decision owner per step, no structured handover. You can’t audit what you can’t trace.
The alternative isn’t less intelligence. It’s more structure.
The Architecture: Cascading Commands on a Flat Pipeline
Agent Smith’s multi-skill system is built on a single architectural principle: commands execute sequentially on a flat pipeline, and each command can insert new commands directly after itself at runtime.
No tree structures. No recursive executors. No parallel pipelines. Just a linked list of commands that grows dynamically as the system discovers what needs to happen.
Why a Linked List, Not a Tree
At design time, you can’t know which skills are needed or how many rounds of discussion will occur. A tree would require predicting the shape of the conversation before it happens. A flat list with runtime insertion gives you full flexibility while keeping the execution model trivially simple.
The PipelineExecutor iterates through a LinkedList<string>. After each command completes, it checks whether the result includes follow-up commands. If it does, those commands are inserted immediately after the current position. Then execution continues to the next node.
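The loop above can be sketched in a few lines. The real PipelineExecutor is part of a .NET codebase; this is a Java approximation (Java's `LinkedList<String>` mirrors the `LinkedList<string>` the article names), and the `Function`-based command runner and trail list are my illustration, not Agent Smith's actual API:

```java
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;
import java.util.function.Function;

// Minimal sketch of the cascading execution loop. The command signature
// (name in, follow-up command names out) is an assumption for illustration.
class PipelineExecutor {
    private final Function<String, List<String>> runner;
    private final List<String> trail = new LinkedList<>();

    PipelineExecutor(Function<String, List<String>> runner) {
        this.runner = runner;
    }

    List<String> execute(LinkedList<String> pipeline) {
        ListIterator<String> it = pipeline.listIterator();
        while (it.hasNext()) {
            String command = it.next();
            trail.add(command);                      // every execution is recorded
            List<String> followUps = runner.apply(command);
            for (String inserted : followUps) {
                it.add(inserted);                    // insert directly after the current position
            }
            // rewind so the next it.next() returns the first inserted command
            for (int i = 0; i < followUps.size(); i++) it.previous();
        }
        return trail;
    }
}
```

The rewind step matters: `ListIterator.add` places the new element behind the cursor, so without stepping back, inserted commands would be skipped rather than executed next.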
Pipeline before Triage:
FetchTicket
→ CheckoutSource
→ LoadDomainRules
→ AnalyzeCode
→ Triage
→ Approval
→ Execute
→ Test
→ CommitAndPR
Pipeline after Triage inserts discussion:
FetchTicket
→ CheckoutSource
→ LoadDomainRules
→ AnalyzeCode
→ Triage
→ [SkillRound:architect:1]
→ [SkillRound:devops:1]
→ [SkillRound:backend-dev:1]
→ [ConvergenceCheck]
→ Approval
→ Execute
→ Test
→ CommitAndPR
Every command is visible in the pipeline. Every insertion is logged. There’s no hidden execution.
Safety: Convergence by Design, Not by Luck
A cascading system can theoretically insert commands forever. Agent Smith prevents this at the architectural level: each role gets a maximum of three discussion rounds. If there's no consensus after three rounds, the system doesn't force one; it escalates to a human. That's the abort logic most autonomous systems lack: not a hard timeout, but a structured admission that the system has reached the limits of what it can resolve on its own. Beneath that, a technical ceiling of 100 total command executions acts as a final safety net against runaway loops.
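The two ceilings could be enforced with a small guard consulted before each round. The constant values come from the article; the class, method, and escalation-by-return-value shape are my illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the two convergence guards: a per-role round limit that triggers
// human escalation, and a hard command ceiling as a final safety net.
// Names and mechanism are illustrative, not Agent Smith's actual API.
class ConvergenceGuard {
    static final int MAX_ROUNDS_PER_ROLE = 3;   // structured limit: escalate to a human
    static final int MAX_TOTAL_COMMANDS = 100;  // technical ceiling against runaway loops

    private final Map<String, Integer> roundsByRole = new HashMap<>();
    private int totalExecuted = 0;

    /** Returns true if the round may run; false means escalate to a human. */
    boolean mayStartRound(String role) {
        if (++totalExecuted > MAX_TOTAL_COMMANDS) {
            throw new IllegalStateException("command ceiling reached, aborting pipeline");
        }
        int rounds = roundsByRole.merge(role, 1, Integer::sum);
        return rounds <= MAX_ROUNDS_PER_ROLE;
    }
}
```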
Roles, Not Agents
Agent Smith doesn’t use generic “agents.” It uses roles — each with a specific perspective, set of rules, and convergence criteria.
Roles are defined as YAML files shipped with the system:
- Architect — evaluates component boundaries, patterns, cross-cutting concerns
- Backend Developer — assesses feasibility, proposes code structure, flags performance issues
- DevOps — evaluates infrastructure impact, CI/CD changes, deployment risks
- Tester — defines test strategy, identifies edge cases
- Security Reviewer — flags authentication, authorization, data exposure risks
Each role has explicit rules about what it should evaluate and — just as importantly — what it should not do. The Architect doesn’t propose patterns that aren’t established in the project. The Developer doesn’t reorganize the codebase. Constraints are as important as capabilities.
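A role file might look like this. The field names are my illustration; the article doesn't specify the actual schema:

```yaml
# Illustrative shape of a role definition; field names are assumptions.
role: architect
perspective: >
  Evaluate component boundaries, established patterns,
  and cross-cutting concerns.
rules:
  - Only propose patterns already established in the project.
  - Do not reorganize the codebase.
convergence:
  verdicts: [AGREE, OBJECTION, SUGGESTION]
  max_rounds: 3
```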
Project-Level Configuration
Each project gets a skill.yaml that defines which roles are enabled and adds project-specific context. A pure backend project disables the Frontend Developer. A project using ArgoCD for deployments adds that constraint to the DevOps role. The system auto-detects sensible defaults during initialization but allows full customization.
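For the examples in the paragraph above, a skill.yaml might look like this. The key names are my illustration of the described behavior, not the actual schema:

```yaml
# Hypothetical skill.yaml for a pure backend project deployed via ArgoCD.
roles:
  frontend-dev:
    enabled: false            # pure backend project
  devops:
    enabled: true
    context: Deployments run through ArgoCD; flag manifest changes.
```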
How a Discussion Works
When Agent Smith picks up a ticket, the pipeline flows through three phases: Triage, Discussion, and Convergence.
Phase 1: Triage
The TriageCommand analyzes the ticket against available roles and decides who needs to participate. A simple bug fix might only need the Backend Developer. No discussion, straight to implementation. A new feature touching API design, infrastructure, and business logic triggers a multi-role discussion.
Triage determines:
- Which roles participate
- Who leads (creates the initial plan)
- The expected complexity
It then inserts SkillRoundCommand entries into the pipeline, one per participating role, followed by a ConvergenceCheckCommand.
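Building that insertion list is mechanical: one SkillRound per participating role, closed by a ConvergenceCheck. The command-name format mirrors the pipeline listing earlier in the article; the helper itself is my sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the commands Triage inserts after itself, one SkillRound per
// participating role, terminated by a ConvergenceCheck. Illustrative only.
class Triage {
    static List<String> followUps(List<String> participants, int round) {
        List<String> commands = new ArrayList<>();
        for (String role : participants) {
            commands.add("SkillRound:" + role + ":" + round);
        }
        commands.add("ConvergenceCheck");
        return commands;
    }
}
```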
Phase 2: Skill Rounds
Each SkillRoundCommand loads the role's rules and generates a contribution based on the ticket, the project context, and, critically, all previous discussion entries. The discussion builds sequentially. Each role sees what came before and responds to it.
The key mechanism: if a role objects to the current plan, it inserts follow-up commands after itself. The target of the objection gets another round, then the objecting role follows up. This creates a natural back-and-forth without any parallel execution.
SkillRound:architect:1 → proposes plan
SkillRound:devops:1 → agrees
SkillRound:backend-dev:1 → objects to architect's pattern choice
→ inserts: SkillRound:architect:2, SkillRound:backend-dev:2, ConvergenceCheck
SkillRound:architect:2 → adjusts plan
SkillRound:backend-dev:2 → agrees
ConvergenceCheck → consensus reached
Every role ends its contribution with an explicit verdict: AGREE, OBJECTION [target_role], or SUGGESTION. No ambiguity and no implicit consensus.
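The three verdicts lend themselves to a small parsed type. The `OBJECTION [target_role]` string format follows the article; the parsing code is my assumption about how such a verdict line could be handled:

```java
import java.util.Optional;

// Sketch of the three explicit verdicts a role can end its contribution with.
class Verdict {
    enum Kind { AGREE, OBJECTION, SUGGESTION }

    final Kind kind;
    final Optional<String> target;   // only set for OBJECTION [target_role]

    private Verdict(Kind kind, Optional<String> target) {
        this.kind = kind;
        this.target = target;
    }

    static Verdict parse(String line) {
        String s = line.trim();
        if (s.startsWith("AGREE")) return new Verdict(Kind.AGREE, Optional.empty());
        if (s.startsWith("SUGGESTION")) return new Verdict(Kind.SUGGESTION, Optional.empty());
        if (s.startsWith("OBJECTION")) {
            int open = s.indexOf('['), close = s.indexOf(']');
            String role = (open >= 0 && close > open) ? s.substring(open + 1, close) : null;
            return new Verdict(Kind.OBJECTION, Optional.ofNullable(role));
        }
        // no implicit consensus: an unrecognized verdict is an error, not an AGREE
        throw new IllegalArgumentException("no explicit verdict: " + line);
    }
}
```

Note the last line: a contribution without a recognizable verdict fails loudly rather than being counted as agreement, which is exactly the "no implicit consensus" rule stated above.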
Phase 3: Convergence
The ConvergenceCheckCommand evaluates whether all objections have been resolved. If yes, it consolidates the discussion into a final implementation plan. If not, and the maximum number of rounds hasn’t been reached, it inserts more rounds. If the discussion stalls at the maximum, it escalates to a human.
This is the human-in-the-loop that actually matters: not a rubber stamp on every step, but a circuit breaker when the system can’t resolve a disagreement on its own.
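The three-way decision in ConvergenceCheck reduces to a few lines. The `Outcome` names are my illustration of the logic described above, not Agent Smith's actual types:

```java
// Sketch of the ConvergenceCheck decision: consolidate on consensus,
// insert more rounds while budget remains, otherwise escalate.
class ConvergenceCheck {
    enum Outcome { CONSOLIDATE_PLAN, INSERT_MORE_ROUNDS, ESCALATE_TO_HUMAN }

    static Outcome decide(int openObjections, int currentRound, int maxRounds) {
        if (openObjections == 0) return Outcome.CONSOLIDATE_PLAN;
        if (currentRound < maxRounds) return Outcome.INSERT_MORE_ROUNDS;
        return Outcome.ESCALATE_TO_HUMAN;   // circuit breaker: hand over to a human
    }
}
```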
The Execution Trail
Every command that runs, whether it's fetching a ticket, switching a skill, or a role contributing to the discussion, is recorded in the Execution Trail. Each entry captures:
- Command name and active skill
- Success or failure
- Duration
- Number of commands inserted
The trail is written into the result output as a readable table:
| # | Command | Skill | Result | Duration | Inserted |
|----|----------------------------|--------------------|--------------------|----------|----------|
| 1 | FetchTicket | - | OK: Ticket fetched | 1.2s | - |
| 5 | Triage | - | OK: Lead: architect| 4.2s | +4 |
| 6 | SkillRound:architect:1 | architect | OK: Plan created | 8.3s | - |
| 8 | SkillRound:backend-dev:1 | backend-developer | OK: Objection | 6.7s | +3 |
| 9 | SkillRound:architect:2 | architect | OK: Adjusted | 5.4s | - |
| 11 | ConvergenceCheck | - | OK: Consensus | 4.8s | - |
| 15 | CommitAndPR | - | OK: PR #42 created | 3.1s | - |
Total: 15 commands, 87.6s, $0.34
This is the audit log that enterprise systems need. Not a chat transcript. Not a vague summary. A structured, cost-tracked record of every decision, with per-step durations.
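A trail entry holding the four fields listed above could be a simple record. The record shape and row formatting are my assumption of how the table could be produced, not the actual implementation:

```java
// Sketch: one Execution Trail entry and its rendering as a table row.
// Field names and formatting are illustrative.
record TrailEntry(int index, String command, String skill,
                  String result, double durationSeconds, int inserted) {
    String toRow() {
        return String.format(java.util.Locale.ROOT,
                "| %d | %s | %s | %s | %.1fs | %s |",
                index, command, skill.isEmpty() ? "-" : skill, result,
                durationSeconds, inserted == 0 ? "-" : "+" + inserted);
    }
}
```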
What This Means for Enterprise AI
The conversation about AI agents in enterprises is happening on the wrong level. The question isn’t how intelligent the agents are or how many can run in parallel. The question is:
Can you defend the decisions your AI made?
Sequential execution with explicit roles, structured handovers, convergence detection, and a full execution trail gives you that. Not because it’s the most impressive architecture. But because it’s the one that survives contact with compliance, auditing, and real organizational accountability.
Agent Smith is open source. The architecture described in this article is implemented and available on GitHub.
If you’re building autonomous systems for enterprise contexts and want to discuss sequential multi-agent architectures, I’d like to hear from you.
Holger is a freelance software consultant specializing in .NET, Azure and AI-assisted development workflows. He builds Agent Smith as an open-source project to demonstrate how autonomous coding agents can be structured, auditable and enterprise-ready.