codingsoul

Can You Defend Your AI’s Decisions?

Why Multi-Agent Systems Must Be Sequential in Agent Smith

Most multi-agent demos look impressive. Five agents discussing a problem, contradicting each other, refining ideas, eventually converging on a solution. It feels like watching a real engineering team at work.

But it isn’t.

Imagine a development team working like this: everyone in one room, all talking at the same time, nobody taking minutes. It is unclear who is responsible for what, and in the end a decision appears with no way to explain how it was reached.

Yet this is exactly how many agent architectures are designed: parallel reasoning, implicit aggregation, and therefore unclear responsibility.

Atomic tasks can run in parallel. Decisions can’t.

This article explains why I chose a fundamentally different architecture for Agent Smith, an open-source AI coding agent, and what that architecture looks like under the hood.


The Problem With Parallel Agent Systems

When multiple agents reason about the same context in parallel, three things break down.

Accountability disappears: If three agents contribute to a plan simultaneously, who owns the final decision? When the implementation fails, the trace points everywhere and nowhere. In an enterprise context, “the agents discussed it” is not an acceptable answer.

Reproducibility dies: Run the same parallel discussion twice and you get different results, because of timing, context-window differences, or simply different conclusions. There are tasks where this is acceptable, like your very fancy new virtual employee with OpenClaw. But in my view it does not work when you want to produce code automatically. That needs deterministic behavior.

Governance becomes impossible: Auditing a parallel discussion means reconstructing a web of overlapping reasoning. A missing sequence means losing control. Every step needs a decision owner and a structured handover. You cannot audit what you cannot trace.


The Architecture: Cascading Commands on a Flat Pipeline

Agent Smith’s multi-skill system is built on a single architectural principle: commands execute sequentially on a flat pipeline, and each command can insert new commands directly after itself at runtime.

I deliberately skipped the idea of tree structures. Trees make iteration harder, and I don't want parallel pipelines in decision-making. To speed up the overall process, some atomic tasks can still run in parallel, but planning is not among them. So the pipeline is a linked list of commands that grows dynamically as the system discovers what needs to happen.

Why a Linked List, Not a Tree

At design time it is impossible to know which skills are needed or how many rounds of discussion will occur. A flat list with runtime insertion gives full flexibility while keeping the execution model trivially simple.

The PipelineExecutor iterates through a LinkedList<string>. After each command completes, it checks whether the result includes follow-up commands. If it does, those commands are inserted immediately after the current position. Then execution continues to the next node.

Pipeline before Triage:

FetchTicket 
→ CheckoutSource
→ LoadDomainRules
→ AnalyzeCode
→ Triage
→ Approval
→ Execute
→ Test
→ CommitAndPR

Pipeline after Triage inserts discussion:

FetchTicket 
→ CheckoutSource
→ LoadDomainRules
→ AnalyzeCode
→ Triage
→ [SkillRound:architect:1]
→ [SkillRound:devops:1]
→ [SkillRound:backend-dev:1]
→ [ConvergenceCheck]
→ Approval
→ Execute
→ Test
→ CommitAndPR

Every command is visible in the pipeline. Every insertion is logged. There’s no hidden execution.
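The executor loop can be sketched in a few lines. Agent Smith itself is a .NET project, so the Python below is purely illustrative: the class name PipelineExecutor comes from the article, but the handler signature and the plain list standing in for LinkedList<string> are my assumptions.

```python
class PipelineExecutor:
    """Illustrative sketch of a flat pipeline with runtime insertion.

    Each handler returns (result, follow_up_commands). Follow-ups are
    spliced in directly after the current command, so execution stays
    strictly sequential and every insertion is visible in the trail.
    """

    def __init__(self, handlers):
        self.handlers = handlers  # command name -> callable

    def run(self, initial_commands):
        pipeline = list(initial_commands)  # stands in for LinkedList<string>
        trail = []
        i = 0
        while i < len(pipeline):
            command = pipeline[i]
            result, follow_ups = self.handlers[command](command)
            trail.append((command, result, len(follow_ups)))
            # Insert follow-up commands immediately after the current node.
            pipeline[i + 1:i + 1] = follow_ups
            i += 1
        return trail
```

Running a pipeline of FetchTicket, Triage, Approval with a Triage handler that returns two follow-up commands produces the expanded five-command trail, mirroring the before/after diagrams above.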

Safety: Convergence by Design, Not by Luck

A cascading system can theoretically insert commands forever. Agent Smith prevents this at the architectural level: each role gets a maximum of three discussion rounds. If there's no consensus after three rounds, the system doesn't force one; it escalates to a human. That is not a hard timeout, but a structured admission that the system has reached the limits of what it can resolve on its own. Beneath that, a technical ceiling of 100 total command executions acts as a final safety net against unforeseen loops.
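Those two ceilings can be sketched as a guard checked before each command executes. The limit values come from the article; the function name, the exception type, and the call shape are invented for illustration.

```python
MAX_ROUNDS_PER_ROLE = 3    # discussion limit per role (from the article)
MAX_TOTAL_COMMANDS = 100   # hard ceiling against runaway cascades

class EscalateToHuman(Exception):
    """Raised when the system admits it cannot converge on its own."""

def check_limits(executed_count, rounds_by_role, role=None):
    """Guard called before each command execution (illustrative sketch)."""
    if executed_count >= MAX_TOTAL_COMMANDS:
        raise EscalateToHuman("command ceiling reached")
    if role is not None and rounds_by_role.get(role, 0) >= MAX_ROUNDS_PER_ROLE:
        raise EscalateToHuman(f"no consensus after {MAX_ROUNDS_PER_ROLE} rounds for {role}")
```

The point of the design is that hitting either limit is not an error state but a structured handover to a human.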


Skilled Agents

To execute properly and get the best possible results, Agent Smith defines a Skill for every role in a typical development team, each with a specific perspective, set of rules, and convergence criteria.

Roles are defined as YAML files shipped with the system:

  • Architect: evaluates component boundaries, patterns, cross-cutting concerns
  • Backend Developer: assesses feasibility, proposes code structure, flags performance issues
  • DevOps: evaluates infrastructure impact, CI/CD changes, deployment risks
  • Tester: defines test strategy, identifies edge cases
  • Security Reviewer: flags authentication, authorization, data exposure risks

Each role has explicit rules about what it should evaluate and what it should not do. The Architect doesn’t propose patterns that aren’t established in the project. The Developer doesn’t reorganize the codebase. Constraints are as important as capabilities.
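A role definition along these lines might look as follows. This is a hypothetical sketch of the YAML schema; the rules and constraints are taken from the article, but all field names are my assumptions, not Agent Smith's actual format.

```yaml
# Hypothetical role definition; field names are illustrative assumptions.
role: architect
perspective: >
  Evaluate component boundaries, applicable patterns,
  and cross-cutting concerns for the proposed change.
rules:
  - Only propose patterns already established in this project.
  - Flag violations of existing component boundaries.
constraints:
  - Do not reorganize or rewrite code; that is the developer's job.
convergence:
  verdicts: [AGREE, OBJECTION, SUGGESTION]
  max_rounds: 3
```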

Project-Level Configuration

Each project gets a skill.yaml that defines which roles are enabled and adds project-specific context. A pure backend project disables the Frontend Developer. A project using ArgoCD for deployments adds that constraint to the DevOps role. The system auto-detects sensible defaults during initialization but allows full customization.
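A project-level skill.yaml could then look like this. Again a hypothetical sketch: the examples (pure backend project, ArgoCD constraint) come from the article, while the structure and field names are assumptions.

```yaml
# Hypothetical skill.yaml; structure and field names are assumptions.
roles:
  architect: enabled
  backend-developer: enabled
  frontend-developer: disabled   # pure backend project
  devops:
    enabled: true
    context: "Deployments go through ArgoCD, not direct cluster access."
  tester: enabled
  security-reviewer: enabled
```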


How a Discussion Works

When Agent Smith picks up a ticket, the pipeline flows through three phases: Triage, Discussion, and Convergence.

Phase 1: Triage

The TriageCommand analyzes the ticket against the available roles and decides who needs to participate. A simple bug fix might only need the Backend Developer; since multi-agent scenarios unfortunately also cost money, there is no discussion in that case, just a straight path to implementation. A new feature touching API design, infrastructure, and business logic triggers a multi-role discussion.

Triage determines:

  • Which roles participate
  • Who leads (creates the initial plan)
  • The expected complexity

It then inserts SkillRoundCommand entries into the pipeline, one per participating role, followed by a ConvergenceCheckCommand.
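The follow-up list Triage splices into the pipeline is easy to sketch: one SkillRound entry per participating role, then a ConvergenceCheck. The command-name format follows the article; the helper function itself is illustrative.

```python
def triage_follow_ups(participating_roles, round_no=1):
    """Build the commands Triage inserts after itself (illustrative sketch):
    one SkillRound per participating role, then a ConvergenceCheck."""
    commands = [f"SkillRound:{role}:{round_no}" for role in participating_roles]
    commands.append("ConvergenceCheck")
    return commands
```

For the three-role example above this yields exactly the four inserted commands shown in the pipeline diagram.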

Phase 2: Skill Rounds

Each SkillRoundCommand loads the role's rules and generates a contribution based on the ticket, the project context, and, critically, all previous discussion entries. The discussion builds sequentially. Each role sees what came before and responds to it.

The key mechanism: if a role objects to the current plan, it inserts follow-up commands after itself. The target of the objection gets another round, then the objecting role follows up. This creates a natural back-and-forth without any parallel execution.

SkillRound:architect:1 → proposes plan
SkillRound:devops:1 → agrees
SkillRound:backend-dev:1 → objects to architect's pattern choice
  → inserts: SkillRound:architect:2, SkillRound:backend-dev:2, ConvergenceCheck
SkillRound:architect:2 → adjusts plan
SkillRound:backend-dev:2 → agrees
ConvergenceCheck → consensus reached

Every role ends its contribution with an explicit verdict: AGREE, OBJECTION [target_role], or SUGGESTION. No ambiguity and no implicit consensus.
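That verdict contract can be made machine-checkable with a strict parser. This sketch assumes the verdict is a single line in exactly the format shown above; anything that doesn't match is rejected rather than guessed at, which is the point of "no implicit consensus".

```python
import re

# Matches AGREE, SUGGESTION, or OBJECTION [some-role].
# The exact textual format is an assumption for illustration.
VERDICT_PATTERN = re.compile(r"^(AGREE|SUGGESTION)$|^OBJECTION \[(?P<target>[\w-]+)\]$")

def parse_verdict(line):
    """Parse a role's closing verdict; raise instead of assuming consensus."""
    match = VERDICT_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"ambiguous verdict: {line!r}")
    if match.group("target"):
        return ("OBJECTION", match.group("target"))
    return (match.group(1), None)
```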

Phase 3: Convergence

The ConvergenceCheckCommand evaluates whether all objections have been resolved. If yes, it consolidates the discussion into a final implementation plan. If not, and the maximum number of rounds hasn’t been reached, it inserts more rounds. If the discussion stalls at the maximum, it escalates to a human.

This is the human-in-the-loop that actually matters: not a rubber stamp on every step, but a circuit breaker when the system can’t resolve a disagreement on its own.
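The decision logic of the convergence phase reduces to three outcomes. A minimal sketch, assuming consensus means no open OBJECTION verdicts remain; the outcome names and function shape are mine, the three-round cap and the escalation behavior come from the article.

```python
def convergence_decision(verdicts_by_role, rounds_completed, max_rounds=3):
    """Decide the ConvergenceCheck outcome from the latest verdicts
    (illustrative sketch of the three possible outcomes)."""
    open_objections = [r for r, v in verdicts_by_role.items() if v == "OBJECTION"]
    if not open_objections:
        return "consolidate_plan"       # consensus: build the final plan
    if rounds_completed < max_rounds:
        return "insert_more_rounds"     # objections remain, budget left
    return "escalate_to_human"          # circuit breaker, not a forced consensus
```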


The Execution Trail

Every command that runs, whether it's fetching a ticket, switching a skill, or a role contributing to the discussion, is recorded in the Execution Trail. Each entry captures:

  • Command name and active skill
  • Success or failure
  • Duration
  • Number of commands inserted

The trail is written into the result output as a readable table:

| #  | Command                  | Skill             | Result              | Duration | Inserted |
|----|--------------------------|-------------------|---------------------|----------|----------|
| 1  | FetchTicket              | -                 | OK: Ticket fetched  | 1.2s     | -        |
| 5  | Triage                   | -                 | OK: Lead: architect | 4.2s     | +4       |
| 6  | SkillRound:architect:1   | architect         | OK: Plan created    | 8.3s     | -        |
| 8  | SkillRound:backend-dev:1 | backend-developer | OK: Objection       | 6.7s     | +3       |
| 9  | SkillRound:architect:2   | architect         | OK: Adjusted        | 5.4s     | -        |
| 11 | ConvergenceCheck         | -                 | OK: Consensus       | 4.8s     | -        |
| 15 | CommitAndPR              | -                 | OK: PR #42 created  | 3.1s     | -        |

Total: 15 commands, 87.6s, $0.34

The command execution record is directly part of the audit log that enterprise systems need: a structured, timestamped, cost-tracked record of every decision.
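One way to record and render such a trail, sketched in Python: the fields mirror the table columns above, but the dataclass and the render function are illustrative assumptions, not Agent Smith's actual types.

```python
from dataclasses import dataclass

@dataclass
class TrailEntry:
    """One row of the Execution Trail (fields mirror the table columns)."""
    index: int
    command: str
    skill: str        # empty string when no skill is active
    result: str
    duration_s: float
    inserted: int     # number of follow-up commands spliced in

def render_trail(entries):
    """Render trail entries as a markdown-style table."""
    lines = [
        "| # | Command | Skill | Result | Duration | Inserted |",
        "|---|---------|-------|--------|----------|----------|",
    ]
    for e in entries:
        inserted = f"+{e.inserted}" if e.inserted else "-"
        lines.append(
            f"| {e.index} | {e.command} | {e.skill or '-'} "
            f"| {e.result} | {e.duration_s:.1f}s | {inserted} |"
        )
    return "\n".join(lines)
```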


What This Means for Enterprise AI

The conversation about AI agents in enterprises is happening on the wrong level. The question isn’t how intelligent the agents are or how many can run in parallel. The question is:

Can you defend the decisions your AI made?

Sequential execution with explicit roles, structured handovers, convergence detection, and a full execution trail enables exactly that. This is what connects AI agents to compliance, auditing, and real organizational accountability.

Agent Smith is open source. The architecture described in this article is implemented and available on GitHub.

Any thoughts? Let’s discuss.


Holger is a freelance software consultant specializing in .NET, Azure and AI-assisted development workflows. He builds Agent Smith as an open-source project to demonstrate how autonomous coding agents can be structured, auditable and enterprise-ready.