🌶️ Hot take: The only way Autonomous Multi-Agent Systems work is by adding Reasoning & Agentic Context. I've tried it all, and here are my learnings👇
91,445 views • 1 year ago •via X (Twitter)

At @AgnoAgi, we've been building multi-agent systems for almost 2 years using the handoff/transfer pattern that is becoming popular now. (Spoiler alert: it doesn't work.) Here's a video from over a year ago that demonstrates this:

There are two approaches to multi-agent systems:
- Autonomous: A leader Agent orchestrates member Agents to achieve the task. The developer builds the Team & Agents but lets the leader Agent decide how to solve the task. This is 50% software engineering and 50% AI engineering.
- Controlled: The developer explicitly defines the Teams, Agents, and workflow steps needed to accomplish the task. This requires substantial effort, 99% software engineering and 1% AI engineering.

Because our clients demand reliability, we have traditionally guided them toward controlled workflows. It has been the only way to achieve consistent outputs from multi-agent systems.
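The difference between the two approaches comes down to who owns the control flow. Here is a minimal sketch in plain Python, not Agno's API: `controlled_run`, `autonomous_run`, and `choose_next` are hypothetical names, and the "agents" are ordinary functions standing in for model calls.

```python
from typing import Callable, Dict, List, Optional

Agent = Callable[[str], str]  # hypothetical: an agent is just text in, text out


def controlled_run(task: str, steps: List[Agent]) -> str:
    """Controlled: the developer fixes the steps, so every run takes the same path."""
    data = task
    for step in steps:
        data = step(data)
    return data


def autonomous_run(
    task: str,
    members: Dict[str, Agent],
    choose_next: Callable[[str, List[str]], Optional[str]],
    max_steps: int = 5,
) -> str:
    """Autonomous: a leader (choose_next) picks the next member at run time.

    In a real system choose_next would be a model call, which is where
    run-to-run variance enters. Returning None means "done".
    """
    data = task
    used: List[str] = []
    for _ in range(max_steps):
        name = choose_next(data, used)
        if name is None:
            break
        data = members[name](data)
        used.append(name)
    return data


# Stub agents for illustration only.
def research(text: str) -> str:
    return text + " -> researched"


def write(text: str) -> str:
    return text + " -> written"


def stub_leader(data: str, used: List[str]) -> Optional[str]:
    # Deterministic stand-in for the leader's decision.
    for name in ("research", "write"):
        if name not in used:
            return name
    return None


controlled = controlled_run("topic", [research, write])
autonomous = autonomous_run("topic", {"research": research, "write": write}, stub_leader)
```

With a deterministic stub leader the two runs agree. Swap the stub for a model call and only `controlled_run` is guaranteed to follow the same path every time.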

Many AI influencers built their name selling the Autonomous pattern. After all, we all want this utopia — write some agents, assign them roles, assemble them into a team, and voilà, they'll cure cancer. But this doesn't work. We know it, and deep down, they know it too. If this "Autonomous" pattern doesn't work reliably with humans, how can it possibly work with next-token-predictors?

Autonomous Multi-Agent systems create impressive demos, but when you run the same task 10,000 times, the output variance is far too high for production use. Ask yourself: If you had an add(x, y) function and ran add(1, 1) five times with results like 1.7, 2.2, 2.1, 1.8, and 2.0, would you deploy it? No—you'd make five demos and share only the one where add(1, 1) returns exactly 2, ignoring the rest.
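To put numbers on the add(1, 1) thought experiment, you can score a batch of runs instead of eyeballing a single one. This is a rough sketch using only the standard library: `summarize_runs` and the 0.05 tolerance are assumptions made for illustration, and the five outputs are the ones from the example above.

```python
import statistics
from typing import Dict, List


def summarize_runs(outputs: List[float], expected: float, tolerance: float = 0.05) -> Dict[str, float]:
    """Summarize repeated runs of the same task.

    tolerance is a hypothetical acceptance threshold; pick whatever
    "close enough" means for your task.
    """
    passed = sum(abs(o - expected) <= tolerance for o in outputs)
    return {
        "mean": statistics.mean(outputs),
        "stdev": statistics.pstdev(outputs),  # population standard deviation
        "pass_rate": passed / len(outputs),
    }


# The five add(1, 1) results from the example above.
sample = [1.7, 2.2, 2.1, 1.8, 2.0]
report = summarize_runs(sample, expected=2)
print(report)
```

For these five values the mean is close to 2, but only one run in five lands within tolerance. That is the gap between a good demo and a deployable function.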

Not only that, they’re impossible to evaluate, and you can’t improve what you can’t measure.

However, recent research is changing this, and Anthropic’s "think" tool was a breakthrough (imo). We've extended this research, teaching Agents not only to "Think" but also to "Analyze". Adding these "ReasoningTools" to multi-agent systems significantly improves outcomes. Here's ReasoningTools for Agents:

By adding `Reasoning` to Multi-Agent Systems: The Team leader first "plans" the task using the "Think" tool, orchestrates member Agents, and then evaluates the results using the "Analyze" tool. From my limited experience, this approach is changing the game. For the first time, Autonomous Agent Teams can consistently solve complex problems with low variance. Check out the `Think` -> `Orchestrate` -> `Analyze` pattern in action. This is a fairly hard task, so you know we're not playing here. (Note: I trimmed the video and playback is at 1.8x. Please run this yourself to test.)

The problem with these systems isn't response quality; that we can improve. The problem is reliability and variance. Until now, running autonomous multi-agent systems produced wildly inconsistent results over thousands of runs. But with the `Analyze` step, the Team Leader is much better at orchestration: it thinks, validates, and evaluates before returning the final result, which we're seeing greatly improves reliability or, in other terms, reduces variance.
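The control flow of `Think` -> `Orchestrate` -> `Analyze` can be sketched in a few lines. This is plain Python with stub functions. It is not Agno's ReasoningTools or Anthropic's tool, and `TeamLeader`, `think`, `analyze`, and `max_rounds` are hypothetical names. The retry-on-rejection loop is also an assumption about how the `Analyze` step might be used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class TeamLeader:
    members: Dict[str, Callable[[str], str]]
    think: Callable[[str], List[Tuple[str, str]]]  # task -> [(member, subtask)]
    analyze: Callable[[str, List[str]], bool]      # accept or reject the results
    max_rounds: int = 3
    trace: List[str] = field(default_factory=list)

    def run(self, task: str) -> List[str]:
        for round_no in range(1, self.max_rounds + 1):
            # Think: plan who does what.
            plan = self.think(task)
            self.trace.append(f"think:{round_no}")
            # Orchestrate: hand each subtask to the chosen member.
            results = [self.members[name](subtask) for name, subtask in plan]
            self.trace.append(f"orchestrate:{round_no}")
            # Analyze: validate before returning anything.
            accepted = self.analyze(task, results)
            self.trace.append(f"analyze:{round_no}")
            if accepted:
                return results
        raise RuntimeError(f"no acceptable result after {self.max_rounds} rounds")


# Stubs for illustration: a member that returns nothing on its first call.
calls = {"n": 0}


def flaky_researcher(subtask: str) -> str:
    calls["n"] += 1
    return "" if calls["n"] == 1 else f"notes on {subtask}"


def plan_stub(task: str) -> List[Tuple[str, str]]:
    return [("researcher", task)]


def reject_empty(task: str, results: List[str]) -> bool:
    return all(results)  # an empty string counts as a failed result


leader = TeamLeader(members={"researcher": flaky_researcher}, think=plan_stub, analyze=reject_empty)
final = leader.run("agent variance")
print(final, leader.trace)
```

The bad first attempt never reaches the caller, because the `analyze` check rejects it and the leader goes around again. In a real team both `think` and `analyze` would be model calls, so the check itself can be wrong, which is why measuring over many runs still matters.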

Is this perfect? Definitely not, and we're still experimenting. But early testing is showing better, more consistent results, which is what I'm after.

Thank you for reading. If you liked this, give Agno a try:



