🌶️ Hot take: The only way Autonomous Multi-Agent Systems work is by adding Reasoning & Agentic Context. I've tried it all, and here are my learnings👇

91,445 views • 1 year ago • via X (Twitter)

11 comments

Ashpreet Bedi · 1 year ago

At @AgnoAgi we've been building multi-agent systems for almost 2 years using the handoff/transfer pattern that is becoming popular now. (Spoiler alert: it doesn't work.) Here's a video from over a year ago that demonstrates this:

Ashpreet Bedi · 1 year ago

There are two approaches to multi-agent systems:
- Autonomous: A leader Agent orchestrates member Agents to achieve the task. The developer builds the Team & Agents but lets the leader Agent decide how to solve the task. This is 50% software engineering and 50% AI engineering.
- Controlled: The developer explicitly defines the Teams, Agents, and workflow steps needed to accomplish the task. This requires substantial effort, 99% software engineering and 1% AI engineering.

Because our clients demand reliability, we have traditionally guided them toward controlled workflows. It has been the only way to achieve consistent outputs from multi-agent systems.

Ashpreet Bedi · 1 year ago

Many AI influencers built their name selling the Autonomous pattern. After all, we all want this utopia — write some agents, assign them roles, assemble them into a team, and voilà, they'll cure cancer. But this doesn't work. We know it, and deep down, they know it too. If this "Autonomous" pattern doesn't work reliably with humans, how can it possibly work with next-token-predictors?

Ashpreet Bedi · 1 year ago

Autonomous Multi-Agent systems create impressive demos, but when you run the same task 10,000 times, the output variance is far too high for production use. Ask yourself: If you had an add(x, y) function and ran add(1, 1) five times with results like 1.7, 2.2, 2.1, 1.8, and 2.0, would you deploy it? No—you'd make five demos and share only the one where add(1, 1) returns exactly 2, ignoring the rest.
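To make the add(1, 1) analogy concrete, here is a minimal sketch of how run-to-run variance could be summarized. The five values are the ones quoted above; the `summarize` helper and its `tolerance` parameter are hypothetical names introduced only for illustration, not part of any library.

```python
import statistics

# The five illustrative outputs of add(1, 1) quoted in the thread.
results = [1.7, 2.2, 2.1, 1.8, 2.0]
expected = 2.0


def summarize(outputs, target, tolerance=0.0):
    """Summarize repeated runs of one task (hypothetical helper).

    Returns the mean, the sample standard deviation, and the fraction
    of runs that landed within `tolerance` of the target value.
    """
    hits = sum(abs(o - target) <= tolerance for o in outputs)
    return {
        "mean": statistics.mean(outputs),
        "stdev": statistics.stdev(outputs),
        "pass_rate": hits / len(outputs),
    }


report = summarize(results, expected)
print(report)
```

The mean lands close to 2, which is what a single cherry-picked demo hides: only one run in five is actually correct, and the spread across runs is the number that matters for production.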

Ashpreet Bedi · 1 year ago

Not only that, they’re impossible to evaluate, and you can’t improve what you can’t measure.

Ashpreet Bedi · 1 year ago

However, recent research is changing this and Anthropic’s "ThinkTool" was a breakthrough (imo). We've extended this research, teaching Agents not only to "Think" but also to "Analyze". Adding these "ReasoningTools" to multi-agent systems significantly improves outcomes. Here's ReasoningTools for Agents:

Ashpreet Bedi · 1 year ago

With `Reasoning` added to Multi-Agent Systems, the Team leader first "plans" the task using the "Think" tool, orchestrates member Agents, and then evaluates the results using the "Analyze" tool. From my limited experience, this approach is changing the game. Autonomous Agent Teams can now consistently solve complex problems with low variance for the first time. Check out the `Think` -> `Orchestrate` -> `Analyze` pattern in action. This is a fairly hard task, so you know we're not playing here. (Note: I trimmed the video and playback is at 1.8x; please run this yourself to test.)
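The `Think` -> `Orchestrate` -> `Analyze` loop can be sketched in plain Python. This is an illustration of the control flow only, under stated assumptions: it is not Agno's API, the `Leader` class and its methods are hypothetical, and the member agents are ordinary functions standing in for LLM-backed agents.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Leader:
    """Hypothetical team leader; members are plain callables here."""

    members: Dict[str, Callable[[str], str]]
    max_rounds: int = 3
    trace: List[str] = field(default_factory=list)

    def think(self, task: str) -> List[Tuple[str, str]]:
        # Plan step: a real leader would reason with a model here.
        # This stub simply hands the whole task to every member.
        self.trace.append("think")
        return [(name, task) for name in self.members]

    def orchestrate(self, plan: List[Tuple[str, str]]) -> Dict[str, str]:
        # Delegate each planned subtask to the named member.
        self.trace.append("orchestrate")
        return {name: self.members[name](subtask) for name, subtask in plan}

    def analyze(self, results: Dict[str, str]) -> bool:
        # Validation step: this stub only checks that every member
        # returned non-empty output before the result is accepted.
        self.trace.append("analyze")
        return all(text.strip() for text in results.values())

    def run(self, task: str) -> Dict[str, str]:
        # Repeat the loop until the analysis passes or rounds run out.
        for _ in range(self.max_rounds):
            results = self.orchestrate(self.think(task))
            if self.analyze(results):
                return results
        raise RuntimeError("no validated result within max_rounds")


leader = Leader(
    members={
        "researcher": lambda task: f"notes on {task}",
        "writer": lambda task: f"draft about {task}",
    }
)
output = leader.run("quarterly report")
print(output)
print(leader.trace)
```

The point of the structure is that nothing leaves `run` without passing `analyze`, so a bad round triggers another attempt instead of becoming the final answer.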

Ashpreet Bedi · 1 year ago

The problem with these systems isn't response quality; that we can improve. The problem is reliability and variance. Until now, running autonomous multi-agent systems produced wildly inconsistent results over thousands of runs. But with the `Analyze` step, the Team Leader is much better at orchestration: it thinks, validates, and evaluates before returning the final result, which we're seeing greatly improves reliability or, in other terms, reduces variance.

Ashpreet Bedi · 1 year ago

Is this perfect? Definitely not, and we're still experimenting. But early testing is showing better, more consistent results, which is what I'm after.

Ashpreet Bedi · 1 year ago

Thank you for reading. If you liked this, give Agno a try:

Alexander Myasoedov · 1 year ago

INTRODUCING: Agentic Security - LLM Security Scanner! 🔍 🔑 Features: Scans for prompt injections, jailbreaking & more. Provides detailed reports & options to customize attack rules. 🔗access the GitHub Link ↓
