🌶️ Hot take: The only way Autonomous Multi-Agent Systems work is by adding Reasoning & Agentic Context. I've tried it all, and here are my learnings👇

Ashpreet Bedi

13,828 subscribers

91,445 views • 1 year ago • via X (Twitter)

Science, Technology & Education

11 comments

Ashpreet Bedi • 1 year ago

At @AgnoAgi we've been building multi-agent systems for almost 2 years using the handoff/transfer pattern that is becoming popular now. (Spoiler alert: it doesn't work.) Here's a video from over a year ago that demonstrates this:

Ashpreet Bedi • 1 year ago

There are two approaches to multi-agent systems:

- Autonomous: A leader Agent orchestrates member Agents to achieve the task. The developer builds the Team & Agents but lets the leader Agent decide how to solve the task. This is 50% software engineering and 50% AI engineering.
- Controlled: The developer explicitly defines the Teams, Agents, and workflow steps needed to accomplish the task. This requires substantial effort: 99% software engineering and 1% AI engineering.

Because our clients demand reliability, we have traditionally guided them toward controlled workflows. It has been the only way to achieve consistent outputs from multi-agent systems.
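To make the distinction concrete, here is a minimal sketch in plain Python. It does not use Agno's API: `Agent`, `run_controlled`, `run_autonomous`, and `stub_leader` are hypothetical names invented for illustration, and the "agents" are ordinary functions standing in for model calls.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in: an "agent" is any function from text to text.
Agent = Callable[[str], str]

def run_controlled(task: str, steps: List[Agent]) -> str:
    """Controlled: the developer fixes the steps and their order in code."""
    result = task
    for step in steps:
        result = step(result)
    return result

def run_autonomous(task: str, members: Dict[str, Agent],
                   leader: Callable[[str, List[str]], List[str]]) -> str:
    """Autonomous: the leader picks which members run, and in what order, at run time."""
    result = task
    for name in leader(task, list(members)):
        result = members[name](result)
    return result

# Toy members; in practice each would wrap a model call.
members: Dict[str, Agent] = {
    "research": lambda text: text + " -> researched",
    "write": lambda text: text + " -> written",
}

# Stub leader that always returns the same order. A real leader would be an
# LLM decision, which is where run-to-run variance enters.
def stub_leader(task: str, names: List[str]) -> List[str]:
    return ["research", "write"]

controlled = run_controlled("report", [members["research"], members["write"]])
autonomous = run_autonomous("report", members, stub_leader)
```

With a deterministic stub leader the two paths agree. The difference only shows up once the leader's choice comes from a model rather than from code.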

Ashpreet Bedi • 1 year ago

Many AI influencers built their name selling the Autonomous pattern. After all, we all want this utopia: write some agents, assign them roles, assemble them into a team, and voilà, they'll cure cancer. But this doesn't work. We know it, and deep down, they know it too. If this "Autonomous" pattern doesn't work reliably with humans, how can it possibly work with next-token predictors?

Ashpreet Bedi • 1 year ago

Autonomous Multi-Agent systems create impressive demos, but when you run the same task 10,000 times, the output variance is far too high for production use. Ask yourself: if you had an add(x, y) function and ran add(1, 1) five times with results like 1.7, 2.2, 2.1, 1.8, and 2.0, would you deploy it? No. You'd make five demos and share only the one where add(1, 1) returns exactly 2, ignoring the rest.
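One way to put numbers on that thought experiment is to summarize repeated runs of the same task. The sketch below uses only the five values quoted above. `summarize_runs` is a hypothetical helper, and treating "pass" as an exact match to the expected value is an assumption made for this numeric toy case.

```python
import statistics
from typing import Dict, List

def summarize_runs(outputs: List[float], expected: float,
                   tolerance: float = 0.0) -> Dict[str, float]:
    """Summarize repeated runs of one task: spread and share of correct results."""
    passed = sum(1 for value in outputs if abs(value - expected) <= tolerance)
    return {
        "runs": len(outputs),
        "mean": statistics.mean(outputs),
        "stdev": statistics.pstdev(outputs),  # population standard deviation
        "pass_rate": passed / len(outputs),
    }

# The five add(1, 1) results quoted in the post.
observed = [1.7, 2.2, 2.1, 1.8, 2.0]
report = summarize_runs(observed, expected=2)
```

Here only one of the five runs passes. The mean sits close to 2, which is why an average alone can hide an unreliable system.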

Ashpreet Bedi • 1 year ago

Not only that, they’re impossible to evaluate, and you can’t improve what you can’t measure.

Ashpreet Bedi • 1 year ago

However, recent research is changing this, and Anthropic’s "ThinkTool" was a breakthrough (imo). We've extended this research, teaching Agents not only to "Think" but also to "Analyze". Adding these "ReasoningTools" to multi-agent systems significantly improves outcomes. Here's ReasoningTools for Agents:

Ashpreet Bedi • 1 year ago

By adding `Reasoning` to Multi-Agent Systems: The Team leader first "plans" the task using the "Think" tool, orchestrates member Agents, and then evaluates the results using the "Analyze" tool. From my limited experience, this approach is changing the game. Autonomous Agent Teams can now consistently solve complex problems with low variance for the first time. Check out the `Think` -> `Orchestrate` -> `Analyze` pattern in action. This is a fairly hard task, so you know we're not playing here. (Note: I trimmed the video and playback is at 1.8x - please run this yourself to test)
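The `Think` -> `Orchestrate` -> `Analyze` loop can be sketched without any framework. This is not Agno's implementation: `TeamLeader`, `Member`, and the method bodies are hypothetical stand-ins, with simple string logic where a real leader would call a model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical: a member agent maps a subtask description to a result string.
Member = Callable[[str], str]

@dataclass
class TeamLeader:
    members: Dict[str, Member]
    trace: List[str] = field(default_factory=list)  # records which phase ran

    def think(self, task: str) -> List[Tuple[str, str]]:
        """Plan: assign a subtask to each member (stand-in for a model call)."""
        self.trace.append("think")
        return [(name, f"{task} [{name}]") for name in self.members]

    def orchestrate(self, plan: List[Tuple[str, str]]) -> Dict[str, str]:
        """Delegate each planned subtask to its member and collect results."""
        self.trace.append("orchestrate")
        return {name: self.members[name](subtask) for name, subtask in plan}

    def analyze(self, results: Dict[str, str]) -> bool:
        """Evaluate: accept only if every member returned non-empty output."""
        self.trace.append("analyze")
        return all(result.strip() for result in results.values())

    def run(self, task: str, max_rounds: int = 2) -> Dict[str, str]:
        """Repeat think -> orchestrate -> analyze until the analysis passes."""
        for _ in range(max_rounds):
            results = self.orchestrate(self.think(task))
            if self.analyze(results):
                return results
        raise RuntimeError("results failed analysis")

leader = TeamLeader(members={
    "research": lambda subtask: "notes on " + subtask,
    "writer": lambda subtask: "draft for " + subtask,
})
output = leader.run("compare frameworks")
```

The point of the structure is that results only leave `run` after passing `analyze`. A failed check triggers another planning round instead of returning whatever came back first.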

Ashpreet Bedi • 1 year ago

The problem with these systems isn't response quality; that we can improve. The problem is reliability and variance. Until now, running autonomous multi-agent systems produced wildly inconsistent results over thousands of runs. But with the `Analyze` step, the Team Leader is much better at orchestration and thinks, validates, and evaluates before returning the final result, which we're seeing greatly improves reliability or, in other terms, reduces variance.
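A toy simulation shows why a validation gate can cut variance. Everything here is assumed for illustration: the member succeeds 70% of the time, the analyze check detects bad output perfectly, and the leader may re-delegate up to three times. Real systems will not match these numbers.

```python
import random
from typing import Callable

def noisy_worker(rng: random.Random, p_good: float = 0.7) -> str:
    """Hypothetical member agent: returns a good result with probability p_good."""
    return "good" if rng.random() < p_good else "bad"

def run_raw(rng: random.Random) -> str:
    """No analysis: return whatever the member produced."""
    return noisy_worker(rng)

def run_with_analyze(rng: random.Random, max_attempts: int = 3) -> str:
    """Re-delegate until the analyze check passes or attempts run out."""
    result = "bad"
    for _ in range(max_attempts):
        result = noisy_worker(rng)
        if result == "good":  # stand-in for the Analyze step; assumes perfect detection
            break
    return result

def pass_rate(run: Callable[[random.Random], str],
              runs: int = 10_000, seed: int = 0) -> float:
    """Fraction of runs that end with a good result, using a seeded generator."""
    rng = random.Random(seed)
    return sum(run(rng) == "good" for _ in range(runs)) / runs

raw = pass_rate(run_raw)
gated = pass_rate(run_with_analyze)
```

Under these assumptions the gated pass rate is far higher than the raw one. In practice the gain depends on how well the analyze step actually detects bad results.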

Ashpreet Bedi • 1 year ago

Is this perfect? Definitely not, and we're still experimenting. But early testing is showing better, more consistent results, which is what I'm after.

Ashpreet Bedi • 1 year ago

Thank you for reading. If you liked this, give Agno a try:

Alexander Myasoedov • 1 year ago

INTRODUCING: Agentic Security - LLM Security Scanner! 🔍 🔑 Features: Scans for prompt injections, jailbreaking & more. Provides detailed reports & options to customize attack rules. 🔗 Access the GitHub link ↓
