Build agents that can actually do real-world tasks! Agent Reinforcement Trainer (ART) is a framework to train multi-step LLM agents for real-world tasks using GRPO. Just a few lines of code. No manual rewards needed. vLLM + Unsloth combined 🚀 100% open-source.

Akshay 🚀

277,488 subscribers

38,297 views • 6 months ago • via X (Twitter)

Related videos

It’s getting insanely easy to build complex multi-agent systems in n8n. You can now build supervisor agents that delegate tasks to specialized sub-agents. Just add the sub-agents via the AI Agent Tool node. All in one place and a lot easier to debug! Zero code!

elvis

45,690 views • 1 year ago

🚨 Agent Swarms - Multi-Agents Delegate Complex Prompts To Sub-Agents
Use Gemini 3.5 Flash, Opus 4.7 and GPT 5.5 xHigh to create complex multi-agent systems.
A master agent can orchestrate several worker agents to just do things.
Build full-stack apps and mobile apps, and automate complex scheduled tasks.
Each agent can do different tasks, including Q/A, monitoring and software development.

Abacus.AI

82,822,706 views • 2 months ago

🚨 Breaking: Google just open-sourced the Agent Development Kit (ADK), a framework for building AI agents and multi-agent systems.
- Build agents in under 100 lines.
- Supports MCP
More information and how to get started 👇 1/5

AshutoshShrivastava

130,272 views • 1 year ago

1/ Today, VideoDB is launching Director: an open-source framework to build AI video agents that can reason through complex video tasks & instantly stream the results. Video tasks like: 🔍Search ✂️Clipping ✏️Editing 🗣️🔄Dubbing 🎥Generation. E.g., here’s a “Highlights agent” 👇

Anup Gosavi

16,534 views • 1 year ago

🤔Can we assess agents across various apps & OS w.o. crafting new envs? OSWorld🖥️: A unified, real computer env for multimodal agents to evaluate open-ended computer tasks with arbitrary apps and interfaces on Ubuntu, Windows, & macOS. + annotated 369 real-world computer tasks 👇

Tianbao Xie

66,610 views • 2 years ago

Despite constant improvements, AI agents still struggle with complex, multi-step tasks. @Gulp_AI (YC W25) solves this with Osmosis: a framework that helps AI agents learn (similar to DeepSeek R1) in real time. Congrats on the launch Kasey Zhang + Andy!

Y Combinator

75,962 views • 1 year ago

🚨 BREAKING: First multi-agent world Agents from separate OpenClaw instances can now talk to each other. Local. Remote. Connected. This might be the first real step toward a true multi-agent world. Claw3D City is closer than we think.

Luke The Dev

37,720 views • 4 months ago

From drafting code to fully automating tedious administrative tasks like travel booking and expense reporting, here is a look at the real-world workflows you can start delegating to agents today →

Google Cloud Tech

13,578 views • 1 month ago

Simplest way to create an AI agent today: You don't need to write any code. You don't need to build a workflow. You can literally build agents that complete useful tasks for you using a single prompt. Check the quick video I recorded!

Santiago

62,639 views • 9 months ago

New short course: Building Code Agents with Hugging Face smolagents! Learn how to build code agents in this course, created in collaboration with Hugging Face, and taught by Thomas Wolf, its co-founder and CSO, and m_ric, Hugging Face’s Project Lead on Agents.
Tool-calling agents use LLMs to generate multiple function calls sequentially to complete a complex sequence of tasks. They generate one function call, execute it, observe, reason, and decide what to do next.
Code agents take a different approach. They consolidate all these calls into a single block of code, letting the LLM lay out an entire action plan at once, which can be executed efficiently to provide more reliable results.
You’ll learn how to code agents using smolagents, a lightweight agentic framework from Hugging Face. Along the way, you’ll learn how to run LLM-generated code safely and develop an evaluation system to optimize your code agent for production.
In detail, you’ll learn:
- How agentic systems have evolved, gaining greater levels of agency over time, and why code agents are a next step.
- How code agents write their actions in code.
- When code agents outperform function-calling agents.
- How to run code agents safely in your system using a constrained Python interpreter and sandboxing using E2B.
- How to trace, debug, and assess the code agent to optimize its behaviours for complex requests.
- How to build a research multi-agent system that can find information online and organize it into an interactive report.
By the end of this course, you’ll know how to build and run code agents using smolagents, and deploy them safely with a structured evaluation system in your projects. Please sign up here!

Andrew Ng

127,724 views • 1 year ago
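
The contrast drawn in the smolagents course post above, between tool-calling agents that issue one call per turn and code agents that emit a whole plan as one block, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tools `search_price` and `multiply`, the scripted "model outputs", and the `$last` placeholder are hypothetical, no LLM is called, and the restricted `exec` namespace is a toy, not the constrained interpreter or E2B sandbox the course mentions.

```python
# Hypothetical tools; neither the names nor the data come from smolagents.
def search_price(item: str) -> float:
    prices = {"apple": 0.5, "pear": 0.75}
    return prices[item]


def multiply(a: float, b: float) -> float:
    return a * b


TOOLS = {"search_price": search_price, "multiply": multiply}


def run_tool_calling(steps):
    """Tool-calling style: one call per turn, each result observed before the next."""
    observations = []
    for step in steps:
        # "$last" stands in for the model reading the previous observation.
        args = [observations[-1] if a == "$last" else a for a in step["args"]]
        observations.append(TOOLS[step["tool"]](*args))
    return observations[-1], len(steps)  # (final answer, number of turns)


def run_code_agent(code: str):
    """Code-agent style: the whole plan arrives as one block and runs once."""
    namespace = {"__builtins__": {}, **TOOLS}  # toy restriction, NOT a real sandbox
    exec(code, namespace)
    return namespace["result"], 1  # (final answer, number of turns)


# Scripted stand-ins for what an LLM might emit in each style.
scripted_calls = [
    {"tool": "search_price", "args": ["apple"]},
    {"tool": "multiply", "args": ["$last", 4]},
]
scripted_code = "result = multiply(search_price('apple'), 4)"

print(run_tool_calling(scripted_calls))
print(run_code_agent(scripted_code))
```

Both paths reach the same answer; the difference the sketch highlights is that the code-agent path needs a single execution step, while the tool-calling path needs one turn per call.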

OpenAI Operator looks great. Here is an open-source version of it you can use now. See an agent that uses your browser to perform tasks for you. Developers can get started in a few lines of code:
    pip install 'ai-gradio[browser]'
    import gradio as gr
    import ai_gradio
    gr.load(
        name='browser:gpt-4o',
        src=ai_gradio.registry,
        title='AI Browser Agent',
        description='Agent that helps with web tasks'
    ).launch()

AK

98,797 views • 1 year ago

Big thanks to AK for highlighting our work! LEO marks our pioneering step towards building an embodied generalist agent that can really comprehend the 3D world! 🚀Leveraging LLMs, we train LEO with real and synthetic 3D data across a diverse spectrum of tasks. It's thrilling to see LEO surpass current state-of-the-art (SOTA) methods in most benchmarked tasks, all under a single, unified model. 🔥 #Generalist_Agent

Siyuan Huang

22,710 views • 2 years ago

🤯 Hermes agent just got a Kanban task board, and it changes everything. Watch multiple AI agents collaborate in real time, each with a defined role, completing tasks with way more depth than your typical sub-agents. This is the future of multi-agent workflows 👇

Boxmining

12,794 views • 2 months ago

New Course: Reinforcement Fine-Tuning LLMs with GRPO! Learn to use reinforcement learning to improve your LLM performance in this short course, built in collaboration with Predibase by Rubrik, and taught by Travis Addair, its Co-Founder and CTO, and Arnav Garg, its Senior Engineer and Machine Learning Lead.
Reasoning models have been one of the most important developments in LLMs. Reinforcement Fine-Tuning (RFT) uses rewards to encourage LLMs to find solutions to multi-step reasoning tasks such as solving math problems and debugging code - without needing pre-existing training examples like in traditional supervised fine-tuning.
Group Relative Policy Optimization (GRPO) is a reinforcement fine-tuning algorithm gaining rapid adoption. Developed by the DeepSeek team and used to train the R1 reasoning model, GRPO uses reward functions that you can write in Python to assign rewards to model responses. It’s beneficial for tasks with verifiable outcomes and can work well even with fewer than 100 training examples. It can also significantly improve the reasoning ability of smaller LLMs, making applications faster and more cost effective.
In this course, you’ll take a technical deep dive into RFT with GRPO. You’ll learn to build reward functions that you can use in the GRPO training process to guide an LLM toward better performance on multi-step reasoning tasks.
In detail, you’ll:
- Learn when reinforcement fine-tuning is a better fit than supervised fine-tuning, especially for tasks involving multi-step reasoning or limited labeled data.
- Understand how GRPO uses programmable reward functions as a more scalable alternative to the human feedback required for other reinforcement learning algorithms, such as RLHF and DPO.
- Frame the Wordle game as a reinforcement fine-tuning problem and see how an LLM can learn to plan, analyze feedback, and improve its strategy over time.
- Design reward functions that power the reinforcement fine-tuning process.
- Learn techniques for evaluating more subjective tasks, such as rating the quality of a text summary, using an LLM as a judge.
- Understand why reward hacking happens and how to avoid it by adding penalty functions to discourage undesirable behaviors.
- Learn the four key components of the loss calculation in the GRPO algorithm: token probability distribution ratios, advantages, clipping, and KL-divergence.
- Launch reinforcement fine-tuning jobs using Predibase’s hosted training services.
By the end of this course, you’ll be able to build and fine-tune LLMs using reinforcement learning to improve reasoning without relying on large labeled datasets or subjective human feedback. Please sign up here:

Andrew Ng

86,457 views • 1 year ago
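
As a rough illustration of the four loss components named in the GRPO course post above (probability ratios, advantages, clipping, KL divergence), here is a minimal pure-Python sketch. It is not the course's or Predibase's code. The Wordle-style `reward` function, the length penalty, the default `eps` and `beta` values, the particular KL estimator, and the simplification of treating each response as a single log-probability rather than a token sequence are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def reward(guess: str, secret: str) -> float:
    """Hypothetical Wordle-style reward: fraction of letters in the right position."""
    if len(guess) != len(secret):
        return -1.0  # illustrative penalty for a malformed guess
    return sum(g == s for g, s in zip(guess, secret)) / len(secret)


def group_advantages(rewards):
    """Advantages: each reward normalized against the other samples in its group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]


def grpo_loss(logp_new, logp_old, logp_ref, advantages, eps=0.2, beta=0.04):
    """Simplified per-response loss (real implementations work per token)."""
    terms = []
    for ln, lo, lr, adv in zip(logp_new, logp_old, logp_ref, advantages):
        ratio = math.exp(ln - lo)                     # probability ratio new/old
        clipped = max(1 - eps, min(1 + eps, ratio))   # clipping
        surrogate = min(ratio * adv, clipped * adv)   # pessimistic surrogate
        log_ref_ratio = lr - ln
        kl = math.exp(log_ref_ratio) - log_ref_ratio - 1  # non-negative KL estimate
        terms.append(surrogate - beta * kl)
    return -mean(terms)  # negate: optimizers minimize


# One group of sampled guesses for the same prompt (made-up data).
guesses = ["crane", "crate", "xx"]
rewards = [reward(g, "crate") for g in guesses]
advantages = group_advantages(rewards)

# With identical new, old, and reference log-probs the ratio is 1 and the KL term is 0.
logp = [-1.0, -1.2, -0.7]
print(rewards, advantages, grpo_loss(logp, logp, logp, advantages))
```

Because advantages are computed relative to the group, no separate value model is needed in this sketch; a guess is pushed up or down only according to how it scored against its siblings.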

NO WAY THIS IS ACTUALLY FREE AND OPEN-SOURCE.. PilotDeck just dropped an open-source agent OS with persistent workspaces, autonomous execution, and intelligent model routing. It routes hard tasks to stronger models and simple tasks to cheaper agents, so you stop burning premium tokens on work that doesn’t need them. Tutorial below:

Robin Delta

39,717 views • 2 months ago

🚨 AI Can Build Complex Apps With Just One Single Prompt New AI models may be banned but multi-LLM agents can build insane things.... Here is our agent building an entire Slack-like app using a single prompt! Combines Opus, GPT 5.5 and open-source LLMs to optimize cost and performance

Bindu Reddy

6,663,574 views • 1 month ago

Humanoid Robots can Think, Move, and Adapt in Real Time. LimX COSA (Cognitive OS of Agents), the first physical-world-native agentic OS, unifies cognitive reasoning and whole-body control, enabling robots to complete long-horizon, multi-step tasks.

LimX Dynamics

1,631,725 views • 6 months ago

Announcing Builder 2.0
We raised $67M to build collaborative coding for Claude and Codex
- Start tasks from a local branch, Slack or Jira
- Real-time collab between humans and agents
- 100s of parallel agents code, test, review
Reply "Builder" and I'll DM you 500 agent credits

Steve (Builder.io)

90,018,640 views • 3 months ago

Weekends are for research. Inside OptimAI labs, we’ve been pushing agents beyond the screen. A first glimpse of OptimAI Prime. Not just agents that think, but agents that operate. Learning through reinforcement, coordinating across the network, and executing in real-world environments. From digital workflows → to physical actions. This is the next step for the agent-native internet.🤖 Still early. But already in motion.

OptimAI Network

20,674 views • 2 months ago