As AI agents face increasingly long and complex tasks, decomposing them into subtasks becomes ever more appealing. But how do we discover such temporal structure? Hierarchical RL provides a natural formalism, yet many questions remain open. Here's our overview of the field 🧵

Martin Klissarov

2,826 subscribers

36,008 views • 11 months ago • via X (Twitter)

Science and Technology • Games • Education

Comments: 11

Martin Klissarov • 11 months ago

Humans constantly leverage temporal structure: we actuate muscles each millisecond, yet our plans can span days, months and even years. Computers are built on this same principle. How will AI agents discover and use such structure? What is "good" structure in the first place?

Martin Klissarov • 11 months ago

In this 80+ page manuscript, we cover the rich, diverse, and decades-old literature on temporal structure discovery in AI. When and in what way should we expect these methods to benefit agents? What are the trade-offs involved?

Martin Klissarov • 11 months ago

We cover methods that learn (1) directly from experience, (2) from offline datasets, and (3) with foundation models (LLMs). We present each method through the fundamental challenges of decision making, namely (a) exploration, (b) credit assignment, and (c) transferability.

Martin Klissarov • 11 months ago

We often get bogged down by differences in formalisms (goal-conditioned RL, options, feudal RL, skills, …). We unite these core ideas through a single perspective. We believe hierarchical RL is fundamentally about the algorithm through which we discover temporal structure.

Martin Klissarov • 11 months ago

We hope this work provides a good introduction to the field. Finding temporal structure is challenging, so we carefully laid out some of the most pressing open questions. We also identified domains that are particularly promising, e.g. open-ended systems.

Martin Klissarov • 11 months ago

This work was done over the course of many friendly virtual calls with @akhil_bagaria and @RayZiyan41307, and under the thoughtful guidance of researchers who have spent decades working on these problems, namely George Konidaris, Doina Precup and @MarlosCMachado.

Martin Klissarov • 11 months ago

We plan to keep improving this manuscript, so please share your feedback!

Arsen Ibragimov • 11 months ago

Always been fascinated by how HRL tackles the problem of breaking complex tasks into manageable steps. The field has huge potential imo, but yeah, it still feels like we're just scratching the surface of what's possible.

Abhranil Chandra • 11 months ago

Very interesting work @MartinKlissarov !!!

harsh satija • 11 months ago

Great work!! Thanks for the much-needed unified overview - looking forward to reading it.

Martin Klissarov • 11 months ago

Thanks for the kind words Harsh!
