As AI agents face increasingly long and complex tasks, decomposing them into subtasks becomes ever more appealing. But how do we discover such temporal structure? Hierarchical RL provides a natural formalism, yet many questions remain open. Here's our overview of the field 🧵

36,008 views • 11 months ago • via X (Twitter)

Comments: 11

Martin Klissarov • 11 months ago

Humans constantly leverage temporal structure: we actuate muscles each millisecond, yet our plans can span days, months and even years. Computers are built on this same principle. How will AI agents discover and use such structure? What is "good" structure in the first place?

Martin Klissarov • 11 months ago

In this 80+ page manuscript, we cover the rich, diverse, and decades-old literature on temporal structure discovery in AI. When and in what way should we expect these methods to benefit agents? What are the trade-offs involved?

Martin Klissarov • 11 months ago

We cover methods that learn: (1) directly from experience, (2) through offline datasets, and (3) with foundation models (LLMs). We present each method through the fundamental challenges of decision making, namely: (a) exploration, (b) credit assignment, and (c) transferability.

Martin Klissarov • 11 months ago

We often get bogged down by differences in formalisms (goal-conditioned RL, options, feudal RL, skills, …). We unite these core ideas through a single perspective: we believe hierarchical RL is fundamentally about the algorithm through which we discover temporal structure.

Martin Klissarov • 11 months ago

We hope this work provides a good introduction to the field. Finding temporal structure is challenging, so we carefully laid out some of the most pressing open questions. We also identified domains that are particularly promising, e.g. open-ended systems.

Martin Klissarov • 11 months ago

This work was done over the course of many friendly virtual calls with @akhil_bagaria and @RayZiyan41307, and under the thoughtful guidance of researchers who have spent decades working on these problems, namely George Konidaris, Doina Precup and @MarlosCMachado.

Martin Klissarov • 11 months ago

We plan to keep improving this manuscript, so please share your feedback!

Arsen Ibragimov • 11 months ago

Always been fascinated by how HRL tackles the problem of breaking complex tasks into manageable steps. The field has huge potential imo, but yeah, still feels like we’re just scratching the surface of what’s possible.

Abhranil Chandra • 11 months ago

Very interesting work @MartinKlissarov !!!

harsh satija • 11 months ago

Great work!! Thanks for the much-needed unified overview. Looking forward to reading it.

Martin Klissarov • 11 months ago

Thanks for the kind words, Harsh!

Related posts

Can AI agents adapt zero-shot to complex multi-step language instructions in open-ended environments? We present MaestroMotif, a method for AI-assisted skill design that produces highly capable and steerable hierarchical agents. To the best of our knowledge, it is the first method that, without expert-labeled datasets, solves compositional tasks requiring hundreds of steps for completion.

All the modules within MaestroMotif are learned from interaction, from the highest level of planning to the lowest level of sensorimotor control. On the open-ended domain of NetHack, it surpasses existing approaches, including those that are fine-tuned specifically for each task.

At the heart of MaestroMotif is the idea that decomposing a task into subtasks significantly helps decision making. MaestroMotif leverages an agent designer's intuition about a domain to identify important skills and describe them in natural language. These short descriptions are then converted into adaptable hierarchical agents through AI feedback and in-context learning.

Our paper was recently published at ICLR 2025, and we are open-sourcing the whole project, including the code, prompts and pre-trained models. Paper: Code: NotebookLM Podcast:

This work was done with the amazing Mikael Henaff, Roberta Raileanu, Shagun Sodhani, Pascal Vincent, Amy Zhang, Pierre-Luc Bacon, Doina Precup, with equal supervision by Marlos C. Machado and Pierluca D'Oro. Take a look at the following thread:

Martin Klissarov

80,217 views • 1 year ago