Introducing Scalable Option Learning (SOL☀️), a blazingly fast hierarchical RL algorithm that makes progress on long-horizon tasks and demonstrates positive scaling trends on the largely unsolved NetHack benchmark, when trained for 30 billion samples. Details, paper and code in >

Mikael Henaff

1,907 subscribers

20,957 views • 8 months ago • via X (Twitter)

Related videos

Introducing Adjoint Sampling, a new learning algorithm that trains generative models based on scalar rewards. Based on theoretical foundations developed by FAIR, Adjoint Sampling leads to a highly scalable practical algorithm, and can become the foundation for further research into highly scalable sampling methods. Read our research paper on Adjoint Sampling and download the model, code, and benchmark ➡️

AI at Meta

36,987 views • 1 year ago

LLMs are capable of high-level planning, but they require pre-trained skills! Our #ICLR2024 paper instead uses LLM guidance to train RL agents from scratch to solve 25+ long-horizon robotics tasks across four benchmarks with >85% success rates. Paper & code:

Murtaza Dalal

36,257 views • 2 years ago

Scaling laws in deep RL? It turns out that the batch size, learning rate, and UTD (update-to-data) ratio that give the most efficient and scalable deep RL follow predictable relationships. Check out the analysis in new work by Oleg Rybkin & collaborators:

Sergey Levine

43,464 views • 1 year ago

Announcing ManiSkill2: a unified, fast, and accessible benchmark - robot learning made simple!
- ✅ Pip installable & easily deployable
- ⚡ Blazingly fast visual RL support
- ✨ Diverse task families, objects, & 4M demonstration frames
- 🖥️ Interactive GUI

Hao Su Lab

42,096 views • 3 years ago

Today, we present a step-change in robotic AI Sunday. Introducing ACT-1: A frontier robot foundation model trained on zero robot data.
- Ultra long-horizon tasks
- Zero-shot generalization
- Advanced dexterity
🧵->

Tony Zhao

2,042,723 views • 6 months ago

How to teach robots to perform long-horizon tasks efficiently and robustly🦾? Introducing MimicPlay - an imitation learning algorithm that uses "cheap human play data". Our approach unlocks both real-time planning through raw perception and strong robustness to disturbances!🧵👇

Chen Wang

288,500 views • 3 years ago

Want to scale RL with your shiny new GPU? 🚀 In our ICML24 Oral we find that RL algorithms hit a barrier when data is scaled up. Our new algorithm, SAPG, proposes a simple fix. It scales to 25k envs and solves hard tasks where PPO makes no progress. 1/n

Ananye Agarwal

64,403 views • 1 year ago

Introducing HiLMa-Res: a hierarchical RL framework for quadrupeds to tackle loco-manipulation tasks with sustained mobility! Designed for general learning tasks (vision-based, state-based, real-world data, etc), the robot now can step over stones🐾/navigate boxes📦/dribble⚽.

Zhongyu Li

11,720 views • 1 year ago

Luminal is creating PyTorch for Production – an ML compiler that generates blazingly fast CUDA kernels and makes deploying to production one line of code. Congrats on the launch, Jake Stevens, Joe Fioti, and Matthew Gunton!

Y Combinator

98,496 views • 10 months ago

Additionally, looking towards the future, we’re releasing PARTNR: a benchmark for Planning And Reasoning Tasks in humaN-Robot collaboration. Built on Habitat 3.0, it’s the largest benchmark of its kind to study and evaluate human-robot collaboration in household activities. By providing a standardized benchmark and dataset, we hope to enable new research on robots that can operate not only in isolation but also in collaboration with people. Details and code ➡️

AI at Meta

20,244 views • 1 year ago

How well do today’s frontier models handle long-horizon, multi-step web agent tasks, such as identifying the top 25 U.S. CS PhD programs with ML/AI faculty likely accepting students and compiling the results into a structured sheet? Check out our new work on Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks Paper: Leaderboard: We introduce Odysseys, a benchmark of 200 long-horizon tasks derived from real browsing sessions and evaluated on the live Internet. We show that binary pass/fail is inadequate in this setting and propose rubric-based evaluation, which better aligns with human judgment and provides more informative signals. Across leading models, the best achieves only 44.5% success, leaving substantial headroom. We further introduce a Trajectory Efficiency metric (rubric score per step) and find efficiency remains extremely low (1.15%), highlighting a key bottleneck. Odysseys provides a realistic benchmark for measuring progress toward web agents capable of sustained, efficient, real-world operation. See a more detailed thread by Jing Yu Koh.

Russ Salakhutdinov

22,404 views • 1 month ago

Eric Jang weighs in on RL vs. supervised learning for humanoid robot manipulation tasks.

The Humanoid Hub

10,656 views • 10 months ago

NeurIPS 2025 Paper: LLMs are Reinforcement Learners 🤯! Surprisingly, we show that LLMs can solve RL tasks without any external component! We introduce Prompted Policy Search (ProPS), an RL method based only on LLMs and in-context learning. [Paper]

Heni Ben Amor

51,172 views • 6 months ago

Introducing Scout Fast AF: the fastest and most powerful general AI agent with its own computer. Watch Scout install, code, and benchmark Zig + Rust on an N-body simulation in 5 minutes.

Scout

79,377 views • 1 year ago

Introducing Reinforcement-Learned Teachers (RLTs): Transforming how we teach LLMs to reason with reinforcement learning (RL). Blog: Paper: Traditional RL focuses on “learning to solve” challenging problems with expensive LLMs and constitutes a key step in making student AI systems ultimately acquire reasoning capabilities via distillation and cold-starting. Enter our RLTs—a new class of models prompted with not only a problem’s question but also its solution, and directly trained to generate clear, step-by-step “explanations” to teach their students. Remarkably, an RLT with only 7B parameters produces superior results when distilling and cold-starting students in competitive and graduate-level reasoning tasks than orders-of-magnitude larger LLMs. RLTs are as effective even when distilling 32B students, much larger than the teacher itself—unlocking a new standard for efficiency in developing reasoning language models with RL. Code:

Sakana AI

179,210 views • 11 months ago

Excited to present FastTD3: a simple, fast, and capable off-policy RL algorithm for humanoid control -- with open-source code to run your own humanoid RL experiments in no time! Thread below 🧵

Younggyo Seo

130,900 views • 1 year ago

Attune (Attune) makes blazingly fast build tools for developers that require zero migration effort. Built by developers with a decade of experience in build systems and developer tools. Congrats on the launch, xin and Eliza Zhang!

Y Combinator

16,783 views • 1 year ago

Introducing INTELLECT-3: Scaling RL to a 100B+ MoE model on our end-to-end stack. Achieving state-of-the-art performance for its size across math, code and reasoning. Built using the same tools we put in your hands, from environments & evals, RL frameworks, sandboxes & more.

Prime Intellect

1,133,644 views • 6 months ago

🚀 Introducing #DreamCraft3D, our breakthrough hierarchical technique for 3D content generation. Leading the way in AIGC, we're redefining 3D generation: Project page: Paper: Code:

DeepSeek

16,400 views • 2 years ago