Introducing Scalable Option Learning (SOL☀️), a blazingly fast hierarchical RL algorithm that makes progress on long-horizon tasks and, when trained for 30 billion samples, demonstrates positive scaling trends on the largely unsolved NetHack benchmark. Details, paper and code in >

Mikael Henaff

1,902 subscribers

21,043 views • 9 months ago • via X (Twitter)

Similar videos

Introducing Adjoint Sampling, a new learning algorithm that trains generative models based on scalar rewards. Based on theoretical foundations developed by FAIR, Adjoint Sampling leads to a highly scalable practical algorithm, and can become the foundation for further research into highly scalable sampling methods. Read our research paper on Adjoint Sampling and download the model, code, and benchmark ➡️

AI at Meta

36,987 views • 1 year ago

LLMs are capable of high-level planning, but they require pre-trained skills! Our #ICLR2024 paper instead uses LLM guidance to train RL agents from scratch to solve 25+ long-horizon robotics tasks across four benchmarks w/ >85% success rates Paper & code:

Murtaza Dalal

36,264 views • 2 years ago

Scaling laws in deep RL? Turns out that the batch size, learning rate, and UTD (update-to-data) ratio that give the most efficient and scalable deep RL follow predictable relationships. Check out the analysis in new work by Oleg Rybkin & collaborators:

Sergey Levine

43,464 views • 1 year ago

Announcing ManiSkill2: a unified, fast, and accessible benchmark - robot learning made simple! - ✅Pip installable & easily deployable - ⚡Blazingly fast visual RL support - ✨Diverse task families, objects, & 4M demonstration frames - 🖥️Interactive GUI

Hao Su Lab

42,096 views • 3 years ago

Today, we present a step-change in robotic AI Sunday. Introducing ACT-1: A frontier robot foundation model trained on zero robot data. - Ultra long-horizon tasks - Zero-shot generalization - Advanced dexterity 🧵->

Tony Zhao

2,045,426 views • 7 months ago

How to teach robots to perform long-horizon tasks efficiently and robustly🦾? Introducing MimicPlay - an imitation learning algorithm that uses "cheap human play data". Our approach unlocks both real-time planning through raw perception and strong robustness to disturbances!🧵👇

Chen Wang

288,500 views • 3 years ago

Introducing HiLMa-Res: a hierarchical RL framework for quadrupeds to tackle loco-manipulation tasks with sustained mobility! With a framework designed for general learning tasks (vision-based, state-based, real-world data, etc.), the robot can now step over stones🐾/navigate boxes📦/dribble⚽.

Zhongyu Li

11,720 views • 1 year ago

Want to scale RL with your shiny new GPU? 🚀 In our ICML24 Oral we find that RL algorithms hit a barrier when data is scaled up. Our new algorithm, SAPG, proposes a simple fix. It scales to 25k envs and solves hard tasks where PPO makes no progress. 1/n

Ananye Agarwal

64,403 views • 1 year ago

Luminal is creating PyTorch for Production – an ML compiler that generates blazingly fast CUDA kernels and makes deploying to production one line of code. Congrats on the launch, Jake Stevens, Joe Fioti, and Matthew Gunton!

Y Combinator

98,496 views • 11 months ago

Additionally, looking towards the future, we’re releasing PARTNR: a benchmark for Planning And Reasoning Tasks in humaN-Robot collaboration. Built on Habitat 3.0, it’s the largest benchmark of its kind to study and evaluate human-robot collaboration in household activities. By providing a standardized benchmark and dataset, we hope to enable new research on robots that can not only operate in isolation, but also in collaboration with people. Details and code ➡️

AI at Meta

20,244 views • 1 year ago

How well do today’s frontier models handle long-horizon, multi-step web agent tasks, such as identifying the top 25 U.S. CS PhD programs with ML/AI faculty likely accepting students and compiling the results into a structured sheet? Check out our new work on Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks Paper: Leaderboard: We introduce Odysseys, a benchmark of 200 long-horizon tasks derived from real browsing sessions and evaluated on the live Internet. We show that binary pass/fail is inadequate in this setting and propose rubric-based evaluation, which better aligns with human judgment and provides more informative signals. Across leading models, the best achieves only 44.5% success, leaving substantial headroom. We further introduce a Trajectory Efficiency metric (rubric score per step) and find efficiency remains extremely low (1.15%), highlighting a key bottleneck. Odysseys provides a realistic benchmark for measuring progress toward web agents capable of sustained, efficient, real-world operation. See a more detailed thread by Jing Yu Koh.
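As a rough illustration of the Trajectory Efficiency metric described above ("rubric score per step"), here is a minimal sketch. It assumes the rubric score is the fraction of rubric items a trajectory satisfies (between 0 and 1) and that efficiency is simply that score divided by the number of agent steps; the function name `trajectory_efficiency` and this normalization are hypothetical, and the paper's exact definition may differ.

```python
def trajectory_efficiency(rubric_items_met: int, rubric_items_total: int, num_steps: int) -> float:
    """Hypothetical sketch: rubric score (fraction of rubric items met) per agent step.

    Assumes the rubric score lies in [0, 1] and efficiency = score / steps;
    this is an illustration, not the benchmark's reference implementation.
    """
    if rubric_items_total <= 0 or num_steps <= 0:
        raise ValueError("rubric_items_total and num_steps must be positive")
    if not 0 <= rubric_items_met <= rubric_items_total:
        raise ValueError("rubric_items_met must be between 0 and rubric_items_total")
    rubric_score = rubric_items_met / rubric_items_total  # partial credit, unlike binary pass/fail
    return rubric_score / num_steps  # fraction of rubric score earned per step


# Example: an agent satisfies 3 of 4 rubric items after 50 browsing steps.
print(f"{trajectory_efficiency(3, 4, 50):.2%}")  # 0.75 / 50 = 0.015 -> "1.50%"
```

Under this reading, a long trajectory that eventually earns a high rubric score can still have low efficiency, which is the kind of bottleneck the post highlights.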

Russ Salakhutdinov

22,518 views • 2 months ago

Eric Jang weighs in on RL vs. supervised learning for humanoid robot manipulation tasks.

The Humanoid Hub

10,656 views • 11 months ago

NeurIPS 2025 Paper: LLMs are Reinforcement Learners 🤯! Surprisingly, we show that LLMs can solve RL tasks without any external component! We introduce Prompted Policy Search (ProPS), an RL method based only on LLMs and in-context learning. [Paper]

Heni Ben Amor

51,248 views • 7 months ago

Introducing Scout Fast AF: the fastest and most powerful general AI agent with its own computer. Watch Scout install, code, and benchmark Zig + Rust on an N-body simulation in 5 minutes.

Scout

79,377 views • 1 year ago

Introducing Reinforcement-Learned Teachers (RLTs): Transforming how we teach LLMs to reason with reinforcement learning (RL). Blog: Paper: Traditional RL focuses on “learning to solve” challenging problems with expensive LLMs and constitutes a key step in making student AI systems ultimately acquire reasoning capabilities via distillation and cold-starting. Enter our RLTs: a new class of models prompted with not only a problem’s question but also its solution, and directly trained to generate clear, step-by-step “explanations” to teach their students. Remarkably, an RLT with only 7B parameters produces better results when distilling and cold-starting students on competitive and graduate-level reasoning tasks than LLMs that are orders of magnitude larger. RLTs remain effective even when distilling 32B students, much larger than the teacher itself, unlocking a new standard for efficiency in developing reasoning language models with RL. Code:

Sakana AI

179,276 views • 1 year ago

Introducing INTELLECT-3: Scaling RL to a 100B+ MoE model on our end-to-end stack. Achieving state-of-the-art performance for its size across math, code and reasoning. Built using the same tools we put in your hands, from environments & evals to RL frameworks, sandboxes & more.

Prime Intellect

1,137,660 views • 7 months ago

Excited to present FastTD3: a simple, fast, and capable off-policy RL algorithm for humanoid control, with open-source code to run your own humanoid RL experiments in no time! Thread below 🧵

Younggyo Seo

130,994 views • 1 year ago

Attune (Attune) makes blazingly fast build tools for developers that require zero migration effort. Built by developers with a decade of experience in build systems and developer tools. Congrats on the launch, xin and Eliza Zhang!

Y Combinator

16,787 views • 1 year ago

🚀 Introducing #DreamCraft3D, our breakthrough hierarchical technique for 3D content generation. Leading the way in AIGC, we're redefining 3D generation: Project page: Paper: Code:

DeepSeek

16,400 views • 2 years ago