Our model can now learn from its own experience with RL! Our new π*0.6 model can more than double throughput over a base model trained without RL, and can perform real-world tasks: making espresso drinks, folding diverse laundry, and assembling boxes. More in the thread below.

Physical Intelligence

45,516 subscribers

704,626 views • 7 months ago • via X (Twitter)

Science & Technology

Related Videos

We developed an RL method for fine-tuning our models for precise tasks in just a few hours or even minutes. Instead of training the whole model, we add an “RL token” output to π-0.6, our latest model, which is used by a tiny actor and critic to learn quickly with RL.

Physical Intelligence

431,287 views • 3 months ago

RL in the real world presents some big challenges, but also some really big opportunities. In our new work, HIL-SERL, Charles Xu, Jeffrey Wu, Jianlan Luo show that real-world RL can learn a huge range of precise and robust tasks, and perform them much faster than imitation.

Sergey Levine

36,489 views • 1 year ago

With RL, the robot can learn very precise tasks, like fastening a zip tie, and can actually do it more consistently and more quickly than even human teleoperation.

Physical Intelligence

16,287 views • 3 months ago

Excited to share our latest progress on DYNA-1 pre-training! 🤖 The base model now can perform diverse, dexterous tasks (laundry folding, package sorting, …) without any post-training, even in unseen environments. This powerful base also allows extremely efficient fine-tuning to ~100% success on challenging new tasks with as little as 1 hour of data! 🤯 Watch it master two of them: cup stacking & celery chopping on repeat, no failures. 👇

Dyna Robotics

66,619 views • 7 months ago

We have news! We created a new robotics model called Loop Model 1. On the zip-tie insertion task, it achieves 20x more throughput per unit of data than "Pi06 + RLT" from Physical Intelligence, a top model for such tasks. It’s the missing piece that makes MicroFactory work, because now deployment becomes so simple and fast that our users can do it themselves.

Igor Kulakov (MicroFactory)

75,879 views • 1 month ago

Exciting progress on Vision-Language-Action models from a collaboration between San Francisco-based Physical Intelligence (π) and China’s AGIBOT: (π)’s single model can autonomously perform diverse tasks on the AGIBOT G1 robot, using both humanoid hands and two-finger grippers.

The Humanoid Hub

32,575 views • 1 year ago

Orange you glad our robot can stack? Other tasks GEN-1 can do: Read more about GEN-1, our latest foundation model for the physical world:

Generalist

22,849 views • 2 months ago

GEN-1 removes thumbtacks and papers from corkboard. Other tasks GEN-1 can do: Read more about GEN-1, our latest foundation model for the physical world:

Generalist

24,247 views • 2 months ago

Our vision is for AI that uses world models to adapt in new and dynamic environments and efficiently learn new skills. We’re sharing V-JEPA 2, a new world model with state-of-the-art performance in visual understanding and prediction. V-JEPA 2 is a 1.2 billion-parameter model, trained on video, that can enable zero-shot planning in robots—allowing them to plan and execute tasks in unfamiliar environments. Learn more about V-JEPA 2 ➡️ As we continue working toward our goal of achieving advanced machine intelligence (AMI), we’re also releasing three new benchmarks for evaluating how well existing models can reason about the physical world from video. Learn more and download the new benchmarks ➡️

AI at Meta

309,942 views • 1 year ago

Our newest model, π0.7, has some interesting emergent capabilities: it can control a new robot to fold shirts for which we had no shirt folding data, figure out how to use an appliance with language-based coaching, and perform a wide range of dexterous tasks all in one model!

Physical Intelligence

456,897 views • 2 months ago

One more thing 🚀 Qwen’s agentic capability is no longer limited to the digital world: we’re bringing it into the physical world. With our in-house robotic agentic system and navigation model, Qwen can now control a robot to execute tasks in the real world. #qwen #embodied #robotics

xiong-hui (barry) chen

24,797 views • 1 month ago

The best way to get robust, high-quality robot performance is through reinforcement learning; but RL in either the real world or a traditional simulation has lots of limitations. Instead, Jiazhi Yang in RISE does RL in a compositional world model. Learn more ->

Chris Paxton

33,766 views • 17 days ago

Thanks AK! Finally, a robot can do continuous, agile, autonomous, adaptive jumping over stairs and stepping stones. Key idea: combine the pros of model-free RL and model-based control. RL (for CoM refs) + QP (for GRF) + WBC (for torque). Open-sourced:

Guanya Shi

32,155 views • 1 year ago

Watch this robot dog learn to walk from scratch in real time! Our new method, APRL, dynamically adjusts exploration constraints to enable fast and performant RL directly in the real world. APRL can also adapt to changes in the terrain. No simulation, no demos. A thread 👇

Sergey Levine

105,568 views • 2 years ago

GEN-1 plays with fidget toy. Other tasks GEN-1 can do: Read more about GEN-1, our latest foundation model for the physical world:

Generalist

17,642 views • 2 months ago

Building upon SimpleVLA-RL, we have implemented real-world RL on long-horizon dexterous tasks and witnessed a non-trivial (~300% relative) performance improvement over the SFT model, along with surprising auto-recovery capabilities. Blog coming soon. The entire process uses very little data and training compute, basically costing no more than a single robotic arm, hinting that real-world generality for machines is actually within sight.

Ning Ding

92,753 views • 5 months ago

Big News! Meta just released Segment Anything, a new AI model that can "cut out" any object, in any image/video, with a single click. The model is designed and trained to be promptable, so it can transfer zero-shot to new image distributions and tasks.

Lior Alexander

290,190 views • 3 years ago

Introducing Meta Locate 3D: a model for accurate object localization in 3D environments. Learn how Meta Locate 3D can help robots accurately understand their surroundings and interact more naturally with humans. You can download the model and dataset, read our research paper, and even try a demo!

AI at Meta

81,287 views • 1 year ago

Introducing Moonlake's 3D Agent. Our agent acts like a technical artist that can build and reconstruct articulated assets and large-scale editable scenes with hundreds of objects from a single image and can improve its generations continuously. Learn more in the thread below.

Moonlake

1,161,139 views • 2 months ago

We’re redefining what’s possible with AI. With the release of our latest model, Command A, optimized for real-world agentic and multilingual tasks, we’re demonstrating our commitment to bringing enterprises AI that goes beyond the ordinary, and offers security & efficiency. Our team has developed highly capable and efficient models that can be run on just 2 GPUs. Check out our tech report to learn more:

Cohere

16,413 views • 1 year ago