Next step in dynamic dexterous grasping from NVIDIA: DextrAH-RGB! No more depth. We’re now consuming RGB stereo pairs, and the resulting perceptual system is much more robust. Trained entirely in sim (IsaacLab), leveraging fast tiled rendering, and deployed zero-shot to real.

Nathan Ratliff

1,574 subscribers

47,247 views • 1 year ago • via X (Twitter)

13 Comments

Nathan Ratliff • 1 year ago

Depth causes problems. In our earlier work (DextrAH-G, which consumed only depth w/o RGB), we had to cut out the background, block windows, and deal with eroded depth readings from object surface properties. This time around, none of that was a problem. Sunshine? No problem. Weird background? No problem. It all just worked.

Nathan Ratliff • 1 year ago

We use a SOTA transformer architecture with ResNet-18 encoders (pretrained, then fine-tuned) for multi-camera image processing. This network is substantially larger than the simple convnet architecture we used earlier for depth-only processing (DextrAH-G).

Nathan Ratliff • 1 year ago

Here’s what the simulated camera feeds look like during distillation (decked out with all our domain randomizations). The second video shows the stereo RGB feed from the real-world deployment (plus some cool reactivity!).

Nathan Ratliff • 1 year ago

But the overall training, distillation, and deployment pipeline is essentially the same, with some minor tweaks that end up speeding up cycle time by 2x and improving robustness across object scales. Distillation now takes a little over 2 days on 4x L40S GPUs because the perceptual network is (relatively) massive and does significant work. We also added automatic domain randomization (ADR) to the teacher training, which starts with small domain randomization ranges and incrementally grows them as performance metrics improve. The teacher training takes just over 2.5 days on 8x H100s.
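
To make the ADR idea concrete, here is a minimal sketch of a range that starts narrow and widens only when a performance metric clears a threshold. The class name `ADRRange`, the friction parameter, the step size, and the 0.75 threshold are all illustrative assumptions, not values or code from the DextrAH-RGB pipeline.

```python
import random
from dataclasses import dataclass


@dataclass
class ADRRange:
    """One randomized parameter whose sampling range widens over training.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    nominal: float            # value used when there is no randomization
    max_half_width: float     # widest allowed deviation from nominal
    step: float               # how much to widen after a good evaluation
    half_width: float = 0.0   # current deviation; starts small

    def sample(self, rng: random.Random) -> float:
        # Draw uniformly from the current (possibly still narrow) range.
        return rng.uniform(self.nominal - self.half_width,
                           self.nominal + self.half_width)

    def update(self, success_rate: float, threshold: float) -> None:
        # Grow the range only when the policy performs well enough.
        if success_rate >= threshold:
            self.half_width = min(self.half_width + self.step,
                                  self.max_half_width)


# Assumed example: randomize a friction coefficient around 1.0.
friction = ADRRange(nominal=1.0, max_half_width=0.5, step=0.1)
rng = random.Random(0)

# Made-up success rates from successive evaluation rounds.
for success_rate in [0.4, 0.8, 0.9, 0.5, 0.95]:
    friction.update(success_rate, threshold=0.75)

sampled_friction = friction.sample(rng)
```

In this sketch the range widens on three of the five rounds and holds steady on the two where the success rate falls below the threshold. Some ADR variants also shrink ranges when performance drops; the comment above only mentions growth, so that is all the sketch does.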

Nathan Ratliff • 1 year ago

All of the reactive regrasping you see is baked into the trained policy. It knows when it has a good grasp based on “feel” (proprioceptive difference between desired and measured joint angles).
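
As a rough illustration of the "feel" signal, the sketch below computes the gap between desired and measured joint angles. The policy presumably learns to use this signal implicitly; the function names and the example joint values here are assumptions made up for illustration, and no explicit threshold like this is described in the thread.

```python
import math
from typing import List, Sequence


def tracking_error(desired: Sequence[float],
                   measured: Sequence[float]) -> List[float]:
    """Per-joint difference between commanded and measured angles (radians)."""
    if len(desired) != len(measured):
        raise ValueError("desired and measured must have the same length")
    return [d - m for d, m in zip(desired, measured)]


def contact_signal(desired: Sequence[float],
                   measured: Sequence[float]) -> float:
    """Euclidean norm of the tracking error across all joints."""
    return math.sqrt(sum(e * e for e in tracking_error(desired, measured)))


# Fingers moving freely: measured angles match the commanded ones.
free_motion = contact_signal([0.5, 0.8, 1.0], [0.5, 0.8, 1.0])

# Fingers stopped by an object: two joints cannot reach their targets.
blocked = contact_signal([0.5, 0.8, 1.0], [0.2, 0.4, 1.0])
```

A larger value suggests the fingers are pressing against something and cannot reach their commanded angles, which is one plausible way a grasp could be "felt" without tactile sensors.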

Nathan Ratliff • 1 year ago

We ran a series of ablations on the perceptual architecture, looking at variations of the image encoder and at stereo vs. monocular inputs. The main takeaway is that it’s important to start the distillation process with a pretrained ResNet-18 encoder and fine-tune it. We tried all combinations of starting from pretrained vs. random weights and fine-tuning vs. just freezing the encoder, and it’s important to both start from pretrained weights and fine-tune. Surprisingly, monocular inputs worked much better than we expected. But it makes sense in retrospect. Close one eye and try picking things up from the table. It’s actually easier than it seems.
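
The encoder ablation described above is a small two-by-two grid, which can be enumerated as follows. The dictionary keys and labels are hypothetical names chosen for this sketch. Whether the stereo vs. monocular comparison was crossed with the encoder grid is not stated, so it is listed as a separate axis.

```python
from itertools import product

# Two encoder choices mentioned in the comment above.
encoder_init = ["pretrained", "random"]
encoder_training = ["finetuned", "frozen"]

# All four encoder configurations: every init paired with every training mode.
encoder_grid = [
    {"init": init, "training": training}
    for init, training in product(encoder_init, encoder_training)
]

# The configuration the comment reports as important.
recommended = {"init": "pretrained", "training": "finetuned"}

# Separate axis from the same ablation study; crossing it with the
# encoder grid is NOT assumed here.
camera_inputs = ["stereo", "monocular"]

for config in encoder_grid:
    marker = "  <- reported as important" if config == recommended else ""
    print(f"{config['init']:>10} + {config['training']}{marker}")
```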

Nathan Ratliff • 1 year ago

Links: Project website: DextrAH-RGB: Visuomotor Policies to Grasp Anything with Dexterous Hands

Nathan Ratliff • 1 year ago

Progression of prior work building toward this system:
- DeXtreme: Early sim2real work. Trained entirely in simulation and zero-shot deployed into the real world.
- DeXtreme FGP: Adds geometric fabric controllers for safer deployment. Don’t damage the robot! Background on geometric fabrics (RAL best paper 2022):
- DextrAH-G: Unlock the arm, and do full on grasping. Grasps anything placed in front of it. Geometric fabrics again allow us to be brazen with deployment on such a fragile physical system. This version operates only on depth.

Nathan Ratliff • 1 year ago

And finally DextrAH-RGB (this work): No more depth. Just direct stereo RGB processing with a scaled SOTA perception architecture.

Nathan Ratliff • 1 year ago

Many thanks to all our coauthors! @ritvik_singh9 @arthurallshire @ankurhandos and Karl Van Wyk. Amazing work! Ritvik and Karl, especially, drove the work and everyone put their heads together to figure out the ideal perceptual processing architecture.

Nathan Ratliff • 1 year ago

More details in @ritvik_singh9's thread (first author)!

Nathan Ratliff • 1 year ago

And @arthurallshire's!

RedDeer.Games • 4 years ago

Customise the colours yourself or simply connect the Joy-Cons ! Head to ➡️ for more details about #AAAClock ! #indiedev #indiegame #IndieGameDev #nintendo #OLED #switchOLED

Related Videos

Boston Dynamics collaborated with NVIDIA to demonstrate DextrAH-RGB, a workflow for dexterous grasping from stereo RGB input. The end-to-end policy for Atlas robot, trained entirely in NVIDIA Isaac Lab, transfers zero-shot from simulation to the real robot.

The Humanoid Hub

79,696 views • 1 year ago

We talked to Ritvik Singh about how you can train sim-to-real dexterous manipulation policies using NVIDIA Isaac. This robot is grasping objects using pure RGB stereo: take in images from a camera pair and predict what to do, all without training in the real world.

Chris Paxton

20,042 views • 10 months ago

A big part of scaling robot learning to solve real-world problems is that we somehow need to get enough diverse, high-quality data to train our robots to perform useful things. GPT and its fellow large language models were bootstrapped and proved out on a massive dataset of real-world language data. Unfortunately, despite our best efforts, similarly massive datasets don’t really exist for robotics — so, in our unending pursuit of high-quality, useful data, we turn to simulation. I compared a couple recent works on sim-to-real robot manipulation, which discuss how to train perception-driven manipulation policies in simulation, in such a way that they’re useful in the real world.
- DextrAH-RGB, from NVIDIA
- Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation, also from NVIDIA — specifically the GEAR lab
- Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids, another GEAR lab paper
- Local Policies Enable Zero-shot Long-Horizon Manipulation, from CMU
(video from DextrAH-RGB)

Chris Paxton

20,486 views • 1 year ago

The "reality gap" is narrowing. Sanctuary AI has demonstrated a sim-to-real breakthrough: "zero-shot" transfer for complex in-hand manipulation. By leveraging high-DOF hydraulic hands, the company successfully reoriented a cube in the physical world using a policy trained entirely in simulation.

Humanoids daily

12,391 views • 2 months ago

Today, a step forward in open robotics - our results show that sim-to-real zero shot transfer for manipulation is possible. MolmoBot is our open model suite for robotics, trained entirely in simulation on MolmoSpaces.🧵

Ai2

64,466 views • 3 months ago

LeVERB is a VLA framework for humanoid whole-body control, combining a vision-language model and a low-level controller via a shared latent action space, trained entirely in sim, deployed zero shot.

The Humanoid Hub

10,233 views • 1 year ago

Real → Sim → Robot — fast. With Gaussian splats, GRID brings physical assets into simulation tools like AirGen and NVIDIA Isaac Sim, enabling AI skills to be tested in sim and deployed to real robots in minutes. #Robotics #PhysicalAI #Simulation

General Robotics

12,257 views • 10 months ago

Boston Dynamics taught Atlas a Rabona kick. Learned from human mocap, retargeted to Atlas, trained in sim via RL, deployed zero-shot to the real robot. Soccer skills demand whole-body coordination, and similar recipe transfers to warehouse work.

The Humanoid Hub

11,109 views • 19 days ago

Reality of robotics: humanoid kung fu is solved before they can open doors with RGB. Here we are. Introducing the frontier of sim2real at NVIDIA GEAR. 100% sim data. RGB input only. Code name: 𝗗𝗼𝗼𝗿𝗠𝗮𝗻. We are opening the sim-to-real door. 🧵

Haoru Xue

371,077 views • 6 months ago

Excited to introduce StereoPolicy, led by Evans Han. 📷📷🤖StereoPolicy is an effective way to add geometric cues to modern robot policy models while keeping the strengths of pretrained 2D encoders. ⁉️Why stereo for robot manipulation? Monocular RGB often lacks the depth cues needed for precise manipulation, while RGB-D and point clouds can be noisy or brittle, especially on reflective and transparent objects in real-world deployment. Instead of explicitly reconstructing disparity, depth, or point clouds, StereoPolicy directly fuses synchronized left/right RGB views to learn implicit stereo cues, avoiding extra reconstruction latency that can make real-time manipulation difficult. Project Page:

Ruohan Zhang

848,366 views • 20 days ago

What if we can simulate an *interactive 3D world*, from a single image, in the wild, in real time? Introducing PointWorld-1B: a large pre-trained 3D world model that predicts env dynamics given RGB-D capture and robot actions. 🌐 from Stanford University NVIDIA

Wenlong Huang

273,170 views • 5 months ago

Advancing dexterous manipulation through scalable visual sim-to-real transfer. We are excited to share our RSS paper, “ViserDex: Visual Sim-to-Real for Robust Dexterous In-hand Reorientation.” 🌐 Project page: 1/N 🧵

Robotic Systems Lab

39,143 views • 1 month ago

Very excited to share our first public release after I joined Robbyant! We present Lingbot-Depth 👀 — a state-of-the-art depth foundation model trained with RGB-D MAE on millions of real & simulated RGBD pairs. 🔹 Camera depths as natural masks for RGB-D MAE modeling 🔹 Large-scale real + sim RGBD data curation pipeline What surprised us most: ✨ Significant improvement on transparent, reflective surfaces, and thin structures — traditionally the hardest cases for depth models. 👇These are some of hard causal-capture images we tested on. The results are pretty good! #DepthEstimation #FoundationModels #EmbodiedAI #3DVision #Robotics #RobbyAnt

Yinghao Xu

22,574 views • 4 months ago

A robot hand grasp over 500 totally new objects without fail? Zero-shot, single-view & super reliable ⬇️ + Paper Grasping random objects is hard for robots, especially when shapes, weights, and materials vary. RobustDexGrasp solves this with a smart new way of seeing and controlling the hand, leading to near-perfect grasping, even in noisy or cluttered scenes. Thank you for sharing, Hui Zhang 🙏 Follow him!! What makes it special ✅ Grabs 500+ unseen objects with 94.6% success using only single-view input ✅ Learns local shapes, not full geometry, for better generalization ✅ Trained with just 35 objects in sim but works in the real world with hundreds more ✅ Adapts to noise, unexpected forces, and even plays chess with VLM planning It shows that smart sensing and adaptive control can take dexterous grasping to the next level. Project: Paper:

Ilir Aliu

37,980 views • 1 year ago