How to use simulation data for real-world robot manipulation? We present sim-and-real co-training, a simple recipe for manipulation. We demonstrate that sim data can significantly enhance real-world performance, even with notable differences between the sim and the real. (1/n)
44,173 views • 1 year ago •via X (Twitter)

Paper: Website: We consider two types of simulation datasets: Task-Aware Digital Cousins and Task-Agnostic Prior Simulation Data. Task-Aware Digital Cousins: First introduced by Dai et al., digital cousins are virtual assets that, unlike a digital twin, do not explicitly model a real-world counterpart but still exhibit similar geometric and semantic affordances. In this work, we use "task-aware digital cousins" to refer to simulation tasks that share the same task semantics as the real-world task, namely the same object categories in the environment and the same behaviors. (2/n)

Task-Agnostic Prior Simulation Data: We also consider existing large-scale simulation datasets, which require no additional effort to design new tasks or collect new data. They offer significantly more diversity but less alignment with the real-world tasks. (3/n)

Through comprehensive experiments, we present a simple recipe for effectively utilizing simulation data in real-world manipulation tasks:
1. Task and scene composition. Use task-aware digital cousins with similar task and scene compositions to real-world tasks. Multi-task prior simulation data can still help even with different compositions.
2. Object composition and initialization. Incorporate diverse objects and varying placements in simulation to improve generalization.
3. Task-aware digital cousin alignment. Ensure simulation tasks share the same definition and success criteria as real-world tasks. Similar camera viewpoints help, but perfect alignment isn't necessary.
4. Co-training hyperparameters. Use significantly more simulation data than real-world data and carefully tune the co-training ratio.
(4/n)

Our sim-and-real co-training pipeline is as follows. (5/n)

We verified that sim-and-real co-training is compatible with large-scale imitation learning. Co-training with simulation data boosts the real-world performance in data-rich settings. (6/n)

Our recipe presented above comes from a comprehensive study across 11 different tasks and 2 embodiments to understand which dataset composition factors in simulation and real-world datasets matter the most. (7/n)

Our strategy allows agents to generalize to novel object entities and poses unseen in the real-world dataset. (8/n)

We find that one of the most important hyperparameters for effective co-training is the co-training ratio between sim and real data. In our experiments, a co-training ratio of 99% yielded the best performance. (9/n)
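To make the co-training ratio concrete, here is a minimal sketch of how a mixed batch could be drawn. It assumes the ratio is the probability that each training sample comes from the simulation dataset; the thread does not define it precisely, so treat that reading as an assumption. The function name `sample_cotraining_batch` and its parameters are hypothetical and not taken from the paper's code.

```python
import random


def sample_cotraining_batch(sim_data, real_data, batch_size, sim_ratio=0.99, rng=None):
    """Draw a mixed batch of (source, sample) pairs.

    Assumption: sim_ratio is the per-sample probability of drawing from
    the simulation dataset; the remainder comes from the real dataset.
    """
    if not 0.0 <= sim_ratio <= 1.0:
        raise ValueError("sim_ratio must be in [0, 1]")
    rng = rng or random.Random()
    batch = []
    for _ in range(batch_size):
        # Choose the source dataset first, then a sample uniformly within it.
        if rng.random() < sim_ratio:
            batch.append(("sim", rng.choice(sim_data)))
        else:
            batch.append(("real", rng.choice(real_data)))
    return batch


if __name__ == "__main__":
    # Placeholder datasets standing in for demonstration trajectories.
    sim = [f"sim_demo_{i}" for i in range(1000)]
    real = [f"real_demo_{i}" for i in range(20)]
    batch = sample_cotraining_batch(sim, real, batch_size=256, rng=random.Random(0))
    n_sim = sum(1 for source, _ in batch if source == "sim")
    print(f"{n_sim} sim / {len(batch) - n_sim} real samples in this batch")
```

Sampling the source per item keeps the mix independent of the dataset sizes, so a small real-world dataset is revisited often while the larger sim dataset supplies the diversity.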

We also find camera alignment to be critical for successful co-training with task-aware digital cousin data. Training policies on severely misaligned simulation data results in a significant drop in performance compared to policies co-trained with properly aligned digital cousin data. On the Panda arm CounterToSinkPnP task, the co-training success rate dropped from 67% to 56%, while on the GR-1 humanoid CupPnP task, it declined from 95% to 70%. However, the aligned camera does not need to be strictly identical to the real-world camera. (10/n)

This work was done at NVIDIA’s GEAR lab and UT Austin with amazing collaborators @abhirammaddukur, @Lawrence_Y_Chen, @snasiriany, @yuqi_xie5, Yu Fang, Wenqi Huang, @zuwang95, @Zhenjia_Xu, @nc__dev, @scott_e_reed, @Ken_Goldberg, @AjayMandlekar, @DrJimFan, and @yukez.

