In my past research experience, finding or developing an appropriate simulation environment, dataset, and benchmark has always been a challenge. Missing features, limited support, or unexpected bugs often occupied my days and nights. Moreover, current simulation platforms are relatively fragmented, which makes it hard to replicate the success of the RT-X... dataset in unifying community efforts.

Introducing RoboVerse: a unified platform, dataset, and benchmark for scalable and generalizable robot learning. We hope to build a shared foundation that combines the community's efforts.

RoboVerse includes:

MetaSim: We carefully designed a configuration system and a universal interface to align current robotic simulators. With MetaSim, you can run the same code on any supported simulator, bringing together the community's diverse efforts under one framework!

RoboVerse Dataset and Benchmark: We unify popular simulation environments and benchmarks into a single cohesive system and introduce the RoboVerse dataset, a large-scale, high-quality synthetic dataset. Additionally, we propose a standardized benchmark across both imitation learning and reinforcement learning.

Hybrid Simulation: A cool feature enabled by our unified framework! You can now integrate physics engines and renderers from different simulators, e.g., using MuJoCo's precise physics with Isaac's photorealistic rendering. This not only elevates simulation fidelity but also significantly enhances real-world transfer performance across complex robotic applications.

We hope our team's efforts help the robotics community thrive in the years to come. RoboVerse is open-sourced🥳!!!

Project Page: Documentation: GitHub Repo: Paper:
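
To make the Hybrid Simulation idea concrete, here is a minimal Python sketch of pairing a physics engine from one backend with a renderer from another. This is not MetaSim's actual API: `SceneState`, `PhysicsBackend`, `RenderBackend`, `HybridSim`, and the two toy backends are hypothetical names made up for illustration, and the toy classes only stand in for real engines such as MuJoCo or Isaac.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, Tuple

Position = Tuple[float, float, float]


@dataclass
class SceneState:
    """Simulator-agnostic snapshot (hypothetical): object name -> (x, y, z)."""
    positions: Dict[str, Position]


class PhysicsBackend(Protocol):
    """Anything that can advance a SceneState by dt seconds."""
    def step(self, state: SceneState, dt: float) -> SceneState: ...


class RenderBackend(Protocol):
    """Anything that can turn a SceneState into an observation."""
    def render(self, state: SceneState) -> str: ...


class ToyGravityPhysics:
    """Stand-in for a physics engine: objects sink at a fixed speed until z = 0."""

    def __init__(self, fall_speed: float = 1.0) -> None:
        self.fall_speed = fall_speed

    def step(self, state: SceneState, dt: float) -> SceneState:
        moved = {
            name: (x, y, max(0.0, z - self.fall_speed * dt))
            for name, (x, y, z) in state.positions.items()
        }
        return SceneState(moved)


class ToyTextRenderer:
    """Stand-in for a renderer: emits a text 'frame' instead of an image."""

    def render(self, state: SceneState) -> str:
        return ", ".join(
            f"{name}@z={pos[2]:.2f}" for name, pos in sorted(state.positions.items())
        )


class HybridSim:
    """Steps with one backend and renders with another via the shared state."""

    def __init__(self, physics: PhysicsBackend, renderer: RenderBackend) -> None:
        self.physics = physics
        self.renderer = renderer

    def step(self, state: SceneState, dt: float) -> Tuple[SceneState, str]:
        new_state = self.physics.step(state, dt)
        return new_state, self.renderer.render(new_state)


if __name__ == "__main__":
    sim = HybridSim(ToyGravityPhysics(fall_speed=2.0), ToyTextRenderer())
    state = SceneState({"cube": (0.0, 0.0, 1.0)})
    state, frame = sim.step(state, dt=0.25)
    print(frame)
```

The design choice the sketch highlights is the shared, simulator-agnostic state: as long as both backends read and write the same representation, either one can be swapped without touching the other. How RoboVerse actually synchronizes state between engines is not described in this post.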

Haoran Geng

2,068 subscribers

84,249 views • 1 year ago • via X (Twitter)

12 Comments

Haoran Geng • 1 year ago

Beyond what we showcased in the video above, RoboVerse has even more exciting features waiting to be explored: Teleoperation: We support multiple teleoperation methods in RoboVerse. We designed a mobile app that uses phone sensors to enable seamless teleoperation within the RoboVerse platform. We also partially support other devices such as mocap, VR, keyboards, and joysticks.

Haoran Geng • 1 year ago

Real2Sim: We provide a Real-to-Sim toolset that reconstructs real-world assets from monocular video using 3D Gaussian Splatting.

Haoran Geng • 1 year ago

AI-Generated Tasks: Building on MetaSim's unified task configuration, we use LLMs to combine assets from the RoboVerse data and generate new tasks, showing the potential of LLMs for creative task generation.

Haoran Geng • 1 year ago

Seamlessly Enable GPU Parallelism: RoboVerse makes it much easier to port tasks and benchmarks that previously didn't support large-scale parallelism, so that they can run parallel reinforcement learning efficiently on GPUs.

Haoran Geng • 1 year ago

Huge thanks to the entire team for the incredible efforts on this ambitious project🥺!!! We sincerely hope every contribution pays off. We also encourage the wider community to get involved — together, let's drive real progress in robotics🚀!

Lwin Moe Aung • 1 year ago

Does it support Drake?

Haoran Geng • 1 year ago

Not yet. We are working hard to include more! Pull requests are also highly appreciated :)

arun kumar singh • 1 year ago

Does it support MuJoCo or MuJoCo JAX?

Haoran Geng • 1 year ago

Yes! We do support MuJoCo! Check this and have a try!

David Blanco-Mulero • 1 year ago

Really amazing work! I see the deformables sim is mainly based on GarmentLab. Have you thought about integrating deformable simulation from other simulators (e.g. MuJoCo does well on volumetric objects, ManiSkill, etc.) rather than only Isaac-based ones?

Haoran Geng • 1 year ago

Thank you for the suggestion! We are working hard to include more. Deformable engines from other simulators are indeed very useful, as they offer several advantages. Stay tuned!

Related Videos

Everything you love about generative models, now powered by real physics! Announcing the Genesis project: after a 24-month large-scale research collaboration involving over 20 research labs, a generative physics engine able to generate 4D dynamical worlds, powered by a physics simulation platform designed for general-purpose robotics and physical AI applications. Genesis's physics engine is developed in pure Python, while being 10-80x faster than existing GPU-accelerated stacks like Isaac Gym and MJX. It delivers a simulation speed ~430,000 times faster than real time, and takes only 26 seconds to train a robotic locomotion policy transferable to the real world on a single RTX 4090 (see tutorial). The Genesis physics engine and simulation platform is fully open source. We'll gradually roll out access to our generative framework in the near future. Genesis implements a unified simulation framework from scratch, integrating a wide spectrum of state-of-the-art physics solvers, allowing simulation of the whole physical world in a virtual realm with the highest realism. We aim to build a universal data engine that leverages an upper-level generative framework to autonomously create physical worlds, together with various modes of data, including environments, camera motions, robotic task proposals, reward functions, robot policies, character motions, fully interactive 3D scenes, open-world articulated assets, and more, aiming towards fully automated data generation for robotics, physical AI and other applications. Open Source Code: Project webpage: Documentation: 1/n

Zhou Xian

3,817,375 views • 1 year ago

DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision paper page: We have witnessed significant progress in deep learning-based 3D vision, ranging from neural radiance field (NeRF) based 3D representation learning to applications in novel view synthesis (NVS). However, existing scene-level datasets for deep learning-based 3D vision, limited to either synthetic environments or a narrow selection of real-world scenes, are quite insufficient. This insufficiency not only hinders a comprehensive benchmark of existing methods but also caps what could be explored in deep learning-based 3D analysis. To address this critical gap, we present DL3DV-10K, a large-scale scene dataset, featuring 51.2 million frames from 10,510 videos captured from 65 types of point-of-interest (POI) locations, covering both bounded and unbounded scenes, with different levels of reflection, transparency, and lighting. We conducted a comprehensive benchmark of recent NVS methods on DL3DV-10K, which revealed valuable insights for future research in NVS. In addition, we have obtained encouraging results in a pilot study to learn generalizable NeRF from DL3DV-10K, which manifests the necessity of a large-scale scene-level dataset to forge a path toward a foundation model for learning 3D representation.

AK

49,917 views • 2 years ago

PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking paper page: We introduce PointOdyssey, a large-scale synthetic dataset and data generation framework for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state of the art by placing emphasis on long videos with naturalistic motion. Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, we build 3D scenes to match the motion capture environments, and we render camera viewpoints using trajectories mined via structure-from-motion on real videos. We create combinatorial diversity by randomizing character appearance, motion profiles, materials, lighting, 3D assets, and atmospheric effects. Our dataset currently includes 104 videos, averaging 2,000 frames long, with orders of magnitude more correspondence annotations than prior work. We show that existing methods can be trained from scratch on our dataset and outperform the published variants. Finally, we introduce modifications to the PIPs point tracking method, greatly widening its temporal receptive field, which improves its performance on PointOdyssey as well as on two real-world benchmarks.

AK

122,533 views • 3 years ago

Synthetic data will provide the next trillion tokens to fuel our hungry models. I'm excited to announce MimicGen: massively scaling up the data pipeline for robot learning! We multiply high-quality human data in simulation with digital twins. Using 50,000 training episodes across 18 tasks, multiple simulators, and even in the real world! The idea is simple: 1. Humans tele-operate the robot to complete a task. The resulting data is extremely high-quality but also very slow and expensive to collect. 2. We create a digital twin of the robot and the scene in high-fidelity, GPU-accelerated simulation. 3. We can now move objects around, replace them with new assets, and even change the robot hand - basically augment the training data with procedural generation. 4. Export the successful episodes, and feed them to a neural network! You now have a near-infinite stream of data. One of the key reasons that robotics lags far behind other AI fields is the lack of data: you cannot scrape control signals from the internet. They simply don't exist in the wild. MimicGen shows the power of synthetic data and simulation to keep our scaling laws alive. I believe this principle applies beyond robotics. We are quickly exhausting the high-quality, real tokens from the web. Artificial intelligence from artificial data will be the way forward. We are big fans of the OSS community. As usual, we open-source everything, including the generated dataset! - Website: - Paper: - Dataset is hosted on HuggingFace (thanks AK!!): - Code: MimicGen is led by Ajay Mandlekar, deep dive in the thread:

Jim Fan

332,199 views • 2 years ago

I don’t know if we live in a Matrix, but I know for sure that robots will spend most of their lives in simulation. Let machines train machines. I’m excited to introduce DexMimicGen, a massive-scale synthetic data generator that enables a humanoid robot to learn complex skills from only a handful of human demonstrations. Yes, as few as 5! DexMimicGen addresses the biggest pain point in robotics: where do we get data? Unlike with LLMs, where vast amounts of texts are readily available, you cannot simply download motor control signals from the internet. So researchers teleoperate the robots to collect motion data via XR headsets. They have to repeat the same skill over and over and over again, because neural nets are data hungry. This is a very slow and uncomfortable process. At NVIDIA, we believe the majority of high-quality tokens for robot foundation models will come from simulation. What DexMimicGen does is to trade GPU compute time for human time. It takes one motion trajectory from human, and multiplies into 1000s of new trajectories. A robot brain trained on this augmented dataset will generalize far better in the real world. Think of DexMimicGen as a learning signal amplifier. It maps a small dataset to a large (de facto infinite) dataset, using physics simulation in the loop. In this way, we free humans from babysitting the bots all day. The future of robot data is generative. The future of the entire robot learning pipeline will also be generative. 🧵

Jim Fan

165,246 views • 1 year ago

Excited to share our latest work on 🎧spatial audio-driven human motion generation. We aim to tackle a largely underexplored yet important problem of enabling virtual humans to move naturally in response to spatial audio—capturing not just what is heard, but also where the sound is coming from. To this end, we introduce the Spatial Audio-Driven Human Motion (SAM) dataset—the first comprehensive dataset featuring paired high-quality human motion and spatial audio recordings. For benchmarking, we develop a generative framework for human MOtion generation driven by SPAtial audio, termed MOSPA, which learns to synthesize realistic and diverse human motions conditioned on spatial audio input. We hope this research could provide a foundation for future research in spatial perception, virtual characters, and embodied AI. The dataset and model will be open-sourced soon. A big thank you to our intern, Shuyang Xu, for the wonderful collaboration! Congratulations, Shuyang! Project page: Paper: Video: #Animation #CG #CV #AIGC #DL #Deeplearning #Motion #Graphics #AI #GenerativeAI

Zhiyang (Frank) Dou

14,610 views • 1 year ago

A big part of scaling robot learning to solve real-world problems is that we somehow need to get enough diverse, high-quality data to train our robots to perform useful things. GPT and its fellow large language models were bootstrapped and proved out on a massive dataset of real-world language data. Unfortunately, despite our best efforts, similarly massive datasets don't really exist for robotics, so, in our unending pursuit of high-quality, useful data, we turn to simulation. I compared a couple of recent works on sim-to-real robot manipulation, which discuss how to train perception-driven manipulation policies in simulation in such a way that they're useful in the real world. - DextrAH-RGB, from NVIDIA - Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation, also from NVIDIA, specifically the GEAR lab - Sim-to-Real Reinforcement Learning for Vision-Based Dexterous Manipulation on Humanoids, another GEAR lab paper - Local Policies Enable Zero-shot Long-Horizon Manipulation, from CMU (video from DextrAH-RGB)

Chris Paxton

20,486 views • 1 year ago

Today, we publicly released RoboCasa365, a large-scale simulation benchmark for training and systematically evaluating generalist robot models. Built upon our original RoboCasa framework, it offers: • 2,500 realistic kitchen environments; • 365 everyday tasks (basic skills + long-horizon mobile manipulation); • Over 3,200 objects with many articulated fixtures/appliances. All are designed for fully controlled, reproducible benchmarking of robotic policies. Progress in robotic foundation models is real. But it’s still hard to answer basic questions like: How close are we to general-purpose autonomy? What factors drive generalization? What are the model/data scaling curves like? Real-world eval is slow and noisy, and existing sims (like LIBERO, which we built 3 years ago) often lack sufficient task and scene diversity. This benchmark comes with 2,200+ hours of demonstrations and 500K+ trajectories to support studies of multi-task training, pretraining, and continual learning at scale. Check it out at

Yuke Zhu

23,928 views • 4 months ago

Besides reading cool papers, my Twitter account is mostly used for catching up on Formula 1 news. Very excited about Lando Norris's great performance recently. Now we combine Formula racing with AI research: the following video shows how we train a reinforcement learning policy to drive a Dallara F317 autonomously on the Monza and Barcelona circuits in simulation. You can think of it as optimizing a qualifying lap: you want the car to run a single lap, with no opponents, at the fastest speed and with the shortest lap time. We open-source the code to train RL in a racing simulator with Assetto Corsa, a widely deployed platform for esports. We also train with human demonstrations from multiple expert drivers, including a professional esports driver. Interestingly, the AI is not beating the professional yet, but it is close. See our open-sourced code, simulator, and data here:

Xiaolong Wang

32,361 views • 2 years ago

We are back again :) After three weeks of quiet building. Introducing Genesis World 1.0, our latest simulation platform, the second release in our full-stack suite. Open-sourced. Robotics is still bottlenecked by the 1× speed of the physical world. Every model, checkpoint, and data recipe eventually needs to be tested on physical hardware, slowly, expensively, and with limited coverage. One hour in reality can become 100 days in simulation. That is how robotics model iteration moves from a wall-clock bottleneck to a compute problem. To make this work, simulation has to be both fast and trustworthy. Over the past year, we rebuilt the entire stack: a GPU-accelerated cross-platform compiler, penetration-free multi-physics contact solvers, unified rigid and deformable physics, and a photo-realistic renderer purpose-built for physical AI applications. We built Nyx, a high-performance path-traced rendering engine for robotics application. Genesis World 1.0 achieves near realtime performance with our latest development for penetration-free IPC solver, supporting various types of deformables beyond rigid bodies. It supports contact-rich, dexterous manipulation simulation across different embodiments: unitree, sharpa, wuji, genesis hand and various types of grippers. Under the hood is Quadrants, our effort in pushing forward cross-platform GPU-accelerated computation. Quadrants started as a fork of Taichi, and we rebuilt most of the critical parts for optimizing simulation workloads, giving 10x faster launch time and up to 4.6x runtime performance compared to the initial Genesis release. Together, they bring us to an unprecedentedly low sim-to-real gap, enabling zero-shot real-to-sim model evaluation and much faster iteration of GENE. All available today. Genesis World 1.0: Quadrants: Nyx:

Genesis AI

308,392 views • 1 month ago

We are excited to share our #CORL2024 paper (oral) on "Learning Quadruped Locomotion Using Differentiable Simulation" done in collaboration with Sangbae Kim Massachusetts Institute of Technology (MIT). We present a new way to learn to walk in minutes without parallelization, outperforming PPO in sample efficiency! PDF: Video: We present a new framework for learning quadruped locomotion. By leveraging differentiable simulation for policy optimization, our approach achieves fast convergence and stable training, significantly outperforming model-free #ReinforcementLearning methods like PPO in sample efficiency. The key enabler is to combine a high-fidelity, non-differentiable simulator for forward dynamics with a simplified surrogate model for gradient backpropagation. Our framework enables learning quadruped walking in simulation in minutes without parallelization. When augmented with GPU parallelization, our approach allows the quadruped robot to master diverse locomotion skills on challenging terrains in minutes. This work highlights one of the first successful real-world applications of differentiable simulation for quadruped robots, offering a compelling alternative to traditional RL methods. Kudos to Yunlong Song! UZH Science University of Zurich UZH Space Hub UZH IfI European Research Council (ERC) Massachusetts Institute of Technology (MIT)MechE

Davide Scaramuzza

15,533 views • 1 year ago
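The key idea in the differentiable-simulation post above (forward dynamics from a non-differentiable simulator, gradients from a simplified surrogate) can be illustrated with a toy problem. This sketch is not the paper's implementation. The 1D point mass, the Coulomb friction model, the frictionless surrogate, and every function name and constant are assumptions made for illustration.

```python
def simulate_step(x, v, u, dt=0.05, friction=0.3):
    """'High-fidelity' forward dynamics: a point mass with Coulomb friction.
    The sign-dependent friction term is non-differentiable at v = 0."""
    if v > 0:
        f = -friction
    elif v < 0:
        f = friction
    else:
        f = 0.0
    v_next = v + (u + f) * dt
    x_next = x + v_next * dt
    return x_next, v_next


def surrogate_sensitivity(dx_du, dv_du, dt=0.05):
    """Propagate d(state)/du through the frictionless surrogate
    v' = v + u * dt, x' = x + v' * dt."""
    dv_next = dv_du + dt
    dx_next = dx_du + dv_next * dt
    return dx_next, dv_next


def rollout(u, steps=20):
    """States come from the simulator, sensitivities from the surrogate."""
    x, v = 0.0, 0.0
    dx_du, dv_du = 0.0, 0.0
    for _ in range(steps):
        x, v = simulate_step(x, v, u)
        dx_du, dv_du = surrogate_sensitivity(dx_du, dv_du)
    return x, dx_du


def optimize(target=1.0, u=1.0, lr=1.0, iters=50):
    """Gradient descent on (x_final - target)^2 with the surrogate gradient."""
    for _ in range(iters):
        x_final, dx_du = rollout(u)
        grad = 2.0 * (x_final - target) * dx_du
        u -= lr * grad
    return u, rollout(u)[0]


u_opt, x_final = optimize()
print(f"force={u_opt:.4f}, final position={x_final:.4f}")
```

In this toy setup friction only shifts the final position by a constant while the mass keeps moving forward, so the surrogate's sensitivity matches the true slope and plain gradient descent converges. On a real robot the surrogate gradient would only approximate the true one.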

An end-to-end Machine Learning system, starting with a dataset and going all the way to monitoring it in production. We start next Monday! We'll build everything using open-source tools, and I'll show you how to deploy your system on different cloud platforms. I've been running this live course for 1.5+ years now, and about 1,500 students have taken it. It's the best Machine Learning Engineering class I'm aware of by 10 miles. More information is in the attached video. Link to join below. See you next week!

Santiago

36,622 views • 1 year ago

📢 Announcing one of the most exciting works from us this year on **scalable robot policy evaluation through real-to-sim transfer**, moving toward a scalable evaluation engine with structured world models that capture the appearance, geometry, and dynamics of environments involving deformable objects. 🤖 Evaluation remains one of the biggest bottlenecks in building general-purpose robots. Today, robots are still evaluated only in the real world, which is **orders of magnitude slower** than the development of language agents. We propose a new framework where simulation performance **strongly correlates** with the real world (r > 0.9), even for deformable objects. The key difference from existing work lies in the correlation between simulation and reality: if a robot model performs better in the digital world, does it also perform better in the real world? This question has long made people hesitant about simulation-based evaluation — especially for deformable objects. We are changing that. Our pipeline achieves effective real-to-sim transfer, establishing **state-of-the-art correlation** between simulation and reality for deformable object manipulation. It provides a **scalable and reproducible evaluation engine** for robot learning. 🌐

Yunzhu Li

39,900 views • 8 months ago
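The "r > 0.9" figure in the real-to-sim evaluation post above refers to the correlation between how policies score in simulation and how they score on real hardware. The sketch below shows how such a Pearson coefficient is computed. The five pairs of success rates are made-up numbers for illustration, not results from the work.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical success rates of five policies, each evaluated twice:
# once in simulation and once on the real robot.
sim_success = [0.20, 0.35, 0.50, 0.70, 0.90]
real_success = [0.15, 0.40, 0.45, 0.65, 0.85]

r = pearson_r(sim_success, real_success)
print(f"r = {r:.3f}")
```

A high r means simulation ranks and spaces the policies much as the real world does, which is the property that makes simulation usable as an evaluation proxy.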

Today we are excited to open up Neuracore to the academic community! Neuracore is a new data foundation built to accelerate robot learning by removing one of the field’s biggest bottlenecks: capturing and working with high-fidelity multimodal robotics data. For the first time, researchers can store, view, and work with robotics data in a cloud-native system built specifically for large-scale learning, and we are making this core platform completely free for academia. The platform lets teams capture every sensor at its native rate, store and visualize data without loss, and then train and deploy models locally using our open-source code (Link in the comments). We are rolling out access to select academic institutions first. Anyone with an academic email can sign up, and if your institution is not part of the initial rollout, you will be able to join the waitlist directly. Beyond providing this infrastructure, we see an opportunity to build a global community where engineers and researchers can share, collaborate, and advance the frontier of robot learning together. Supported by our recent $3M pre-seed round led by Earlybird VC, we are excited to take this mission even further. Our long-term goal is for Neuracore to become the natural home for cutting-edge robot learning algorithms and real-world robotics experimentation, helping accelerate the next wave of Physical AI.

Neuracore

40,620 views • 7 months ago

Exciting updates on Project GR00T! We discovered a systematic way to scale up robot data, tackling the most painful pain point in robotics. The idea is simple: a human collects demonstrations on a real robot, and we multiply that data 1000x or more in simulation. Let’s break it down:
1. We use Apple Vision Pro (yes!!) to give the human operator first-person control of the humanoid. Vision Pro parses the human hand pose and retargets the motion to the robot hand, all in real time. From the human’s point of view, they are immersed in another body, like in Avatar. Teleoperation is slow and time-consuming, but we can afford to collect a small amount of data.
2. We use RoboCasa, a generative simulation framework, to multiply the demonstration data by varying the visual appearance and layout of the environment. In Jensen’s keynote video below, the humanoid is now placing the cup in hundreds of kitchens with a huge diversity of textures, furniture, and object placement. We only have 1 physical kitchen at the GEAR Lab in NVIDIA HQ, but we can conjure up infinite ones in simulation.
3. Finally, we apply MimicGen, a technique to multiply the above data even more by varying the motion of the robot. MimicGen generates a vast number of new action trajectories based on the original human data, and filters out failed ones (e.g. those that drop the cup) to form a much larger dataset.
To sum up, given 1 human trajectory with Vision Pro -> RoboCasa produces N (varying visuals) -> MimicGen further augments to NxM (varying motions). This is the way to trade compute for expensive human data by GPU-accelerated simulation. A while ago, I mentioned that teleoperation is fundamentally not scalable, because we are always limited by 24 hrs/robot/day in the world of atoms. Our new GR00T synthetic data pipeline breaks this barrier in the world of bits. Scaling has been so much fun for LLMs, and it's finally our turn to have fun in robotics!
We are building tools to enable everyone in the ecosystem to scale up with us. Links in thread:

Jim Fan

364,380 views • 2 years ago
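The "1 human trajectory -> N visual variations -> NxM motion variations, minus failures" bookkeeping in the GR00T post above can be sketched as follows. This is only a counting model: it does not call RoboCasa or MimicGen, and the `Trajectory` class, the `augment` function, and the 80% success rate are hypothetical stand-ins.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    demo_id: int    # which human teleoperation demo this derives from
    scene_id: int   # which visual/layout variation (the RoboCasa role)
    motion_id: int  # which motion variation (the MimicGen role)


def augment(num_demos, n_scenes, m_motions, succeeds):
    """Expand each human demo into up to n_scenes * m_motions trajectories,
    keeping only those that pass the success check."""
    dataset = []
    for demo_id in range(num_demos):
        for scene_id in range(n_scenes):
            for motion_id in range(m_motions):
                traj = Trajectory(demo_id, scene_id, motion_id)
                if succeeds(traj):
                    dataset.append(traj)
    return dataset


rng = random.Random(0)
# Hypothetical success check: pretend about 80% of generated motions
# complete the task (e.g. do not drop the cup).
kept = augment(1, n_scenes=100, m_motions=10,
               succeeds=lambda traj: rng.random() < 0.8)
print(len(kept), "of", 100 * 10, "candidate trajectories kept")
```

The upper bound on dataset size is demos x N x M, and the filter is what keeps failed rollouts from polluting the training data.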

Today we’re celebrating 10 years of the Meta FAIR lab in Paris by sharing a collection of new models, datasets and some exciting milestones in the impacts of open source — all laddering up to our ongoing work to achieve Advanced Machine Intelligence (AMI). 1️⃣ Meta PARTNR is a framework for human-robot collaboration that builds on our existing work in this space with a new dataset and a large planning model enabling robots to accomplish complex tasks alongside humans. 2️⃣ Meta Audiobox Aesthetics enables the automatic evaluation of audio aesthetics, providing a comprehensive assessment of audio quality across speech, music and sound. 3️⃣ Open Source Machine Translation Benchmark is a carefully crafted collection with the aim of building an unprecedented multilingual machine translation benchmark for the community. 4️⃣ Two new breakthrough studies using AI to further our understanding of language in the brain.

AI at Meta

85,774 views • 1 year ago

Today, we're joined by Nikita Rudin, co-founder and CEO of Flexion, to discuss the gap between current robotic capabilities and what’s required to deploy fully autonomous robots in the real world. Nikita explains how reinforcement learning and simulation have driven rapid progress in robot locomotion, and why locomotion is still far from “solved.” We dig into the sim2real gap, and how adding visual inputs introduces noise and significantly complicates sim-to-real transfer. We also explore the debate between end-to-end models and modular approaches, and why separating locomotion, planning, and semantics remains a pragmatic approach today. Nikita also introduces the concept of "real-to-sim", which uses real-world data to refine simulation parameters for higher-fidelity training; discusses how reinforcement learning, imitation learning, and teleoperation data are combined to train robust policies for both quadruped and humanoid robots; and introduces Flexion's hierarchical approach, which uses pre-trained Vision-Language Models (VLMs) for high-level task orchestration with Vision-Language-Action (VLA) models and low-level whole-body trackers. Finally, Nikita shares what goes on behind the scenes of humanoid robot demos, his take on reinforcement learning in simulation versus the real world, and the nuances of reward tuning, and offers practical advice for researchers and practitioners looking to get started in robotics today. 🗒️ For the full list of resources for this episode, visit the show notes page:
📖 CHAPTERS
===============================
00:00 - Introduction
04:07 - Is robot locomotion solved?
06:04 - Sim-to-real gap
08:58 - Adding semantics to policies
09:42 - Modular vs end-to-end architectures
10:29 - Planner model
12:21 - Adapting RL techniques from quadrupeds to humanoids
15:39 - Behind robot demos
18:09 - Humanoid robots in home environments
22:03 - Training approach
23:56 - VLA models
27:59 - Closing the sim-to-real gap
32:55 - Task orchestration using VLMs
36:38 - Tool use
38:10 - Model hierarchy
43:37 - Simulator versus simulation environment
44:57 - Combining imitation learning and reinforcement learning
46:42 - RL in real world versus RL in simulation
52:58 - Reward tuning and value functions in robotics
56:38 - Predictions
1:00:10 - Humanoids, quadrupeds, and wheeled platforms
1:02:45 - Advice, recommended robot kits, and community pla

The TWIML AI Podcast

22,533 views • 6 months ago
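The "real-to-sim" idea mentioned in the episode description above, using real-world data to refine simulation parameters, can be sketched as a small parameter-fitting problem. Everything here is an assumption for illustration: the sliding-block model, the grid search, the function names, and the "real" measurements, which are synthetic values generated with a friction coefficient of 0.3 plus a small alternating error.

```python
G = 9.81  # gravitational acceleration, m/s^2


def simulate_velocities(mu, v0, dt, steps):
    """Simulated velocity of a block sliding to rest under friction mu."""
    v, velocities = v0, []
    for _ in range(steps):
        v = max(0.0, v - mu * G * dt)
        velocities.append(v)
    return velocities


def fit_friction(real_velocities, v0, dt, resolution=1000):
    """Grid search over mu in [0, 1] for the value whose simulated
    velocities best match the measurements (least squares)."""
    def loss(mu):
        sim = simulate_velocities(mu, v0, dt, len(real_velocities))
        return sum((s - r) ** 2 for s, r in zip(sim, real_velocities))

    candidates = [i / resolution for i in range(resolution + 1)]
    return min(candidates, key=loss)


# Synthetic stand-in for logged robot data: true mu = 0.3, v0 = 2 m/s,
# with a +/- 0.01 m/s alternating measurement error.
V0, DT, STEPS = 2.0, 0.05, 10
real = [
    V0 - 0.3 * G * DT * k + (0.01 if k % 2 else -0.01)
    for k in range(1, STEPS + 1)
]

mu_hat = fit_friction(real, V0, DT)
print(f"estimated friction coefficient: {mu_hat:.3f}")
```

Once fitted, the parameter would be written back into the simulator so that policies are trained against dynamics closer to the real system. Practical pipelines fit many parameters at once and typically use gradient-based or sampling-based optimizers rather than a grid.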

AI for robotics has some monumental challenges, like high dimensionality and reasoning about the physical world. Ashok Elluswamy, AI lead at Tesla, thinks the hardest problem is evaluation:
- Loss isn't a perfect indicator of the policy neural network's quality.
- Testing in the real world for all the long-tail failure cases is not possible.
To solve the evaluation challenge, Tesla has developed a World Simulator Model. It is trained on state and action pairs, so for a given current camera feed and current action, the model generates the next frame in the cameras. Once the world simulation is good enough, it can be hooked up to the policy network for a closed-loop simulation. In this setup, the model generates the next video frames, and based on those, the policy network comes up with the next action, which feeds back into the world simulation; this continuous loop creates a coherent simulation. The world simulation neural network is not just applicable to self-driving but also to humanoids, because Tesla trains it on common data. The examples show Optimus navigating or manipulating inside the world simulation, with all the pixels being generated by the world simulation model. The robotic agents can be tested in an accurately reproduced edge case and in other potential variations of the same edge case.

The Humanoid Hub

16,591 views • 5 months ago
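The closed loop described in the post above (world model produces the next observation, policy produces the next action, action feeds back into the world model) has a simple control-flow shape. The sketch below shows only that shape. A scalar "lane offset" stands in for camera frames, and both `toy_world_model` and `toy_policy` are hand-written placeholders, not learned networks and not Tesla's system.

```python
from typing import Callable, List, Tuple

Observation = float  # stand-in for camera frames
Action = float       # stand-in for a steering command


def closed_loop_rollout(
    world_model: Callable[[Observation, Action], Observation],
    policy: Callable[[Observation], Action],
    initial_obs: Observation,
    steps: int,
) -> Tuple[List[Tuple[Observation, Action]], Observation]:
    """Alternate policy and world model: each action is chosen from the
    latest generated observation and then fed back to generate the next."""
    obs = initial_obs
    trace = []
    for _ in range(steps):
        action = policy(obs)
        trace.append((obs, action))
        obs = world_model(obs, action)
    return trace, obs


def toy_world_model(offset: Observation, steer: Action) -> Observation:
    """Placeholder dynamics: steering shifts the lane offset."""
    return offset + 0.5 * steer


def toy_policy(offset: Observation) -> Action:
    """Placeholder policy: steer back toward the lane center."""
    return -offset


# Re-run the same scenario from several starting offsets, mimicking the
# idea of testing variations of one edge case.
for start in (1.0, -2.0, 4.0):
    trace, final = closed_loop_rollout(toy_world_model, toy_policy, start, 10)
    print(start, "->", final)
```

Because every observation after the first is generated rather than recorded, errors in the world model compound over the rollout, which is why the post stresses that the world simulation has to be good enough before the loop is closed.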

Tired of teleoperation? One human video → 1,000s of robot demos. (📍GitHub) Scaling Robot Data Without Dynamics Simulation or Robot Hardware. Real2Render2Real (R2R2R) is a new way to scale robot data without physics simulation or hardware. You take a phone scan + a single monocular human demo. It tracks the motion, renders photorealistic scenes, and generates diverse, robot-agnostic trajectories ready for training.
> No teleop, no sim, no robot, just a phone and a video
> Train VLA models and diffusion policies directly on the output
> Supports multiple robot embodiments with kinematic consistency
> 1000s of demos in 1/27 the time of real-world collection
Thank you, Max Fu, for sharing!! Project: Paper: Code coming soon: It shows that with the right pipeline, you can scale robot learning data without touching a robot. One of the most interesting directions in scalable robotics today.
Weekly robotics and AI insights. Subscribe free:

Ilir Aliu

42,804 views • 5 months ago