Загрузка видео...

Не удалось загрузить видео

На главную

Robotics has a data scarcity problem - you simply can't scrape robot control data from webpages. Introducing GR00T-Mimic and GR00T-Gen: using both Graphics 1.0 & Graphics 2.0 to multiply your robot datasets by 1,000,000x. We trade compute for synthetic data, so we are not capped by the fundamental physical...

163,670 просмотров • 1 год назад •via X (Twitter)

Комментарии: 11

Фото профиля Jim Fan
Jim Fan1 год назад

We are hiring full-time researchers, engineers, and summer research interns!! If you want to get in touch and show us your past works, email [email protected] (me) and [email protected] (@yukez). Prefix your email subject with the literal "[NV GEAR APPLICATION]". Please forward these info to any friends you know who might be interested! - Sr. RE, Robot Systems - Sr. RE, Reinforcement Learning - Sr. RE, Foundation Model Training Infras - Sr. RE, Simulation - Sr. RE, ML Data Pipelines - Research Scientist - Research Intern

Фото профиля The Rundown AI
The Rundown AI2 лет назад

AI won't replace you, but a person using AI will. Join 500,000+ readers and learn how to use AI in just 5 minutes a day (for free).

Фото профиля Arnav Jaitly
Arnav Jaitly1 год назад

It’s cute how you found a way to name them Groot

Фото профиля Abhinav Girdhar
Abhinav Girdhar1 год назад

Revolutionizing robotics data with GR00T-Mimic & GR00T-Gen—millions of synthetic points, smarter machines, and Moravec's paradox crushed!

Фото профиля The AI Veteran
The AI Veteran1 год назад

World sim and synthetic robot data generation in one day, and they already work together. This will help close the gap with Chinese robotics. Well done!

Фото профиля The Canaanite
The Canaanite1 год назад

Incredible! Super smart synthetic data generation pipeline.

Фото профиля Elizabeth Greene
Elizabeth Greene1 год назад

I don't understand why so many robots walk like they're doing the "gotta-poop-butt-clenched-shuffle." Is that a super-stable gait or somesuch?

Фото профиля Lukas Ziegler
Lukas Ziegler1 год назад

huuuuuge

Фото профиля Prashant Rai
Prashant Rai1 год назад

Who said robot learning is hitting the wall? It’s not just climbing over but smashing through!🙀

Фото профиля jovin
jovin1 год назад

This is an awesome explanation on why GR00T is so great! You all are doing amazing work at Nvidia! Definitely changing the world one token at a time 💪

Фото профиля J⏩
J⏩1 год назад

Please bring the background music down about 80% on future videos (or more ideally remove it entirely), it's competing with the message on my crappy (and typical) pc speaker.

Похожие видео

Exciting updates on Project GR00T! We discover a systematic way to scale up robot data, tackling the most painful pain point in robotics. The idea is simple: human collects demonstration on a real robot, and we multiply that data 1000x or more in simulation. Let’s break it down: 1. We use Apple Vision Pro (yes!!) to give the human operator first person control of the humanoid. Vision Pro parses human hand pose and retargets the motion to the robot hand, all in real time. From the human’s point of view, they are immersed in another body like the Avatar. Teleoperation is slow and time-consuming, but we can afford to collect a small amount of data. 2. We use RoboCasa, a generative simulation framework, to multiply the demonstration data by varying the visual appearance and layout of the environment. In Jensen’s keynote video below, the humanoid is now placing the cup in hundreds of kitchens with a huge diversity of textures, furniture, and object placement. We only have 1 physical kitchen at the GEAR Lab in NVIDIA HQ, but we can conjure up infinite ones in simulation. 3. Finally, we apply MimicGen, a technique to multiply the above data even more by varying the *motion* of the robot. MimicGen generates vast number of new action trajectories based on the original human data, and filters out failed ones (e.g. those that drop the cup) to form a much larger dataset. To sum up, given 1 human trajectory with Vision Pro -> RoboCasa produces N (varying visuals) -> MimicGen further augments to NxM (varying motions). This is the way to trade compute for expensive human data by GPU-accelerated simulation. A while ago, I mentioned that teleoperation is fundamentally not scalable, because we are always limited by 24 hrs/robot/day in the world of atoms. Our new GR00T synthetic data pipeline breaks this barrier in the world of bits. Scaling has been so much fun for LLMs, and it's finally our turn to have fun in robotics! We are building tools to enable everyone in the ecosystem to scale up with us. Links in thread:

Jim Fan

364,284 просмотров • 1 год назад

Excited to announce GR00T N1, the world’s first open foundation model for humanoid robots! We are on a mission to democratize Physical AI. The power of general robot brain, in the palm of your hand - with only 2B parameters, N1 learns from the most diverse physical action dataset ever compiled and punches above its weight: - Real humanoid teleoperation data. - Large-scale simulation data: we are open-sourcing 300K+ trajectories! - Neural trajectories: we apply SOTA video generation models to “hallucinate” new synthetic data that features accurate physics in pixels. Using Jensen’s words, “systematically infinite data”! - Latent actions: we develop novel algorithms to extract action tokens from in-the-wild human videos and neural generated videos. GR00T N1 is a single end-to-end neural net, from photons to actions: - Vision-Language Model (System 2) that interprets the physical world through vision and language instructions, enabling robots to reason about their environment and instructions, and plan the right actions. - Diffusion Transformer (System 1) that “renders” smooth and precise motor actions at 120 Hz, executing the latent plan made by System 2. We deploy N1 on GR1 robot, 1X Neo robot, and a large collection of simulation benchmarks. N1 achieves up to +30% boost in diverse manipulation tasks for household and industrial settings. While humanoid robots are the main focus of N1, our model also supports cross-embodiment. We finetune it to work on the $110 HuggingFace LeRobot SO100 robot arm! Open robot brain runs on open hardware. Sounds just right. Let’s solve robotics, together, one token at a time. Links to our Whitepaper, Github repo, HuggingFace model, and open dataset page in the thread: 🧵

Jim Fan

465,670 просмотров • 1 год назад

Synthetic data will provide the next trillion tokens to fuel our hungry models. I'm excited to announce MimicGen: massively scaling up data pipeline for robot learning! We multiply high-quality human data in simulation with digital twins. Using 50,000 training episodes across 18 tasks, multiple simulators, and even in the real-world! The idea is simple: 1. Humans tele-operate the robot to complete a task. It is extremely high-quality but also very slow and expensive. 2. We create a digital twin of the robot and the scene in high-fidelity, GPU-accelerated simulation. 3. We can now move objects around, replace with new assets, and even change the robot hand - basically augment the training data with procedural generation. 4. Export the successful episodes, and feed that to a neural network! You now have an near-infinite stream of data. One of the key reasons that robotics lags far behind other AI fields is the lack of data: you cannot scrape control signals from the internet. They simply don't exist in-the-wild. MimicGen shows the power of synthetic data and simulation to keep our scaling laws alive. I believe this principle apply beyond robotics. We are quickly exhausting the high-quality, real tokens from the web. Artificial intelligence from artificial data will be the way forward. We are big fans of the OSS community. As usual, we open-source everything, including the generated dataset! - Website: - Paper: - Dataset is hosted on HuggingFace (thanks AK!!): - Code: MimicGen is led by Ajay Mandlekar, deep dive in the thread:

Jim Fan

332,164 просмотров • 2 лет назад

I don’t know if we live in a Matrix, but I know for sure that robots will spend most of their lives in simulation. Let machines train machines. I’m excited to introduce DexMimicGen, a massive-scale synthetic data generator that enables a humanoid robot to learn complex skills from only a handful of human demonstrations. Yes, as few as 5! DexMimicGen addresses the biggest pain point in robotics: where do we get data? Unlike with LLMs, where vast amounts of texts are readily available, you cannot simply download motor control signals from the internet. So researchers teleoperate the robots to collect motion data via XR headsets. They have to repeat the same skill over and over and over again, because neural nets are data hungry. This is a very slow and uncomfortable process. At NVIDIA, we believe the majority of high-quality tokens for robot foundation models will come from simulation. What DexMimicGen does is to trade GPU compute time for human time. It takes one motion trajectory from human, and multiplies into 1000s of new trajectories. A robot brain trained on this augmented dataset will generalize far better in the real world. Think of DexMimicGen as a learning signal amplifier. It maps a small dataset to a large (de facto infinite) dataset, using physics simulation in the loop. In this way, we free humans from babysitting the bots all day. The future of robot data is generative. The future of the entire robot learning pipeline will also be generative. 🧵

Jim Fan

165,215 просмотров • 1 год назад

We trained a humanoid with 22-DoF dexterous hands to assemble model cars, operate syringes, sort poker cards, fold/roll shirts, all learned primarily from 20,000+ hours of egocentric human video with no robot in the loop. Humans are the most scalable embodiment on the planet. We discovered a near-perfect log-linear scaling law (R² = 0.998) between human video volume and action prediction loss, and this loss directly predicts real-robot success rate. Humanoid robots will be the end game, because they are the practical form factor with minimal embodiment gap from humans. Call it the Bitter Lesson of robot hardware: the kinematic similarity lets us simply retarget human finger motion onto dexterous robot hand joints. No learned embeddings, no fancy transfer algorithms needed. Relative wrist motion + retargeted 22-DoF finger actions serve as a unified action space that carries through from pre-training to robot execution. Our recipe is called "EgoScale": - Pre-train GR00T N1.5 on 20K hours of human video, mid-train with only 4 hours (!) of robot play data with Sharpa hands. 54% gains over training from scratch across 5 highly dexterous tasks. - Most surprising result: a *single* teleop demo is sufficient to learn a never-before-seen task. Our recipe enables extreme data efficiency. - Although we pre-train in 22-DoF hand joint space, the policy transfers to a Unitree G1 with 7-DoF tri-finger hands. 30%+ gains over training on G1 data alone. The scalable path to robot dexterity was never more robots. It was always us. Deep dives in thread:

Jim Fan

291,259 просмотров • 3 месяцев назад