Robotics has a data scarcity problem - you simply can't scrape robot control data from webpages. Introducing GR00T-Mimic and GR00T-Gen: using both Graphics 1.0 & Graphics 2.0 to multiply your robot datasets by 1,000,000x. We trade compute for synthetic data, so we are not capped by the fundamental physical limit of 24 hrs/robot/day.

Robotics is right in the thick of Moravec's paradox: things that are easy for humans turn out to be incredibly hard for machines. We are crushing Moravec's paradox, one token at a time.

> Graphics 1.0: Isaac simulators with manually written, GPU-accelerated physics and rendering equations.
> Graphics 2.0: big neural nets (Cosmos) that repaint the pixels from sim textures to real, given an open-ended prompt.

Robot data multiplier workflow:
1. GR00T-Teleop: use an XR device like Apple Vision Pro to map human finger poses to humanoid hands.
2. GR00T-Mimic: given a human-collected task demonstration, we augment the actions in Isaac and filter out the ones that fail the task.
3. GR00T-Gen: apply Graphics 1.0 and then Graphics 2.0 to produce tons of visual variations.

Join us to build Physical AI together! Job application links in thread:
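The multiplier workflow above can be sketched in a few lines. This is an illustrative toy, not code from the post or from any GR00T release: `Episode`, `perturb`, `task_succeeds`, and `multiply` are hypothetical names, the "simulator" is a one-line success check, and the visual styles are plain strings standing in for rendered variations.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    actions: tuple[float, ...]          # toy 1-D action sequence
    visual_style: str = "sim-default"   # placeholder for rendered appearance


def perturb(demo: Episode, rng: random.Random, noise: float) -> Episode:
    """Hypothetical action augmentation: jitter every action slightly."""
    return Episode(tuple(a + rng.gauss(0.0, noise) for a in demo.actions))


def task_succeeds(ep: Episode, goal: float, tol: float) -> bool:
    """Toy success check standing in for a full simulator rollout."""
    return abs(sum(ep.actions) - goal) <= tol


def multiply(demo: Episode, n_candidates: int, styles: list[str], seed: int = 0) -> list[Episode]:
    rng = random.Random(seed)
    goal = sum(demo.actions)
    # Step 2 (Mimic-like): generate action variants, keep only the successful ones.
    candidates = [perturb(demo, rng, noise=0.05) for _ in range(n_candidates)]
    kept = [ep for ep in candidates if task_succeeds(ep, goal, tol=0.1)]
    # Step 3 (Gen-like): pair each surviving trajectory with every visual style.
    return [Episode(ep.actions, style) for ep in kept for style in styles]


demo = Episode((0.1, 0.2, 0.3, 0.4))
dataset = multiply(demo, n_candidates=100, styles=["kitchen-a", "kitchen-b", "kitchen-c"])
print(f"1 demo -> {len(dataset)} synthetic episodes")
```

The point of the sketch is the ordering: filtering happens before the visual fan-out, so failed trajectories never get multiplied.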

Jim Fan

352,675 subscribers

163,670 views • 1 year ago • via X (Twitter)

Science & Technology, News & Politics

Comments: 11

Jim Fan, 1 year ago

We are hiring full-time researchers, engineers, and summer research interns!! If you want to get in touch and show us your past work, email [email protected] (me) and [email protected] (@yukez). Prefix your email subject with the literal "[NV GEAR APPLICATION]". Please forward this info to any friends you know who might be interested!
- Sr. RE, Robot Systems
- Sr. RE, Reinforcement Learning
- Sr. RE, Foundation Model Training Infras
- Sr. RE, Simulation
- Sr. RE, ML Data Pipelines
- Research Scientist
- Research Intern

The Rundown AI, 2 years ago

AI won't replace you, but a person using AI will. Join 500,000+ readers and learn how to use AI in just 5 minutes a day (for free).

Arnav Jaitly, 1 year ago

It’s cute how you found a way to name them Groot

Abhinav Girdhar, 1 year ago

Revolutionizing robotics data with GR00T-Mimic & GR00T-Gen—millions of synthetic points, smarter machines, and Moravec's paradox crushed!

The AI Veteran, 1 year ago

World sim and synthetic robot data generation in one day, and they already work together. This will help close the gap with Chinese robotics. Well done!

The Canaanite, 1 year ago

Incredible! Super smart synthetic data generation pipeline.

Elizabeth Greene, 1 year ago

I don't understand why so many robots walk like they're doing the "gotta-poop-butt-clenched-shuffle." Is that a super-stable gait or somesuch?

Lukas Ziegler, 1 year ago

huuuuuge

Prashant Rai, 1 year ago

Who said robot learning is hitting the wall? It’s not just climbing over but smashing through!🙀

jovin, 1 year ago

This is an awesome explanation on why GR00T is so great! You all are doing amazing work at Nvidia! Definitely changing the world one token at a time 💪

J⏩, 1 year ago

Please bring the background music down about 80% on future videos (or more ideally remove it entirely), it's competing with the message on my crappy (and typical) pc speaker.

Related videos

Exciting updates on Project GR00T! We discovered a systematic way to scale up robot data, tackling the most painful pain point in robotics. The idea is simple: a human collects demonstrations on a real robot, and we multiply that data 1000x or more in simulation. Let’s break it down:

1. We use Apple Vision Pro (yes!!) to give the human operator first-person control of the humanoid. Vision Pro parses human hand pose and retargets the motion to the robot hand, all in real time. From the human’s point of view, they are immersed in another body like the Avatar. Teleoperation is slow and time-consuming, but we can afford to collect a small amount of data.
2. We use RoboCasa, a generative simulation framework, to multiply the demonstration data by varying the visual appearance and layout of the environment. In Jensen’s keynote video below, the humanoid is now placing the cup in hundreds of kitchens with a huge diversity of textures, furniture, and object placement. We only have 1 physical kitchen at the GEAR Lab in NVIDIA HQ, but we can conjure up infinite ones in simulation.
3. Finally, we apply MimicGen, a technique to multiply the above data even more by varying the motion of the robot. MimicGen generates a vast number of new action trajectories based on the original human data, and filters out failed ones (e.g. those that drop the cup) to form a much larger dataset.

To sum up, given 1 human trajectory with Vision Pro -> RoboCasa produces N (varying visuals) -> MimicGen further augments to NxM (varying motions). This is the way to trade compute for expensive human data by GPU-accelerated simulation. A while ago, I mentioned that teleoperation is fundamentally not scalable, because we are always limited by 24 hrs/robot/day in the world of atoms. Our new GR00T synthetic data pipeline breaks this barrier in the world of bits. Scaling has been so much fun for LLMs, and it's finally our turn to have fun in robotics!

We are building tools to enable everyone in the ecosystem to scale up with us. Links in thread:
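To make the N x M arithmetic and the 24 hrs/robot/day ceiling concrete, here is a back-of-the-envelope sketch. The function names and every number passed to them (demo length, N, M, success rate) are assumptions chosen for illustration; the post gives only the N x M structure and the 24-hour cap.

```python
def teleop_cap_per_day(num_robots: int, seconds_per_demo: float) -> int:
    """Upper bound on real demos per day if every robot ran nonstop for 24 hours."""
    seconds_per_day = 24 * 3600
    return int(num_robots * seconds_per_day // seconds_per_demo)


def synthetic_count(num_demos: int, n_scenes: int, m_motions: int,
                    success_rate: float = 1.0) -> int:
    """Demos x N visual variants x M motion variants, minus filtered failures."""
    return round(num_demos * n_scenes * m_motions * success_rate)


# Illustrative numbers only: 1 robot, 60-second demos, N=100 scenes, M=50 motions,
# and half of the generated trajectories surviving the success filter.
real_cap = teleop_cap_per_day(num_robots=1, seconds_per_demo=60)
synthetic = synthetic_count(num_demos=1, n_scenes=100, m_motions=50, success_rate=0.5)
print(f"teleop ceiling per day: {real_cap}, synthetic from 1 demo: {synthetic}")
```

The teleoperation ceiling scales only with robots and hours, while the synthetic count scales with compute, which is the trade the post describes.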

Jim Fan

364,380 views • 2 years ago

Physics AI 🌊

Jensen just unveiled the NVIDIA Isaac GR00T Reference Humanoid Robot at GTC Taipei: the world’s first open-source humanoid robot reference design. Built on Jetson Thor edge computing and the Isaac GR00T open platform, it features a Unitree H2 Plus body with Sharpa five-finger dexterous hands and 75 degrees of freedom.

Even before Physical AI truly explodes, NVIDIA has built a complete end-to-end full-stack infrastructure:
> chips (Jetson Thor + Blackwell)
> synthetic data (GR00T-Dreams)
> world models (Cosmos)
> foundation models (GR00T N-series VLA)
> high-fidelity simulation platforms (Isaac Sim + Isaac Lab + Omniverse)

Dozens of humanoid robot companies, including Agility Robotics, Aglie robotics, Boston Dynamics, Figure, 1X, NEURA Robotics, XPENG Robotics, and more, are already running large-scale daily simulation training on this platform, rapidly iterating perception, reasoning, and actions in virtual environments, speeding up the jump from lab prototypes to real factory deployment. By providing an open reference design and full-stack tools, NVIDIA has dramatically lowered the barrier, letting every robotics team join the Physical AI wave with low cost and high efficiency.

CyberRobo

17,092 views • 1 month ago

Excited to announce GR00T N1, the world’s first open foundation model for humanoid robots! We are on a mission to democratize Physical AI. The power of general robot brain, in the palm of your hand - with only 2B parameters, N1 learns from the most diverse physical action dataset ever compiled and punches above its weight:
- Real humanoid teleoperation data.
- Large-scale simulation data: we are open-sourcing 300K+ trajectories!
- Neural trajectories: we apply SOTA video generation models to “hallucinate” new synthetic data that features accurate physics in pixels. Using Jensen’s words, “systematically infinite data”!
- Latent actions: we develop novel algorithms to extract action tokens from in-the-wild human videos and neural generated videos.

GR00T N1 is a single end-to-end neural net, from photons to actions:
- Vision-Language Model (System 2) that interprets the physical world through vision and language instructions, enabling robots to reason about their environment and instructions, and plan the right actions.
- Diffusion Transformer (System 1) that “renders” smooth and precise motor actions at 120 Hz, executing the latent plan made by System 2.

We deploy N1 on GR1 robot, 1X Neo robot, and a large collection of simulation benchmarks. N1 achieves up to +30% boost in diverse manipulation tasks for household and industrial settings. While humanoid robots are the main focus of N1, our model also supports cross-embodiment. We finetune it to work on the $110 HuggingFace LeRobot SO100 robot arm! Open robot brain runs on open hardware. Sounds just right.

Let’s solve robotics, together, one token at a time. Links to our Whitepaper, Github repo, HuggingFace model, and open dataset page in the thread: 🧵
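The System 2 / System 1 split can be pictured as a two-rate control loop: a slow planner refreshes a latent plan, and a fast policy emits an action on every tick. The sketch below is a toy under stated assumptions, not GR00T N1 code. Only the 120 Hz action rate comes from the post; `PLAN_HZ`, `system2_plan`, `system1_act`, and the scalar "state" are hypothetical stand-ins.

```python
ACTION_HZ = 120  # from the post: System 1 emits motor actions at 120 Hz
PLAN_HZ = 10     # assumption for illustration; the post does not state System 2's rate


def system2_plan(state: float, instruction: str) -> float:
    """Stand-in for the vision-language model: returns a 'latent plan' (a target value)."""
    return 1.0 if "cup" in instruction else 0.0


def system1_act(plan: float, state: float) -> float:
    """Stand-in for the diffusion transformer: a small step toward the planned target."""
    return 0.1 * (plan - state)


def run(seconds: float, instruction: str) -> tuple[list[float], int]:
    ticks_per_plan = ACTION_HZ // PLAN_HZ
    state, plan, replans = 0.0, 0.0, 0
    actions = []
    for tick in range(int(seconds * ACTION_HZ)):
        if tick % ticks_per_plan == 0:   # slow loop: refresh the plan
            plan = system2_plan(state, instruction)
            replans += 1
        action = system1_act(plan, state)  # fast loop: act on every tick
        state += action
        actions.append(action)
    return actions, replans


actions, replans = run(seconds=1, instruction="pick up the cup")
print(f"{len(actions)} actions executed under {replans} plans")
```

Decoupling the two rates lets the expensive reasoning model run less often than the motor policy it conditions.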

Jim Fan

466,261 views • 1 year ago

Today is the beginning of our moonshot to solve embodied AGI in the physical world. I’m so excited to announce Project GR00T, our new initiative to create a general-purpose foundation model for humanoid robot learning. The GR00T model will enable a robot to understand multimodal instructions, such as language, video, and demonstration, and perform a variety of useful tasks. We are collaborating with many leading humanoid companies around the world, so that GR00T may transfer across embodiments and help the ecosystem thrive. GR00T is born on NVIDIA’s deep technology stack. We simulate in Isaac Lab (new app on Omniverse Isaac Sim for humanoid learning), train on OSMO (new compute orchestration system to scale up models), and deploy to Jetson Thor (new edge GPU chip designed to power GR00T). Announced in Jensen's keynote, Project GR00T is a cornerstone for the “Foundation Agent” roadmap of the newly founded GEAR Lab. At GEAR, we are building generally capable agents that learn to act skillfully in many worlds, virtual and real. See if you can spot "GEAR" in the video ;) Join us on the journey to land on the moon.

Jim Fan

1,076,963 views • 2 years ago

Synthetic data will provide the next trillion tokens to fuel our hungry models. I'm excited to announce MimicGen: massively scaling up data pipeline for robot learning! We multiply high-quality human data in simulation with digital twins. Using 50,000 training episodes across 18 tasks, multiple simulators, and even in the real-world! The idea is simple:

1. Humans tele-operate the robot to complete a task. It is extremely high-quality but also very slow and expensive.
2. We create a digital twin of the robot and the scene in high-fidelity, GPU-accelerated simulation.
3. We can now move objects around, replace with new assets, and even change the robot hand - basically augment the training data with procedural generation.
4. Export the successful episodes, and feed that to a neural network! You now have a near-infinite stream of data.

One of the key reasons that robotics lags far behind other AI fields is the lack of data: you cannot scrape control signals from the internet. They simply don't exist in-the-wild. MimicGen shows the power of synthetic data and simulation to keep our scaling laws alive. I believe this principle applies beyond robotics. We are quickly exhausting the high-quality, real tokens from the web. Artificial intelligence from artificial data will be the way forward.

We are big fans of the OSS community. As usual, we open-source everything, including the generated dataset!
- Website:
- Paper:
- Dataset is hosted on HuggingFace (thanks AK!!):
- Code:

MimicGen is led by Ajay Mandlekar, deep dive in the thread:

Jim Fan

332,199 views • 2 years ago

I don’t know if we live in a Matrix, but I know for sure that robots will spend most of their lives in simulation. Let machines train machines. I’m excited to introduce DexMimicGen, a massive-scale synthetic data generator that enables a humanoid robot to learn complex skills from only a handful of human demonstrations. Yes, as few as 5! DexMimicGen addresses the biggest pain point in robotics: where do we get data? Unlike with LLMs, where vast amounts of texts are readily available, you cannot simply download motor control signals from the internet. So researchers teleoperate the robots to collect motion data via XR headsets. They have to repeat the same skill over and over and over again, because neural nets are data hungry. This is a very slow and uncomfortable process. At NVIDIA, we believe the majority of high-quality tokens for robot foundation models will come from simulation. What DexMimicGen does is to trade GPU compute time for human time. It takes one motion trajectory from human, and multiplies into 1000s of new trajectories. A robot brain trained on this augmented dataset will generalize far better in the real world. Think of DexMimicGen as a learning signal amplifier. It maps a small dataset to a large (de facto infinite) dataset, using physics simulation in the loop. In this way, we free humans from babysitting the bots all day. The future of robot data is generative. The future of the entire robot learning pipeline will also be generative. 🧵

Jim Fan

165,246 views • 1 year ago

Modern AI is confined to the digital world. At Skild AI, we are building towards AGI for the real world, unconstrained by robot type or task — a single, omni-bodied brain. Today, we are sharing our journey, starting with early milestones, with more to come in the weeks ahead. Our Mission: Artificial General Intelligence grounded in the physical world. We believe AGI that can truly understand and reason in the real world can only be built through grounding in the physical world. Our Vision: Any robot, Any task, One brain. We tackle robotics in its full generality – building a continually improving, omni-bodied brain that can control any hardware for any task. Who are we? A passionate group of scientists & engineers driven by our shared vision. We have been researching AI and robotics for more than a decade. Our team includes pioneers of self-supervised learning, curiosity-driven exploration, end-to-end sim2real for visual locomotion, dexterous manipulation, learning from human videos, robot parkour, and many more. Many of these works have won awards at top-tier AI and Robotics conferences. Our team has also built production-ready systems at Anduril, Tesla, Nvidia, Meta, Kitty Hawk, Google, Everyday Robotics, and Amazon. Join us in our mission to build the robot brains of tomorrow.

Skild AI

382,876 views • 1 year ago

We might be solving the wrong problem in robotics. That’s what this makes clear.

UMI → Universal Manipulation Interface

A simple $400 gripper that lets you teach robots by demonstration. You hold it like a tool. Show the task. The robot learns. No teleoperation. No expensive hardware. No robot-specific data. Stanford open-sourced everything → hardware, code, datasets.

What stands out to me is the bottleneck. Not algorithms. Data.
Teleoperation → ~35 demos/hour
UMI → ~111 demos/hour
And the data transfers across robots → UR5, Franka, others.

The design is surprisingly practical:
→ GoPro fisheye lens (155° FOV) + mirrors for depth
→ SLAM + IMU for precise 6DoF tracking
→ latency matching for dynamic tasks
→ diffusion policies for multimodal actions

Then it scales. Cheng Chi takes this further with Sunday Robotics (with Tony Zhao). A $200 glove → deployed in 500+ homes → ~10 million real-world interactions. Not lab data. Real human behavior. Their robot learns dishes, laundry, espresso → with zero robot-specific data.

This is where the shift becomes obvious. From training robots in controlled environments → to learning directly from humans at scale.

So here’s the real question: Will robotics be unlocked by better models… or by unlocking data?

#ArtificialIntelligence #Robotics #AI #Innovation #FutureOfWork
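The two throughput figures quoted above translate directly into collection time. A quick sketch, assuming the rates hold steady over a whole collection run; the 1,000-demo target and the helper name `hours_needed` are illustrative choices, not figures from the post.

```python
TELEOP_DEMOS_PER_HOUR = 35   # approximate rate quoted in the post
UMI_DEMOS_PER_HOUR = 111     # approximate rate quoted in the post


def hours_needed(target_demos: int, demos_per_hour: float) -> float:
    """Collection time at a constant rate (ignores setup, breaks, and failed demos)."""
    return target_demos / demos_per_hour


speedup = UMI_DEMOS_PER_HOUR / TELEOP_DEMOS_PER_HOUR
target = 1_000  # hypothetical dataset size
print(f"speedup: {speedup:.2f}x")
print(f"teleop: {hours_needed(target, TELEOP_DEMOS_PER_HOUR):.1f} h, "
      f"UMI: {hours_needed(target, UMI_DEMOS_PER_HOUR):.1f} h")
```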

Pascal Bornet

186,223 views • 3 months ago

We trained a humanoid with 22-DoF dexterous hands to assemble model cars, operate syringes, sort poker cards, fold/roll shirts, all learned primarily from 20,000+ hours of egocentric human video with no robot in the loop. Humans are the most scalable embodiment on the planet. We discovered a near-perfect log-linear scaling law (R² = 0.998) between human video volume and action prediction loss, and this loss directly predicts real-robot success rate.

Humanoid robots will be the end game, because they are the practical form factor with minimal embodiment gap from humans. Call it the Bitter Lesson of robot hardware: the kinematic similarity lets us simply retarget human finger motion onto dexterous robot hand joints. No learned embeddings, no fancy transfer algorithms needed. Relative wrist motion + retargeted 22-DoF finger actions serve as a unified action space that carries through from pre-training to robot execution.

Our recipe is called "EgoScale":
- Pre-train GR00T N1.5 on 20K hours of human video, mid-train with only 4 hours (!) of robot play data with Sharpa hands. 54% gains over training from scratch across 5 highly dexterous tasks.
- Most surprising result: a *single* teleop demo is sufficient to learn a never-before-seen task. Our recipe enables extreme data efficiency.
- Although we pre-train in 22-DoF hand joint space, the policy transfers to a Unitree G1 with 7-DoF tri-finger hands. 30%+ gains over training on G1 data alone.

The scalable path to robot dexterity was never more robots. It was always us. Deep dives in thread:
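A log-linear scaling law of the kind described means loss falls along a straight line when plotted against the logarithm of data volume, i.e. loss ≈ a + b·log10(hours) with b < 0. The sketch below shows how such a fit and its R² are computed. The data points are synthetic, invented purely for illustration; they are not the EgoScale measurements, and `fit_log_linear` is a hypothetical helper.

```python
import math


def fit_log_linear(hours: list[float], losses: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of loss = intercept + slope * log10(hours); returns (slope, intercept, R^2)."""
    xs = [math.log10(h) for h in hours]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(losses) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, losses))
    ss_tot = sum((y - mean_y) ** 2 for y in losses)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic example: a true slope of -0.1 per decade of data plus small made-up noise.
hours = [1_000, 2_000, 5_000, 10_000, 20_000]
noise = [0.002, -0.001, 0.001, -0.002, 0.001]
losses = [1.0 - 0.1 * math.log10(h) + e for h, e in zip(hours, noise)]

slope, intercept, r2 = fit_log_linear(hours, losses)
print(f"slope per 10x data: {slope:.3f}, R^2: {r2:.4f}")
```

An R² close to 1 says the straight line in log space explains nearly all of the variation in loss, which is what makes such a law useful for predicting returns from more data.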

Jim Fan

293,585 views • 5 months ago

Today we are excited to open up Neuracore to the academic community! Neuracore is a new data foundation built to accelerate robot learning by removing one of the field’s biggest bottlenecks: capturing and working with high-fidelity multimodal robotics data. For the first time, researchers can store, view, and work with robotics data in a cloud-native system built specifically for large-scale learning, and we are making this core platform completely free for academia. The platform lets teams capture every sensor at its native rate, store and visualize data without loss, and then train and deploy models locally using our open-source code (Link in the comments). We are rolling out access to select academic institutions first. Anyone with an academic email can sign up, and if your institution is not part of the initial rollout, you will be able to join the waitlist directly. Beyond providing this infrastructure, we see an opportunity to build a global community where engineers and researchers can share, collaborate, and advance the frontier of robot learning together. Supported by our recent $3M pre-seed round led by Earlybird VC, we are excited to take this mission even further. Our long-term goal is for Neuracore to become the natural home for cutting-edge robot learning algorithms and real-world robotics experimentation, helping accelerate the next wave of Physical AI.

Neuracore

40,620 views • 8 months ago

We’re excited to share DiT4DiT, an end-to-end Video-Action Model for robot learning that unifies a video Diffusion Transformer and an action Diffusion Transformer in a single cascaded framework. By leveraging the rich spatiotemporal and physical dynamics learned through video generation, rather than static image-text priors, DiT4DiT achieves state-of-the-art results on LIBERO (98.6%) and RoboCasa GR1 (50.8%) with far less training data, delivering over 10× better sample efficiency and up to 7× faster convergence. Real-world deployment on a humanoid robot further shows robust generalization. We believe this is a step toward making video generation a powerful backbone for robot policy learning. This work builds upon the brilliant foundations laid by Nvidia's GR00T and Cosmos. Project: Paper: Code: Coming soon. In the meantime, you can ask your coding agent to reproduce the method based on GR00T/Cosmos.

Shuo Yang

31,596 views • 4 months ago

NVIDIA just announced EgoScale 🤖🧠 NVIDIA Research has uncovered a log-linear scaling law for robot dexterity by pretraining VLA models on over 20,000 hours of egocentric human video. This massive dataset is 20 times larger than previous efforts and proves that robot intelligence follows a predictable path: the more human data, the lower the loss. The secret is a simple recipe combining large-scale human pretraining with a small amount of aligned human-robot mid-training to bridge the gap. In testing, this method boosted the average success rate by 54% on a 22-DoF robotic hand compared to policies built without pretraining. EgoScale also enables one-shot task adaptation and works across different hardware, suggesting that human motion is a universal motor prior for robots. Website: Paper: Source: NVIDIA Research #Robot #Humanoid #Robotics #AI #EmbodiedAI #PhysicalAI #NVIDIA #EgoScale #GR00T

RoboHub🤖

43,752 views • 4 months ago
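To make the "log-linear scaling law" in the EgoScale post above concrete, here is a minimal Python sketch that fits loss = a + b * ln(hours) by ordinary least squares. The data points and the coefficients 1.0 and -0.05 are invented for illustration; they are not results from EgoScale, and the functional form is only an assumed reading of "log-linear".

```python
import math

def fit_log_linear(hours, losses):
    """Least-squares fit of loss = a + b * ln(hours); returns (a, b)."""
    xs = [math.log(h) for h in hours]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(losses) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict_loss(a, b, hours):
    """Loss predicted by the fitted law at a given number of data hours."""
    return a + b * math.log(hours)

# Hypothetical points generated from loss = 1.0 - 0.05 * ln(hours).
HOURS = [100, 1_000, 10_000, 20_000]
LOSSES = [1.0 - 0.05 * math.log(h) for h in HOURS]

a, b = fit_log_linear(HOURS, LOSSES)
print(f"a={a:.3f}, b={b:.3f}")
```

Under such a law every 10x increase in data lowers the predicted loss by the same fixed amount, |b| * ln(10), which is what makes the trend predictable.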

I'm observing a mini Moravec's paradox within robotics: gymnastics that are difficult for humans are much easier for robots than "unsexy" tasks like cooking, cleaning, and assembling. It leads to a cognitive dissonance for people outside the field, "so, robots can parkour & breakdance, but why can't they take care of my dog?" Trust me, I got asked by my parents about this more than you think ... The "Robot Moravec's paradox" also creates the illusion that physical AI capabilities are way more advanced than they truly are. I'm not singling out Unitree, as it applies widely to all recent acrobatic demos in the industry. Here's a simple test: if you set up a wall in front of the side-flipping robot, it will slam into it at full force and make a spectacle. Because it's just overfitting that single reference motion, without any awareness of the surroundings. Here's why the paradox exists: it's much easier to train a "blind gymnast" than a robot that sees and manipulates. The former can be solved entirely in simulation and transferred zero-shot to the real world, while the latter demands extremely realistic rendering, contact physics, and messy real-world object dynamics - none of which can be simulated well. Imagine you can train LLMs not from the internet, but from a purely hand-crafted text console game. Roboticists got lucky. We happen to live in a world where accelerated physics engines are so good that we can get away with impressive acrobatics using literally zero real data. But we haven't yet discovered the same cheat code for general dexterity. Till then, we'll still get questioned by our confused parents.

Jim Fan

398,026 views • 1 year ago

In just one week, Binh Pham and I trained a full-body Unitree G1. Here's a recap:
1. Secured a Unitree G1 humanoid through a LinkedIn post.
2. Deployed the TWIST2 full-body teleoperation pipelines.
3. Adapted TWIST2 for a Zed stereo camera and collected full-body teleoperation samples (carried out by Binh Pham).
4. Adapted and fine-tuned the NVIDIA GR00T N1.5 VLA on the TWIST2 public datasets, using an 8x NVIDIA H100 cluster. We picked GR00T N1.5 because it was trained with Unitree G1 embodiment data.
5. Adapted the TWIST2 codebase to stream in actions from GR00T via ZMQ, using a co-located NVIDIA H100 for ~200 ms inference latency.
6. Tested the model in sim, then deployed to the real-world Unitree G1. We streamed a training sample observation to the VLA, as we didn't want to break the robot in case real observations were OOD.
We were the first team in the world to deploy the full TWIST2 data collection pipeline to the Unitree G1 :)
Much more work ahead though, which I'll work on as a side project over the next months:
1. Exploring the various types of 'world models': video backbones, dynamics models, V-JEPA 2 models. I believe these will generalize better and train much more data-efficiently than VLM backbones.
2. Speeding up inference. I believe low-latency robotics inference will be a big challenge. There are many works in video diffusion which I'd like to test (e.g. SageAttention, SparseAttention, Drifting Models). Perhaps also writing custom CUDA kernels.
3. Economics of inference scaling :) What will be the compute demands as we scale inference up to millions of humanoids? Will it run on edge or on distributed 'co-located' inference clusters? These are questions I'd like to answer.
Adapted TWIST2 codebase: Adapted GR00T-N1.5 codebase:
The ETH Robotics Club is running a cool GTC Golden Ticket competition with NVIDIA, so this is my submission :) The DGX Spark compute will get me a long way with initial prototyping and especially with inference optimization for next-gen Blackwell GPUs #NVIDIAGTC #GOLDENTICKET #ETHRC

Arnie Ramesh

14,815 views • 5 months ago
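As a rough illustration of what the ~200 ms inference latency in the recap above implies, here is a minimal Python sketch of the control-rate arithmetic. The 50 Hz low-level control rate, and the idea that the policy returns a chunk of several actions per call, are assumptions made for this example and are not details from the post.

```python
import math

def max_policy_calls_per_second(latency_ms):
    """Upper bound on sequential (non-pipelined) inference calls per second."""
    return 1000 / latency_ms

def min_chunk_length(latency_ms, control_hz):
    """Control steps that elapse during one inference call.

    A returned action chunk must cover at least this many steps,
    otherwise the robot runs out of actions while waiting.
    """
    return math.ceil(latency_ms * control_hz / 1000)

LATENCY_MS = 200   # from the post (~200 ms)
CONTROL_HZ = 50    # hypothetical low-level control rate

print(max_policy_calls_per_second(LATENCY_MS))   # 5.0
print(min_chunk_length(LATENCY_MS, CONTROL_HZ))  # 10
```

With these assumed numbers the policy can be queried at most 5 times per second, so each response would need to cover at least 10 control steps.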

Collecting robot data at scale is key to deploying working manipulation policies, and the team from Tutor Intelligence is here to tell us how to accomplish it. Their new announcement: a massive, 100-robot “data factory,” with a behind-the-scenes look at how to build a teleoperation platform and how to make robots and policies that are useful for their customers. Tutor Intelligence is a full-stack robotics company: they build robot arms, they sell robot arms, they write the software, and they train neural networks. Josh Gruenstein, Jesse Michel, Shiraz, and Joe McCalmon join us to tell us more about how they scale both teleop data and human interventions from their teleoperators in order to train the policies they need. Watch Episode #85 of RoboPapers, with Chris Paxton and Jiafei Duan, to learn more!

RoboPapers

36,041 views • 1 month ago

An interactive world model developed by NVIDIA in collaboration with academic partners.
- DreamDojo turns egocentric human video data into physical intelligence.
- Human data is more scalable than robotics data but lacks action labels.
- To solve this, a dedicated action model extracts latent actions by identifying physics and motion deltas between frames.
Training
- A massive 44k hours of video data are used for pre-training.
- Post-training on small-scale robot datasets maps human physics to specific robot embodiments.
- An additional distillation stage converts the model into an autoregressive, few-step diffusion model, enabling real-time, action-controllable simulation.
Primary Use Cases
- Live Teleoperation: Controlling a robot inside a world simulation in real-time.
- Model-based Planning: Previewing and curating the best actions for improved success.
- Policy Evaluation: Testing robot policies in realistic, out-of-distribution scenarios.
Everything that's open-sourced: weights, code, post-training dataset, eval set, and details to reproduce.

The Humanoid Hub

11,575 views • 5 months ago

A few months ago we set out to understand exactly how our models can be used to accelerate progress in robotics and other AI systems that don’t just represent the world, but are actually able to interact with it. We’ve been collaborating with leading companies in the space and we’ve developed an initial approach that we're now opening up to accelerate development across the industry. Our fifth announcement is GWM Robotics. GWM Robotics is a learned simulator that generates synthetic data for scalable robot training and policy evaluation, removing the bottlenecks of physical hardware.

Runway

35,040 views • 7 months ago

At Axis, every trajectory submitted by our community undergoes a strict replay validation process. We run each submission through a checker to verify whether the target task was successfully completed. To see how strict it is, check this demo (Task: Place The Toy Train On The Board Game Box). Real human data passes smoothly (Video 1). However, bot-generated or manually altered data will fail (Video 2). Why? Faking numbers breaks the simulation's physics causality. Even tiny tweaks cause error accumulation, resulting in failed movements. This invalid data is automatically rejected. Because of this mechanism, data submitted via bots will ultimately fail our replay verification. Invalid data is strictly excluded from model training, and the task slot is reopened to the community to collect genuine, high-quality trajectories. Furthermore, we actively monitor for duplicated data. Trajectories that are identical lack the diversity required for robot learning and will not be credited by our scoring system. If we detect accounts submitting a massive volume of identical trajectories, all associated addresses will be permanently banned. For Axis, the quality and diversity of data are the only ways to solve the robotics generalization gap. They will always be our absolute top priorities.

Axis Robotics

30,772 views • 3 months ago
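The replay validation and duplicate filtering described in the Axis post above can be sketched in Python roughly as follows. This is not Axis's checker: the one-dimensional `step` function stands in for a real physics simulator, and `MAX_STEP`, `TOL`, `goal_radius`, and the submission layout (`actions` plus recorded `states`) are hypothetical names and values chosen for the example.

```python
import hashlib
import json

MAX_STEP = 0.1   # hypothetical physical limit on per-step motion
TOL = 1e-6       # hypothetical tolerance between recorded and replayed states

def step(pos, action):
    """Toy deterministic 'physics': move by the action, clipped to MAX_STEP."""
    clipped = max(-MAX_STEP, min(MAX_STEP, action))
    return pos + clipped

def replay_is_valid(start, actions, recorded_states, goal, goal_radius=0.05):
    """Re-simulate the actions; require matching states and task completion."""
    if len(actions) != len(recorded_states):
        return False
    pos = start
    for action, recorded in zip(actions, recorded_states):
        pos = step(pos, action)
        if abs(pos - recorded) > TOL:
            return False  # recorded state is not reproducible by the simulator
    return abs(pos - goal) <= goal_radius

def fingerprint(actions):
    """Hash of the (rounded) action sequence, used to spot exact duplicates."""
    payload = json.dumps([round(a, 6) for a in actions]).encode()
    return hashlib.sha256(payload).hexdigest()

def filter_submissions(submissions, start, goal):
    """Keep submissions that pass replay and are not duplicates of earlier ones."""
    seen = set()
    accepted = []
    for sub in submissions:
        if not replay_is_valid(start, sub["actions"], sub["states"], goal):
            continue
        fp = fingerprint(sub["actions"])
        if fp in seen:
            continue
        seen.add(fp)
        accepted.append(sub)
    return accepted
```

A submission whose recorded states were edited by hand no longer matches what the simulator produces from the same actions, so it fails the replay check, and a second copy of an accepted trajectory is dropped by the fingerprint check.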

China-based TARS Robotics demonstrated a humanoid robot performing two-handed hand embroidery during a public showcase on December 22, marking a notable step in fine motor control for humanoid systems. The robot threaded a needle and stitched a logo using both hands with sub-millimeter accuracy, working with soft, flexible materials that are difficult for traditional industrial robots to handle due to deformation and variability. According to the company, this capability is enabled by a closed-loop "Data-AI-Physics" system that connects real-world data collection, embodied AI models, and physical execution, reducing the gap between simulation and real deployment. The models are said to be open source when released. Founded in February 2025, TARS Robotics has moved quickly from research to live demonstrations and has raised over $240 million in early funding from the Chinese government, reflecting growing interest in general-purpose humanoid robots for precision and dexterous tasks.

Brian Roemmele

82,611 views • 7 months ago

We are back again :) After three weeks of quiet building. Introducing Genesis World 1.0, our latest simulation platform, the second release in our full-stack suite. Open-sourced. Robotics is still bottlenecked by the 1× speed of the physical world. Every model, checkpoint, and data recipe eventually needs to be tested on physical hardware, slowly, expensively, and with limited coverage. One hour in reality can become 100 days in simulation. That is how robotics model iteration moves from a wall-clock bottleneck to a compute problem. To make this work, simulation has to be both fast and trustworthy. Over the past year, we rebuilt the entire stack: a GPU-accelerated cross-platform compiler, penetration-free multi-physics contact solvers, unified rigid and deformable physics, and a photo-realistic renderer purpose-built for physical AI applications. We built Nyx, a high-performance path-traced rendering engine for robotics applications. Genesis World 1.0 achieves near-real-time performance with our latest penetration-free IPC solver, supporting various types of deformables beyond rigid bodies. It supports contact-rich, dexterous manipulation simulation across different embodiments: Unitree, Sharpa, Wuji, Genesis Hand, and various types of grippers. Under the hood is Quadrants, our effort to push forward cross-platform GPU-accelerated computation. Quadrants started as a fork of Taichi, and we rebuilt most of the critical parts to optimize simulation workloads, giving 10x faster launch time and up to 4.6x runtime performance compared to the initial Genesis release. Together, they bring us to an unprecedentedly low sim-to-real gap, enabling zero-shot real-to-sim model evaluation and much faster iteration of GENE. All available today. Genesis World 1.0: Quadrants: Nyx:

Genesis AI

309,186 views • 2 months ago
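The claim in the Genesis post above that "one hour in reality can become 100 days in simulation" implies a specific throughput ratio, which the short Python sketch below works out. The post does not say whether that ratio comes from a single fast environment or from many environments running in parallel; the per-environment real-time factor of 1.0 used here is an assumption, loosely motivated by the "near-real-time" wording.

```python
import math

def implied_speedup(sim_days, real_hours):
    """Simulated hours produced per wall-clock hour."""
    return sim_days * 24 / real_hours

def parallel_envs_needed(speedup, per_env_realtime_factor):
    """Environments required if each one runs at the given multiple of real time."""
    return math.ceil(speedup / per_env_realtime_factor)

speedup = implied_speedup(sim_days=100, real_hours=1)
print(speedup)                             # 2400.0
print(parallel_envs_needed(speedup, 1.0))  # 2400 (assumed 1.0x per environment)
```

So the headline figure corresponds to 2,400 simulated hours per wall-clock hour, which under the assumed real-time factor would mean on the order of 2,400 environments running at once.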