🎉 Diffusion-style annealing + sampling-based MPC can surpass RL, and seamlessly adapt to task parameters, all 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴-𝗳𝗿𝗲𝗲! We open-sourced DIAL-MPC, the first training-free method for whole-body torque control using full-order dynamics 🧵

Haoru Xue

2,942 subscribers

172,518 views • 1 year ago • via X (Twitter)

11 Comments

Haoru Xue • 1 year ago

1/9 We surprisingly found that the formulation of sampling-based MPC and a single diffusion step are 𝗲𝗾𝘂𝗶𝘃𝗮𝗹𝗲𝗻𝘁. This motivates us to do multi-step diffusion in MPC. (theoretical proofs in paper) 𝗥𝗲𝗮𝗹-𝘁𝗶𝗺𝗲 sampling is possible! We did it thanks to recent advances in massively parallel simulation.
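
For readers new to sampling-based MPC, here is a minimal NumPy sketch of one generic MPPI-style update: sample candidates around the current plan, score each one, and take a softmax-weighted average. It is not taken from the DIAL-MPC codebase. The names `mppi_update` and `cost_fn`, the quadratic toy cost, and every parameter value are illustrative assumptions. The thread states that this kind of formulation is equivalent to a single diffusion step; the proof is in the paper and is not reproduced here.

```python
import numpy as np

def mppi_update(mean, cost_fn, sigma, num_samples, temperature, rng):
    """One generic sampling-based MPC update (MPPI-style), for illustration."""
    # Sample candidate action sequences around the current plan.
    noise = rng.standard_normal((num_samples,) + mean.shape)
    candidates = mean + sigma * noise
    # Score every candidate; a real controller would use batched simulator rollouts.
    costs = np.array([cost_fn(c) for c in candidates])
    # Softmax weights: low-cost candidates dominate the average.
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    # The new plan is the weighted average of the candidates.
    return np.tensordot(weights, candidates, axes=1)

# Hypothetical toy problem: pull a 5-step action sequence toward `target`.
rng = np.random.default_rng(0)
target = np.ones(5)
cost_fn = lambda u: float(np.sum((u - target) ** 2))
mean0 = np.zeros(5)
mean1 = mppi_update(mean0, cost_fn, sigma=0.5, num_samples=1024,
                    temperature=1.0, rng=rng)
print(cost_fn(mean0), cost_fn(mean1))
```

A single update only moves the plan part of the way toward the low-cost region, which is one way to see why repeating the step several times is attractive.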

Haoru Xue • 1 year ago

2/9 There are two diffusion-style annealings in DIAL-MPC: Trajectory-level: within a single timestep, we iteratively re-sample with an adaptive distribution, which is equivalent to denoising in diffusion.
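
The trajectory-level idea can be sketched as repeating a sampling update several times within one control step while shrinking the sampling noise. This is a rough illustration under stated assumptions, not the authors' implementation: the geometric noise decay, the helper names `weighted_update` and `annealed_plan`, and the toy cost are all hypothetical.

```python
import numpy as np

def weighted_update(mean, cost_fn, sigma, num_samples, temperature, rng):
    """Sample around `mean`, then return the softmax-weighted average."""
    candidates = mean + sigma * rng.standard_normal((num_samples,) + mean.shape)
    costs = np.array([cost_fn(c) for c in candidates])
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()
    return np.tensordot(weights, candidates, axes=1)

def annealed_plan(mean, cost_fn, sigmas, num_samples, temperature, rng):
    """Re-sample repeatedly within one timestep, from wide noise to narrow."""
    for sigma in sigmas:
        mean = weighted_update(mean, cost_fn, sigma, num_samples, temperature, rng)
    return mean

# Hypothetical toy problem and schedule (assumed geometric decay).
rng = np.random.default_rng(0)
target = np.ones(4)
cost_fn = lambda u: float(np.sum((u - target) ** 2))
sigmas = 1.0 * 0.7 ** np.arange(6)   # 1.0, 0.7, 0.49, ...
plan0 = np.zeros(4)
plan = annealed_plan(plan0, cost_fn, sigmas, num_samples=1024,
                     temperature=1.0, rng=rng)
print(cost_fn(plan0), cost_fn(plan))
```

Early iterations with large noise explore broadly; later iterations with small noise refine the plan, which mirrors the coarse-to-fine behavior of denoising.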

Haoru Xue • 1 year ago

3/9 And action-level: we leverage the receding-horizon nature of MPC to re-use partially diffused actions in future steps.
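
One way to picture the action-level reuse: after executing the first action, the remaining plan is shifted forward and used as the warm start for the next control step. Actions near the end of the horizon have been refined fewer times, so in this sketch they receive larger sampling noise. The specific schedule (`np.geomspace` between two assumed bounds) and the helper names `shift_plan` and `action_noise_scales` are hypothetical choices for illustration, not the schedule used in DIAL-MPC.

```python
import numpy as np

def shift_plan(plan, fill_value=0.0):
    """Drop the executed first action and append a fresh final action."""
    shifted = np.roll(plan, -1, axis=0)  # np.roll returns a new array
    shifted[-1] = fill_value
    return shifted

def action_noise_scales(horizon, sigma_min=0.05, sigma_max=1.0):
    """Per-action noise: small for near-term actions, large for far-future ones."""
    return np.geomspace(sigma_min, sigma_max, horizon)

rng = np.random.default_rng(0)
horizon, action_dim, num_samples = 5, 2, 8
plan = np.arange(horizon * action_dim, dtype=float).reshape(horizon, action_dim)

warm_start = shift_plan(plan)          # reused, partially refined actions
scales = action_noise_scales(horizon)  # one noise scale per horizon index
noise = rng.standard_normal((num_samples, horizon, action_dim))
candidates = warm_start + scales[:, None] * noise
print(candidates.shape)
```

The near-term actions, which have already been through several rounds of refinement in earlier control steps, are perturbed only slightly, while the newly appended action is explored widely.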

Haoru Xue • 1 year ago

4/9 Some DIAL-MPC tasks can be deployed directly to the real robot. We achieve 𝗿𝗲𝗮𝗹-𝘁𝗶𝗺𝗲 𝘁𝗼𝗿𝗾𝘂𝗲 𝗰𝗼𝗻𝘁𝗿𝗼𝗹 on a quadruped doing versatile tasks. We can even give it a very heavy payload (6 kg 👇), and it easily adapts after we modify the mass in the model.
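
To illustrate why a model-based planner can adapt just by editing a model parameter, here is a deliberately tiny stand-in: a 1D point mass that must hold its height, with the planner searching over constant vertical forces. Everything here is an assumption for illustration (the 12 kg base mass, the point-mass dynamics, the grid search in place of sampling, and the names `rollout_cost` and `best_force`); only the 6 kg payload figure comes from the thread.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2

def rollout_cost(force, mass, horizon=50, dt=0.02):
    """Height-tracking cost for a constant vertical force on a point mass."""
    z, vz, cost = 0.0, 0.0, 0.0
    for _ in range(horizon):
        vz += (force / mass - G) * dt  # the model's mass enters the dynamics here
        z += vz * dt
        cost += z * z                  # penalize drifting from the start height
    return cost

def best_force(mass, candidates):
    """Pick the candidate force with the lowest rollout cost under the model."""
    costs = [rollout_cost(f, mass) for f in candidates]
    return float(candidates[int(np.argmin(costs))])

candidates = np.linspace(0.0, 400.0, 4001)   # forces in N, 0.1 N apart
base_mass, payload = 12.0, 6.0               # base mass is hypothetical
nominal = best_force(base_mass, candidates)
loaded = best_force(base_mass + payload, candidates)
print(nominal, loaded)
```

No retraining is involved: the same planner is rerun with a different `mass`, and the chosen force moves to roughly `mass * G` for the heavier model.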

Haoru Xue • 1 year ago

5/9 Why is DIAL-MPC better than conventional sampling-based MPC? In our paper and website, we include a toy experiment with a very rough cost function landscape to explain the theory.
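
The thread does not give the details of that toy experiment, so the following is only a common intuition for why annealing helps on rough landscapes, using a hypothetical cost J(x) = x^2 + A sin(kx). Averaging J over Gaussian noise of scale sigma has a closed form, and the oscillating term is damped by exp(-(k sigma)^2 / 2). Wide noise therefore "sees" a smooth bowl, and narrow noise recovers the fine detail. The constants and function names below are assumptions.

```python
import numpy as np

AMPLITUDE, FREQ = 2.0, 8.0  # hypothetical roughness parameters A and k

def smoothed_cost(x, sigma):
    """Closed form of E[J(x + sigma * eps)], eps ~ N(0, 1), J(x) = x^2 + A sin(k x)."""
    damping = np.exp(-0.5 * (FREQ * sigma) ** 2)
    return x ** 2 + sigma ** 2 + AMPLITUDE * damping * np.sin(FREQ * x)

def count_local_minima(values):
    """Count interior grid points lower than their left and right neighbors."""
    interior = values[1:-1]
    return int(np.sum((interior < values[:-2]) & (interior <= values[2:])))

xs = np.linspace(-4.0, 4.0, 4001)
sigmas = [0.0, 0.2, 0.5]  # sigma = 0.0 is the raw, unsmoothed cost
counts = [count_local_minima(smoothed_cost(xs, s)) for s in sigmas]
print(dict(zip(sigmas, counts)))
```

The number of local minima drops as sigma grows, and at the largest noise level only a single basin remains. Starting with wide noise and then narrowing it lets a sampler settle into the right basin before refining.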

Haoru Xue • 1 year ago

6/9 Although this is not an 🍎-to-🍎 comparison, why can DIAL-MPC be better than an RL policy? 1. It is training-free: all reward designs can be instantly verified. 2. It is explicitly adaptive: it can face out-of-distribution (OOD) tasks such as the one in the attached media.

Haoru Xue • 1 year ago

7/9 We emphasize that DIAL-MPC is not meant to compete with RL - its future directions align very well with RL: 1. Add nominal value and policy functions to shorten DIAL-MPC horizon needs. 2. It can be an "RL reward visualizer" to accelerate RL reward engineering.

Haoru Xue • 1 year ago

8/9 DIAL-MPC parallels popular works that use diffusion-style annealing for robot learning. A significant distinction is that DIAL-MPC is model-based whereas these are model-free: - Diffusion Policy Policy Optimization - Streaming Diffusion Policy

Haoru Xue • 1 year ago

9/9 Thanks for reading through! Check out our project website! We also open-source the code on GitHub to run all DIAL-MPC demos, including the rendering pipeline. Follow the authors @HaoruXue @ChaoyiPan @zejiyi @guannanqu @GuanyaShi for more updates!

Chen Tessler • 1 year ago

Looks amazing! Congrats! And kudos on the visuals too!

Haoru Xue • 1 year ago

thank you! we also open-sourced our blender pipeline in the codebase, courtesy of the UMI on Legs authors

Related Videos

Thanks AK! Finally, robot can do continuous, agile, autonomous, adaptive jumping over stair and stepping stone Key idea: combine the pros of model-free RL and model-based control. RL (for CoM refs) + QP (for GRF) + WBC (for torque) Open-sourced:

Guanya Shi

32,155 views • 1 year ago

Check out our #ICRA2024 paper "Actor-Critic Model Predictive Control." Model-free #reinforcementlearning (RL) is known for its strong task performance and flexibility in optimizing general reward formulations. On the other hand, #ModelPredictiveControl (MPC) benefits from robustness and online replanning capabilities. We combine both approaches by introducing a new framework called Actor-Critic Model Predictive Control. The key idea is to embed a differentiable MPC within an Actor-Critic RL framework. The proposed approach leverages the short-term predictive optimization capabilities of MPC with the exploratory and end-to-end training properties of RL. The resulting policy effectively manages both short-term decisions through the MPC-based actor and long-term prediction via the critic network, unifying the benefits of both model-based control and end-to-end learning. We validate our method in simulation and the real world with a quadcopter across various high-level tasks. We show that the proposed architecture can achieve real-time control performance, learn complex behaviors via trial and error, and retain the predictive properties of the MPC to better handle out-of-distribution behavior. Paper: Full Video with more details: Kudos to Ángel Romero, Yunlong Song IEEE ICRA University of Zurich UZH Science UZH Space Hub Aerial Core AUTOASSESS European Research Council (ERC)

Davide Scaramuzza

34,889 views • 2 years ago

Check out our latest work, "Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight," published in the IEEE Transactions on Robotics, where we reconcile #OptimalControl and #ReinforcementLearning, achieving the same super-human performance, but with superior generalizability, as our previous model-free deep RL! Code released! PDF: Code: Full Video: Model-free #ReinforcementLearning (RL) is known for its strong task performance and flexibility in optimizing general reward formulations. On the other hand, #ModelPredictiveControl (MPC) provides robustness, constraint handling, and powerful online replanning capabilities. In this work, we extend our previous AC-MPC paper (Romero, ICRA'24) by taking a deeper look at how both approaches can be unified. We introduce and extend Actor-Critic Model Predictive Control (AC-MPC), a framework that embeds a differentiable MPC inside an Actor-Critic RL architecture. This integration allows the MPC-based actor to perform short-term predictive optimization, while the critic facilitates long-horizon learning and exploration. We conduct a comprehensive study that highlights AC-MPC’s key advantages: - Better out-of-distribution generalization, both against unknown disturbances and changes in the quadrotor dynamics - Improved sample efficiency - A novel empirical analysis uncovering a relationship between the critic’s value function and the MPC cost function, providing deeper insight into their interplay. We validate our method in simulation and the real world on a quadcopter flying at superhuman speeds of up to 21 m/s, matching state-of-the-art model-free RL performance, and retaining the predictive structure of MPC for more reliable out-of-distribution behavior. 
Reference: Actor-Critic Model Predictive Control: Differentiable Optimization meets Reinforcement Learning for Agile Flight IEEE Transactions on Robotics (T-RO), 2025 PDF: Full Video: Code: Kudos to Ángel Romero, Elie Aljalbout, Yunlong Song! University of Zurich UZH Science UZH Space Hub AUTOASSESS European Research Council (ERC) UZHai

Davide Scaramuzza

27,090 views • 6 months ago

Remarkably lifelike motion and fluidity. BeyondMimic is a framework for training humanoid whole-body control from large mocap datasets. First, an open-source motion-tracking pipeline to reproduce diverse, highly dynamic human skills on real hardware, then distilling them into a guided state-action diffusion model for zero-shot, task-specific control. Project page:

The Humanoid Hub

58,081 views • 11 months ago

So you’ve trained your favorite diffusion/flow based policy, but it’s just not good enough 0-shot. Worry not, in our new work DSRL - we show how to *steer* pre-trained diffusion policies with off-policy RL, improving behavior efficiently enough for direct training in the real world! DSRL retains nice exploration from the base policy, but allows for quick improvement beyond this base policy with RL. The method is frustratingly simple, and super easy to throw on top of your favorite pretrained policy (VLA/diffusion policy, etc). Let’s think about how it works, 🧵 (1/10)

Abhishek Gupta

19,084 views • 1 year ago

Presto! Distilling Steps and Layers for Accelerating Music Generation Despite advances in diffusion-based text-to-music (TTM) methods, efficient, high-quality generation remains a challenge. We introduce Presto!, an approach to inference acceleration for score-based diffusion transformers via reducing both sampling steps and cost per step. To reduce steps, we develop a new score-based distribution matching distillation (DMD) method for the EDM-family of diffusion models, the first GAN-based distillation method for TTM. To reduce the cost per step, we develop a simple, but powerful improvement to a recent layer distillation method that improves learning via better preserving hidden state variance. Finally, we combine our step and layer distillation methods together for a dual-faceted approach. We evaluate our step and layer distillation methods independently and show each yield best-in-class performance. Our combined distillation method can generate high-quality outputs with improved diversity, accelerating our base model by 10-18x (230/435ms latency for 32 second mono/stereo 44.1kHz, 15x faster than comparable SOTA) -- the fastest high-quality TTM to our knowledge.

AK

30,430 views • 1 year ago

Glad that our work “Inference-Time Enhancement of Generative Robot Policies via Predictive World Modeling”, led by Han Qi, has been accepted to IEEE Robotics and Automation Letters! 🎉 We propose Generative Predictive Control (GPC): sample action proposals from a pretrained diffusion policy (“look back”), roll them out with a diffusion-based action-conditioned video world model (“look forward”), then rank or optimize the actions using either a learned reward model or VLM preferences. Conceptually, this is trajectory optimization / MPC with hybrid sampling + gradient optimization, interpreted through modern diffusion priors and video world models. Interestingly, we first posted the paper on arXiv in Feb 2025, when action-conditioned video world models for planning were still rare—now this direction is rapidly gaining traction. Still many open questions, e.g., • how to avoid local minima in planning • what representations work best for world models • how to balance physics priors vs. data-driven learning Paper:

Heng Yang

18,994 views • 4 months ago

StyleDrop: Text-to-Image Generation in Any Style introduce StyleDrop, a method that enables the synthesis of images that faithfully follow a specific style using a text-to-image model. The proposed method is extremely versatile and captures nuances and details of a user-provided style, such as color schemes, shading, design patterns, and local and global effects. It efficiently learns a new style by fine-tuning very few trainable parameters (less than 1% of total model parameters) and improving the quality via iterative training with either human or automated feedback. Better yet, StyleDrop is able to deliver impressive results even when the user supplies only a single image that specifies the desired style. An extensive study shows that, for the task of style tuning text-to-image models, StyleDrop implemented on Muse convincingly outperforms other methods, including DreamBooth and textual inversion on Imagen or Stable Diffusion. paper page:

AK

56,377 views • 3 years ago

🤖 Introducing H2O (Human2HumanOid): - 🧠 An RL-based human-to-humanoid real-time whole-body teleoperation framework - 💃 Scalable retargeting and training using large human motion dataset - 🎥 With just an RGB camera, everyone can teleoperate a full-sized humanoid to perform actions like pick and place, walking, kicking, boxing, etc - 💡Unleash the potential of humanoids with human cognitive skills and adaptability 🔗:

Tairan He

96,738 views • 2 years ago

Skyfall-GS: Synthesizing Immersive 3D Urban Scenes from Satellite Imagery TL;DR: Skyfall-GS converts satellite images to explorable 3D urban scenes using diffusion models, with real-time rendering performance. Contributions: • We introduce Skyfall-GS, the first method to synthesize immersive, real-time, free-flight navigable 3D urban scenes solely from multi-view satellite imagery using generative refinement. • An open-domain refinement approach leverages pre-trained text-to-image diffusion models without domain-specific training. • A curriculum-learning-based iterative refinement strategy progressively enhances reconstruction quality from higher to lower viewpoints, significantly improving visual fidelity in occluded areas.

MrNeRF

66,111 views • 9 months ago

X-Humanoid just dropped TG-VLA — the first full-size, whole-body VLA framework for humanoids. 🤖 Most VLA demos today still look arm-centric: see, plan, reach, grasp. The legs are mostly there to carry the arms. TG-VLA pushes action into the whole body: → HEX: task context & cross-embodiment learning → HAF-VLA: high-DoF motion → structured action flows → DSRL-DCT: online RL through a compressed latent space The real shift isn't a smarter hand — it's locomotion, torso control, balance, and manipulation inside one control loop. That's the move: from mobile dual-arm machines to full-body agents.

RoboHub🤖

41,598 views • 22 days ago

The power of generative models — now embodied in humanoids. Announcing DreamControl –– After a year-long research effort at General Robotics — we present a scalable framework for whole-body humanoid control that fuses diffusion priors with reinforcement learning to unlock real-world scene interaction. Diffusion + RL → natural whole-body skills on real robots. DreamControl enables humanoids to move beyond locomotion demos → performing natural, human-like skills such as –– Picking & lifting objects, Opening drawers & doors, Precise punching, kicking, and jumping, Bimanual manipulation tasks Our key innovation: a diffusion prior over human motion that guides RL, eliminating the need for massive teleoperation datasets, and producing motions that look human while transferring to real hardware. Trained purely in simulation, deployed on the Unitree G1 humanoid, DreamControl policies run in real time, bridging sim-to-real with unprecedented naturalness. We leverage a novel hybrid edge + cloud infrastructure that runs RL-trained policies on the edge backed by powerful AI models running in the cloud This is the next step in General Robotics’ journey toward general-purpose humanoid assistants that interact, adapt, and assist autonomously. Paper: Blog: 1/n

Ashish Kapoor

118,133 views • 10 months ago

✨We are excited to open-source Tencent HY-Motion 1.0, a billion-parameter text-to-motion model built on the Diffusion Transformer (DiT) architecture and flow matching. Tencent HY-Motion 1.0 empowers developers and individual creators alike by transforming natural language into high-fidelity, fluid, and diverse 3D character animations, delivering exceptional instruction-following capabilities across a broad range of categories. The generated 3D animation assets can be seamlessly integrated into typical 3D animation pipelines.🎮🎥 Highlights: 🔹Billion-Scale DiT: Successfully scaled flow-matching DiT to 1B+ parameters, setting a new ceiling for instruction-following capability and generated motion quality. 🔹Full-Stage Training Strategy: The industry’s first motion generation model featuring a complete Pre-training → SFT → RL loop to optimize physical plausibility and semantic accuracy. 🔹Comprehensive Category Coverage: Features 200+ motion categories across 6 major classes—the most comprehensive in the industry, curated via a meticulous data pipeline. 🌐Project Page: 🔗Github: 🤗Hugging Face: 📄Technical report:

Tencent Hy

328,493 views • 6 months ago

🤔 Ever wondered if simulation-based animation/avatar learnings can be applied to real humanoid in real-time? 🤖 Introducing H2O (Human2HumanOid): - 🧠 An RL-based human-to-humanoid real-time whole-body teleoperation framework - 💃 Scalable retargeting and training using large human motion dataset - 🎥 With just an RGB camera, everyone can teleoperate a full-sized humanoid to perform actions like pick and place, walking, kicking, boxing, etc - 💡Unleash the potential of humanoids with human cognitive skills and adaptability 🔗: 📄: 🎬: H2O proposes: - A scalable retargeting framework for obtaining large-scale humanoid motion dataset, intelligently filtering out infeasible motion for the humanoid embodiment. - We train a full-body motion imitator (similar to PHC) and deploy to the real world zero-shot. - Using this framework, we enable real-time teleoperation of a humanoid via a human operator and webcam, performing skills such as pick and place, kicking, walking strollers, etc. Team: Tairan He Zhengyi “Zen” Luo Wenli Xiao @ChongZitaZhang Kris Kitani Changliu Liu Guanya Shi

Zhengyi “Zen” Luo

47,305 views • 2 years ago

🤖🤖🤖 Following RoboVerse, we introduce another work focused on Robotic Tactile Simulation - Taccel Simulator. Taccel is a high-performance simulation platform for vision-based tactile sensors and robots. 🚀🚀🚀 Boosted by Nvidia Warp, we optimize Taccel with highly parallelized simulations and support 900fps simulation with 4k+ parallel training envs. 🤝🤝🤝 Taccel is designed with user-friendly APIs and is easy to use. We open-sourced all the code and documentation. Feel free to try! Project: Preprint: Code:

Siyuan Huang

10,668 views • 1 year ago

Relightable Full-Body Gaussian Codec Avatars TL;DR: First drivable full-body avatar model that reconstructs perceptually realistic relightable appearance. Contributions: • We propose the first relightable full-body avatar model that jointly models the relightable appearance of the human body, face, and hands for high-fidelity relighting and animation. • To handle full-body articulations with global light transport, we propose learnable zonal harmonics to represent local diffuse radiance transfer in the local coordinate frames of each Gaussian. This results in a reduced number of parameters and improved rendering quality compared to the commonly used spherical harmonics representation. • We reformulate the learnable radiance transfer to explicitly decompose non-local shadowing and propose a dedicated shadow network to predict shadows caused by the articulation of the body. Additionally, we propose a physically based irradiance normalization scheme to ensure that the shadow network can generalize to novel illumination conditions, such as unseen environment maps. • We show that deferred shading can be used for our learned specular radiance transfer function, achieving high-fidelity specular reflections for relightable human avatar modeling without excessively increasing the number of Gaussians.

MrNeRF

10,966 views • 1 year ago

How can we build a general-purpose “foundation model” for robot motion? Zhengyi “Zen” Luo joins us to talk about SONIC, which uses motion tracking as a foundational task for humanoid robot control, and scales humanoid control training to 9k GPU hours and 100 million frames worth of data. The result: a model with a generally-useful embedding space that can be controlled by a VLA, or from human video, to perform a wide variety of humanoid whole-body-control tasks, including with zero-shot transfer to previously unseen motions. Watch episode 72 of RoboPapers, with Michael Cho - Rbt/Acc and Jiafei Duan, now!

RoboPapers

24,802 views • 3 months ago

The first human age-reversal trial is officially happening. But before the FDA cleared it, Harvard professor David Sinclair had to pull off a mouse experiment most scientists thought was impossible: "These mice had their optic nerve regenerated. We were able to show that using [the information theory of aging] method we could cure blindness in animal for the first time." Since then, he has also found that diseases like Alzheimer's, multiple sclerosis, ALS, kidney disease and liver disease could be treated and reversed in mice too: “It's not just the eye that can get reversed and cured of diseases. It's seemingly every part of the body.” It's what he calls "a universal reset of the body." He confirmed his method also worked in monkeys. Now humans are next. The FDA just cleared the first age-reversal trial. Life Biosciences raised $80 million to make it happen. As he put it: "The eye is just the beginning. We believe we can treat every tissue—a whole body reset."

John Cumbers

1,159,303 views • 4 months ago

Tencent presents GameGen-O: Open-world Video Game Generation. We introduce GameGen-O, the first diffusion transformer model tailored for the generation of open-world video games. This model facilitates high-quality, open-domain generation by simulating a wide array of game engine features, such as innovative characters, dynamic environments, complex actions, and diverse events. Additionally, it provides interactive controllability, allowing for gameplay simulation. The development of GameGen-O involves a comprehensive data collection and processing effort from scratch. We collect and build the first Open-World Video Game Dataset (OGameData), amassing extensive data from over a hundred next-generation open-world games and employing a proprietary data pipeline for efficient sorting, scoring, filtering, and decoupled captioning. This robust and extensive OGameData forms the foundation of our model's training process. GameGen-O undergoes a two-stage training process, consisting of foundation model pretraining and instruction tuning. In the first phase, the model is pre-trained on OGameData via text-to-video and video continuation, endowing GameGen-O with the capability for open-domain video game generation. In the second phase, the pre-trained model is frozen and fine-tuned using a trainable InstructNet, which enables the production of subsequent frames based on multimodal structural instructions. This whole training process gives the model the ability to generate and interactively control content. In summary, GameGen-O represents a notable initial step forward in the realm of open-world video game generation via generative models. It underscores the potential of generative models to serve as an alternative to rendering techniques, efficiently combining creative generation with interactive capabilities.

AK

367,088 views • 1 year ago

OpenClaw meets RL! OpenClaw Agents adapt through memory files and skills, but the base model weights never actually change. OpenClaw-RL solves this! It wraps a self-hosted model as an OpenAI-compatible API, intercepts live conversations from OpenClaw, and trains the policy in the background using RL. The architecture is fully async. This means serving, reward scoring, and training all run in parallel. Weights get hot-swapped after every batch while the agent keeps responding.
Currently, it has two training modes:
- Binary RL (GRPO): A process reward model scores each turn as good, bad, or neutral. That scalar reward drives policy updates via a PPO-style clipped objective.
- On-Policy Distillation: When concrete corrections come in like "you should have checked that file first," it uses that feedback as a richer, directional training signal at the token level.
When to use OpenClaw-RL? To be fair, a lot of agent behavior can already be improved through better memory and skill design. OpenClaw's existing skill ecosystem and community-built self-improvement skills handle a wide range of use cases without touching model weights at all. If the agent keeps forgetting preferences, that's a memory problem. And if it doesn't know how to handle a specific workflow, that's a skill problem. Both are solvable at the prompt and context layer.
Where RL becomes interesting is when the failure pattern lives deeper in the model's reasoning itself: things like consistently poor tool selection order, weak multi-step planning, or failing to interpret ambiguous instructions the way a specific user intends. Research on agentic RL (like ARTIST and Agent-R1) has shown that these behavioral patterns hit a ceiling with prompt-based approaches alone, especially in complex multi-turn tasks where the model needs to recover from tool failures or adapt its strategy mid-execution. That's the layer OpenClaw-RL targets, and it's a meaningful distinction from what OpenClaw offers.
I have shared the repo in the replies!
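For readers who want a feel for the Binary RL mode described above, here is a minimal sketch of how a per-turn scalar reward could feed a PPO-style clipped objective. It is not taken from the OpenClaw-RL repo. The function names, the +1 / 0 / -1 mapping for good / neutral / bad, the group-wise normalization of rewards (the usual GRPO recipe), and the clip value of 0.2 are all assumptions made for illustration.

```python
import math

# Hypothetical mapping from the reward model's verdict to a scalar (assumption).
VERDICT_TO_REWARD = {"good": 1.0, "neutral": 0.0, "bad": -1.0}


def group_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO-style clipped surrogate (to be maximized).

    logp_new / logp_old are log-probabilities of the sampled responses under
    the current and the behavior policy; one entry per response.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Taking the min keeps the update pessimistic: large ratio changes
        # cannot increase the objective beyond the clipped value.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Example: four turns scored by the (hypothetical) reward model.
verdicts = ["good", "bad", "neutral", "good"]
rewards = [VERDICT_TO_REWARD[v] for v in verdicts]
advantages = group_advantages(rewards)
```

In a real trainer the negative of this objective would be minimized with a gradient-based optimizer over token-level log-probabilities. The sketch works on plain floats only to show the arithmetic.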

Avi Chawla

138,691 views • 4 months ago